Computer and Modernization

• Algorithm Design and Analysis

A Prediction-Based Replication Factor Decision Algorithm for Hot Data in Cloud Computing

  

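  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China; 2. Unit 94860 of the PLA, Nanjing 210018, Jiangsu, China
  • Received: 2014-11-12 Online: 2015-02-28 Published: 2015-03-06
  • About the authors: ZHANG Song (b. 1989), male, from Lianyungang, Jiangsu, master's student at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; research interests: computer networks and distributed computing. DU Qingwei (b. 1974), male, associate professor, Ph.D.; research interests: computer networks and distributed computing. SUN Jing (b. 1981), female, assistant engineer at Unit 94860 of the PLA, M.S.; research interest: computer networks.
  • Supported by:
    National Natural Science Foundation of China (61202350)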

Dynamic Replicas Strategy Based on Predicted Popularity

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
  2. Unit 94860 of PLA, Nanjing 210018, China
  • Received: 2014-11-12 Online: 2015-02-28 Published: 2015-03-06

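Abstract (translated from Chinese):

To improve data availability and overall cluster performance, HDFS (Hadoop Distributed File System) currently uses a replica placement scheme with a fixed number of replicas. File popularity
varies widely, however, and accesses to highly popular files can slow job execution. To overcome this problem, this paper proposes a prediction-based replication factor decision algorithm for
hot data. From the recent access characteristics of the data, the algorithm applies grey prediction and uses a Markov prediction model to correct the prediction deviation caused by data
fluctuation and bursty access, yielding the future access popularity of each file. It then builds a finite-channel service model on the predicted value to find the minimum replication factor
that meets user demand. Experiments show that, compared with the existing replica management strategy and a strategy that adjusts the replication factor based on real-time popularity, the
proposed strategy effectively reduces access conflicts on hot data and lowers both the execution time and the network load of jobs that use hot data.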

Keywords (translated from Chinese): hot data, replica management, cloud computing, Hadoop, grey prediction, birth-death process
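
The grey prediction step named in the abstract can be illustrated with a basic GM(1,1) model. The abstract does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: the function name `gm11_forecast` and its parameters are hypothetical, the input is assumed to be a short series of strictly positive per-interval access counts, and the Markov correction of the prediction deviation is not included.

```python
import math


def gm11_forecast(history, steps=1):
    """Forecast future values of a positive series with a basic GM(1,1) model.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 observations")
    if any(x <= 0 for x in history):
        raise ValueError("this sketch assumes strictly positive access counts")

    # Accumulated generating operation (1-AGO): running sum of the raw series.
    cumulative = []
    total = 0.0
    for x in history:
        total += x
        cumulative.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (cumulative[k] + cumulative[k - 1]) for k in range(1, n)]
    y = list(history[1:])

    # Least-squares fit of y = -a * z + b (closed form for two parameters).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    def accumulated(k):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        if abs(a) < 1e-12:
            return history[0] + b * k  # degenerate case: no growth or decay
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Differencing the accumulated forecast restores the original scale.
    return [accumulated(k) - accumulated(k - 1) for k in range(n, n + steps)]
```

For a series that grows by 10% per interval, such as `[100.0, 110.0, 121.0, 133.1]`, the one-step forecast lands close to the true next value of 146.41. Real access traces are burstier, which is the deviation the abstract addresses with the Markov correction.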

Abstract:

To improve data availability and cluster performance, HDFS currently adopts uniform data replication with a fixed number of replicas. However, files differ in popularity, sometimes
enormously, and access to highly popular data may hurt job performance. To address this problem, a dynamic replica strategy based on predicted popularity is put forward. Starting from
recent data popularity and a grey prediction model, we use a Markov prediction model to correct the prediction deviation caused by bursty and fluctuating access, and obtain a more
accurate predicted popularity for each file. A finite-channel service model based on the predicted popularity is then established to calculate the minimum number of replicas that meets
user demand. Experimental results show that, compared with default data replication, our strategy more effectively avoids contention, reduces job execution time, and alleviates
network traffic.

Key words: highly popular data, replica management, cloud computing, Hadoop, grey prediction, birth-death process
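
The last step in the abstract, finding the minimum number of replicas from a finite-channel service model, can be sketched with a standard birth-death queue. The abstract does not state which queueing model or service criterion the paper uses, so this example assumes an M/M/c/c loss system (Erlang B) in which each replica acts as one service channel and "user demand" is expressed as a maximum tolerated blocking probability. All names and parameters (`erlang_b`, `min_replication_factor`, `max_blocking`, `max_replicas`) are hypothetical.

```python
def erlang_b(channels, offered_load):
    """Blocking probability of an M/M/c/c loss system via the Erlang B recursion."""
    blocking = 1.0  # with zero channels every request is blocked
    for c in range(1, channels + 1):
        blocking = offered_load * blocking / (c + offered_load * blocking)
    return blocking


def min_replication_factor(arrival_rate, service_rate, max_blocking, max_replicas=64):
    """Smallest replica count whose blocking probability is at most max_blocking.

    arrival_rate: predicted requests per unit time (assumed to come from the forecast).
    service_rate: requests one replica can serve per unit time (assumed known).
    Returns max_replicas if the target cannot be met within that cap.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate must be > 0")
    if not 0 < max_blocking < 1:
        raise ValueError("max_blocking must lie strictly between 0 and 1")

    offered_load = arrival_rate / service_rate
    for replicas in range(1, max_replicas + 1):
        if erlang_b(replicas, offered_load) <= max_blocking:
            return replicas
    return max_replicas
```

With a predicted load of two requests per unit of service time, `min_replication_factor(2.0, 1.0, 0.05)` returns 5, and tightening the target to 0.01 raises the result to 7. A queueing variant with a finite waiting room would change the formula but not the search for the smallest feasible replica count.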

CLC number: