计算机与现代化

• Network and Communication

基于Hadoop的访问热点副本迁移技术

  

  1. 河海大学计算机与信息学院,江苏南京211100
  • 收稿日期:2015-08-25 出版日期:2016-01-22 发布日期:2016-01-26
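  • About the authors: Feng Jun (冯钧, 1969-), female, from Changzhou, Jiangsu; professor and Ph.D. at the College of Computer and Information, Hohai University; research interests: spatio-temporal data management, intelligent data processing and data mining, and water conservancy informatization. Wang Chun (王纯, 1990-), female, from Yancheng, Jiangsu; master's student; research interest: information retrieval. Zhu Kangkang (朱康康, 1990-), male; master's student; research interest: spatio-temporal data management. Wei Tongtong (魏童童, 1990-), female, from Yancheng, Jiangsu; master's student; research interest: data mining.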
  • Funding:
    National Natural Science Foundation of China (61370091, 61170200)

Replication Migration Technology for Access Hotspots Based on Hadoop

  1. College of Computer and Information, Hohai University, Nanjing 211100, China
  • Received:2015-08-25 Online:2016-01-22 Published:2016-01-26

摘要: 提出一种云环境下的访问热点负载均衡模型:基于节点的吞吐量与响应时间等主要参考指标,构建节点负载判定模块;文件在HDFS存储的过程中,将文件对应的数据块编号与存储路径相结合,设计存放在数据节点中的数据块到文件目录映射表;提出一种基于节点负载以及节点的存储空间的迁移源节点和目标节点选择方法;基于机架感知的机制,制定一种动态副本迁移方案。最后利用执行器下发指令给相应的数据节点,执行具体的迁移任务以及完善迁移后副本因子等参数信息的调整。通过迅速扩散副本的方式,来增加热点文件的副本数量,使得系统能够对外提供更大的吞吐量,缩短系统反应时间。

关键词: 云存储, 大数据, 数据迁移, 动态副本, 访问热点

Abstract: This paper proposes a load-balancing model for access hotspots in cloud environments. A node load determination module is built on key indicators such as node throughput and response time. When a file is stored in HDFS, its data block IDs are combined with its storage path to build a block-to-file-directory mapping table that is kept on the data nodes. A method is proposed for selecting the migration source and target nodes based on node load and node storage space. Building on the rack-awareness mechanism, a dynamic replica migration scheme is devised. Finally, an executor issues instructions to the relevant data nodes, which carry out the migration tasks and then update parameters such as the replication factor. By spreading replicas rapidly to increase the number of replicas of hot files, the system can deliver higher throughput and a shorter response time.

Key words: cloud storage, big data, data migration, dynamic replica, access hotspots
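
The source and target selection step summarized in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the `Node` fields, the equal weighting of throughput and response time in `load_score`, the 0.8 overload threshold, and the preference for a rack that does not yet hold a replica. It is not the authors' implementation and does not call any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a data node; field names are illustrative."""
    name: str
    rack: str
    throughput: float      # current throughput, MB/s
    max_throughput: float  # assumed throughput capacity, MB/s
    response_ms: float     # mean response time, ms
    free_bytes: int        # free storage space, bytes


def load_score(node, max_response_ms=1000.0, w_throughput=0.5, w_response=0.5):
    """Assumed load score in [0, 1]: weighted mix of two normalized indicators."""
    t = min(node.throughput / node.max_throughput, 1.0)
    r = min(node.response_ms / max_response_ms, 1.0)
    return w_throughput * t + w_response * r


def pick_source(replica_holders, threshold=0.8):
    """Return the most loaded replica holder at or above the threshold, else None."""
    hot = [n for n in replica_holders if load_score(n) >= threshold]
    return max(hot, key=load_score, default=None)


def pick_target(candidates, replica_holders, block_size):
    """Return a node with enough space that holds no replica yet, else None.

    Nodes on racks without a replica are preferred (assumed rack-aware rule);
    ties are broken by the lowest load score.
    """
    holder_names = {n.name for n in replica_holders}
    holder_racks = {n.rack for n in replica_holders}
    eligible = [
        n for n in candidates
        if n.name not in holder_names and n.free_bytes >= block_size
    ]
    return min(
        eligible,
        key=lambda n: (n.rack in holder_racks, load_score(n)),
        default=None,
    )
```

In this sketch a hot block is copied from the node returned by `pick_source` to the node returned by `pick_target`; updating the replication factor afterwards, as the abstract describes, is left out.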

中图分类号: