Computer and Modernization

• Database and Data Mining •

Implementation of a Load Balancing Mechanism on the Storm Stream Processing Platform

  

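  1. (Information Service Platform Laboratory, the 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808)
  • Received: 2017-04-14 Online: 2017-12-25 Published: 2017-12-26
  • About the authors: Zhang Nan (张楠, b. 1992), female, from Lankao, Henan, master's student at the Information Service Platform Laboratory, the 32nd Research Institute of China Electronics Technology Group Corporation; research interests: cloud computing and data mining. Chai Xiaoli (柴小丽), female, professor-level senior engineer, deputy chief engineer, master's supervisor; research interests: computer architecture, embedded computers, and domestically produced computers. Xie Bin (谢彬), male, Ph.D. Tang Peng (唐鹏), male, Ph.D.
  • Funding:
    Internally initiated project of the 32nd Research Institute of China Electronics Technology Group Corporation (ZQ160006; ZQ160007)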

Realization of Load Balancing Mechanism in Storm Streaming Processing Platform

  1. (Information Service Platform Laboratory, the 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China)
  • Received: 2017-04-14 Online: 2017-12-25 Published: 2017-12-26

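Abstract: The Storm stream processing platform addresses the poor real-time performance of traditional Hadoop-based batch processing systems and provides an efficient, fast, real-time framework for processing multi-source heterogeneous big data. During task assignment, however, Storm considers only the ordering of available slots across nodes and does not adequately account for each node's actual load, so load imbalance arises easily. To address this, the paper improves the Storm scheduling algorithm on the Storm distributed stream processing system by sorting nodes on a weighted combination of available slots and node load. In addition, the data structure is designed so that rowkeys are random and unique, which keeps the load on the RegionServers balanced, and a batch-write mechanism raises the HBase write speed and thus the efficiency of storing stream data. Comparison experiments against native Storm show that the improved algorithm and the optimized mechanisms ensure fast data writes and raise cluster resource utilization, and that the improved system has clear advantages in practicality and efficiency.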

Keywords: Storm, stream processing, distributed computing, batch processing, load balancing
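The weighted ordering described in the abstract can be illustrated with a minimal sketch. The paper's actual weights, load metrics, and scoring formula are not given here, so everything below is an assumption made for illustration: the `Node` fields, the equal default weights, the averaging of CPU and memory load, and the greedy assignment loop are all hypothetical, and the code does not use Storm's scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a supervisor node; the fields are assumptions."""
    name: str
    free_slots: int   # worker slots currently available
    total_slots: int  # worker slots configured on the node
    cpu_load: float   # 0.0 (idle) to 1.0 (saturated)
    mem_load: float   # 0.0 (idle) to 1.0 (saturated)


def score(node, w_slots=0.5, w_load=0.5):
    """Higher is better: many free slots and low load both raise the score."""
    slot_ratio = node.free_slots / node.total_slots if node.total_slots else 0.0
    load = (node.cpu_load + node.mem_load) / 2  # assumed load measure
    return w_slots * slot_ratio + w_load * (1.0 - load)


def rank_nodes(nodes, w_slots=0.5, w_load=0.5):
    """Sort nodes that still have a free slot by descending weighted score."""
    candidates = [n for n in nodes if n.free_slots > 0]
    return sorted(candidates, key=lambda n: score(n, w_slots, w_load), reverse=True)


def assign_workers(nodes, num_workers, w_slots=0.5, w_load=0.5):
    """Greedily place each worker on the best-ranked node, re-ranking each time."""
    placement = []
    for _ in range(num_workers):
        ranked = rank_nodes(nodes, w_slots, w_load)
        if not ranked:
            break  # no free slots remain anywhere
        best = ranked[0]
        best.free_slots -= 1  # the slot is now occupied
        placement.append(best.name)
    return placement
```

Setting `w_slots=1.0` and `w_load=0.0` reduces the ranking to a slot-only ordering, which makes it easy to compare the two behaviors on the same node list.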

Abstract: Compared with Hadoop, Storm has the advantage of real-time data stream processing and provides an efficient, fast, real-time framework for processing multi-source heterogeneous data. However, worker assignment in a Storm cluster considers only the ordering of available slots across nodes and ignores each node's current load, so it may fail to meet load-balancing requirements when more than one topology is running in the cluster. To improve efficiency and balance load in real-time stream processing, this paper proposes a Storm scheduling algorithm, implemented on a Storm-based distributed stream processing system, that ranks nodes by a weighted combination of available slots and node load. In addition, the HBase rowkey is designed to be random and evenly distributed, which balances load across the RegionServers, improves the utilization of cluster resources, and greatly increases data write speed. Comparison experiments against the original Storm system show that the improved algorithm and the optimized mechanisms ensure fast data writes and improve the utilization of cluster resources. The improved system has clear advantages in practicality and efficiency.

Key words: Storm, stream processing, distributed computing, batch processing, load balancing
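The storage-side ideas in the abstract (rowkeys that are random and unique so that writes spread across RegionServers, plus batched writes) can be sketched as follows. The paper's real rowkey layout and batch size are not stated here, so this is only an illustration under assumptions: the bucket count, the `bucket|source|timestamp|uuid` layout, and the `BatchWriter` class are hypothetical, and `flush_fn` is a stand-in for whatever multi-row write call an HBase client provides.

```python
import hashlib
import uuid

NUM_BUCKETS = 16  # assumed number of key-space buckets (e.g. pre-split regions)


def make_rowkey(source_id, timestamp_ms):
    """Build a rowkey with a hashed bucket prefix and a random unique suffix.

    The prefix scatters consecutive writes over the key space instead of
    sending monotonically increasing keys to a single region.
    """
    unique = uuid.uuid4().hex  # random part, makes the key unique
    digest = hashlib.md5(f"{source_id}{timestamp_ms}{unique}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % NUM_BUCKETS
    return f"{bucket:02d}|{source_id}|{timestamp_ms}|{unique}"


class BatchWriter:
    """Buffer rows and hand them to flush_fn in groups of batch_size."""

    def __init__(self, flush_fn, batch_size=100):
        self._flush_fn = flush_fn      # hypothetical sink for a list of rows
        self._batch_size = batch_size
        self._buffer = []

    def put(self, rowkey, value):
        """Queue one row; flush automatically when the buffer is full."""
        self._buffer.append((rowkey, value))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send any buffered rows in one call and start a fresh buffer."""
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []
```

A caller should invoke `flush()` once more at shutdown so that a partially filled buffer is not lost. A random prefix of this kind trades read locality for write distribution, so range scans by source or time would need to fan out over all buckets.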

CLC number: