A Resource Scheduling Mechanism of Hadoop YARN

doi:10.3969/j.issn.1006-2475.2017.11.006

Computer and Modernization

(Information Service Laboratory, 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China)

Received: 2017-05-31 Online: 2017-11-21 Published: 2017-11-21
About the authors: LI Cheng (b. 1993), male, from Liuyang, Hunan, is a master's student at the Information Service Laboratory, 32nd Research Institute of China Electronics Technology Group Corporation; research interests: big data and cloud computing. CHAI Xiao-li (b. 1968), female, is deputy chief engineer of the institute and a professor-level senior engineer; research interests: computer architecture, embedded computers, and domestically produced computers.

Abstract

YARN is a widely used resource management system in Hadoop. It supports multiple computing frameworks, including MapReduce, Spark, and Storm, and has become a core component of the big data ecosystem. However, the existing Hadoop YARN resource schedulers rely on a resource guarantee mechanism based on resource reservation, which produces resource fragments and wastes resources. To improve the resource utilization and throughput of the cluster, this paper proposes a resource allocation mechanism based on reservation and backfilling. The mechanism decides whether to reserve resources according to job priority, and it introduces a backfill strategy that lets other jobs use the reserved resources as long as the execution of the reserving job is not affected. Experiments show that the reservation-backfill scheduling mechanism effectively improves the resource utilization and throughput of a Hadoop YARN cluster.

Key words: Hadoop YARN, big data, resource scheduling, reservation backfill
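
The abstract does not spell out the algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It shows one common form of reservation with backfilling (EASY-style backfilling) on a single pool of resource units. The names `Job` and `schedule_step`, the single-resource model, and the assumption that job runtimes can be estimated in advance are all hypothetical and do not come from the paper or from the YARN API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # larger value = more important (assumed convention)
    demand: int    # resource units requested, e.g. GB of memory
    runtime: int   # estimated runtime in time units (assumed to be known)


def schedule_step(capacity, running, pending, now=0):
    """Run one scheduling pass over a single resource pool.

    running: list of (end_time, demand) tuples for tasks already executing.
    Returns (started_names, reservation); reservation is (job_name, start_time)
    for the blocked highest-priority job, or None if nothing is blocked.
    """
    running = list(running)
    free = capacity - sum(d for _, d in running)
    queue = sorted(pending, key=lambda j: -j.priority)  # stable for ties
    started = []

    # Start jobs in priority order until the head of the queue does not fit.
    while queue and queue[0].demand <= free:
        job = queue.pop(0)
        free -= job.demand
        running.append((now + job.runtime, job.demand))
        started.append(job.name)
    if not queue:
        return started, None

    # Reservation: find the earliest time the blocked head job will fit.
    head = queue.pop(0)
    avail, shadow = free, None
    for end, d in sorted(running):
        avail += d
        if avail >= head.demand:
            shadow = end
            break
    if shadow is None:
        raise ValueError(f"{head.name} requests more than the pool capacity")
    extra = avail - head.demand  # units the head job leaves unused at `shadow`

    # Backfill: a lower-priority job may start now only if it cannot delay
    # the reservation, i.e. it ends by `shadow` or fits in the `extra` units.
    for job in queue:
        if job.demand > free:
            continue
        ends_in_time = now + job.runtime <= shadow
        if ends_in_time or job.demand <= extra:
            free -= job.demand
            if not ends_in_time:
                extra -= job.demand
            started.append(job.name)
    return started, (head.name, shadow)


# Example: 10 units in total, 6 of them busy until t=5.
jobs = [Job("A", 3, 8, 10), Job("B", 2, 3, 4), Job("C", 1, 4, 20), Job("D", 1, 1, 20)]
print(schedule_step(10, [(5, 6)], jobs))  # (['B', 'D'], ('A', 5))
```

In this example job A cannot start, so it receives a reservation at t=5. Job B is backfilled because it finishes before t=5, and job D is backfilled because it fits in the units A will not need. Job C is held back, since starting it would leave too little for A at the reserved time.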

CLC number: TP302

LI Cheng, CHAI Xiao-li, XIE Bin, TANG Peng. A Resource Scheduling Mechanism of Hadoop YARN[J]. Computer and Modernization, doi: 10.3969/j.issn.1006-2475.2017.11.006.

