Computer and Modernization ›› 2022, Vol. 0 ›› Issue (01): 41-53.
Publication date:
2022-01-24
Online release date:
2022-01-24
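About the authors:
CAO Yu (b. 1991), male, from Beijing, master's postgraduate, research interests: big data analysis and mining, Internet of Things applications, E-mail: caoyu711@qq.com; LI Xiao-hui (b. 1980), female, professor-level senior engineer, Ph.D., research interests: big data analysis and mining, Internet of Things applications; LIU Zhong-lin (b. 1983), male, senior engineer, master's postgraduate, research interests: big data analysis and mining, cloud computing; JIA He (b. 1988), female, engineer, master's postgraduate, research interests: big data analysis and mining, machine learning; FEI Zhi-wei (b. 1996), male, master's postgraduate, research interests: big data analysis and mining, machine learning.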
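Abstract: As the demands of big data analysis and processing grow more complex, the analysis process needs to be expressed as a big data workflow built from tasks and the dependencies among them, so that it becomes structured, repeatable, controllable, extensible, and automatically executable. Orchestrating and managing big data workflows has therefore become an important research topic, and the heterogeneity of resources in cloud computing environments makes the problem more complex. This paper first divides research on big data workflow orchestration and management in cloud environments into four aspects: workflow construction, workflow partitioning, task scheduling and execution, and fault tolerance. On that basis it surveys the field, listing and introducing the classic and widely noted studies of recent years in each aspect. It then classifies and organizes the mainstream techniques used in this research, analyzing the methods proposed in each study along with their characteristics, advantages, and points for improvement. Finally, it returns to the perspective of big data analysis and processing systems and analyzes, by category, the benefits that each line of research brings to such systems.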
CAO Yu, LI Xiao-hui, LIU Zhong-lin, JIA He, FEI Zhi-wei. Review of Big Data Workflow Orchestration and Management System in Cloud Environment[J]. Computer and Modernization, 2022, 0(01): 41-53.