计算机与现代化 (Computer and Modernization) ›› 2022, Vol. 0 ›› Issue (01): 41-53.

• Database and Data Mining •

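Review of Big Data Workflow Orchestration and Management System in Cloud Environment

  1. (The Fifth Elementary Department, North China Institute of Computing Technology, Beijing 100083, China)
  • Online: 2022-01-24 Published: 2022-01-24
  • About the authors: CAO Yu (曹禹, b. 1991), male, from Beijing, master's postgraduate; research interests: big data analysis and mining, Internet of Things applications; E-mail: caoyu711@qq.com. LI Xiaohui (李晓辉, b. 1980), female, professor-level senior engineer, Ph.D.; research interests: big data analysis and mining, Internet of Things applications. LIU Zhonglin (刘忠麟, b. 1983), male, senior engineer, master's postgraduate; research interests: big data analysis and mining, cloud computing. JIA He (贾贺, b. 1988), female, engineer, master's postgraduate; research interests: big data analysis and mining, machine learning. FEI Zhiwei (费志伟, b. 1996), male, master's postgraduate; research interests: big data analysis and mining, machine learning.
  • Supported by:
    中电太极(集团)有限公司 Technology Innovation Fund project: Interactive Analysis and Modeling Tool (19020103)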

Abstract: As big data analysis and processing requirements grow more complex, an analysis process needs to be expressed as a big data workflow built from tasks and the dependencies between them, so that it becomes structured, repeatable, controllable, scalable, and automatically executable. The orchestration and management of big data workflows has therefore become an important research topic, and the heterogeneity of resources in cloud computing environments makes the problem more complicated. This paper first divides research on big data workflow orchestration and management in the cloud environment into four aspects: big data workflow composition, workflow fragmentation, task scheduling and execution, and fault tolerance. On this basis, it reviews classic and widely followed studies from recent years in each aspect. It then classifies and organizes the mainstream techniques in these studies, and analyzes the methods each study proposes together with their characteristics, advantages, and points for improvement. Finally, it returns to the perspective of the big data analysis and processing system and analyzes, by category, the benefits that each line of research brings to the system.

Key words: big data, cloud computing, data analysis, workflow, orchestration and management