Computer and Modernization ›› 2013, Vol. 1 ›› Issue (3): 1-4, 8.

• Algorithm Analysis and Design •

Research and Implementation of Parallel Apriori Algorithm on Hadoop Platform

HAO Xiao-fei1, TAN Yue-sheng2, WANG Jing-yu2

  1. College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China; 2. Network Center, Inner Mongolia University of Science and Technology, Baotou 014010, China
  • Received: 2012-11-08 Online: 2013-04-03 Published: 2013-04-03

Abstract: This paper analyzes the computation process of the traditional serial Apriori association rule algorithm and its shortcomings. The serial algorithm suffers from low execution efficiency and high time complexity, while traditional parallel computing models cannot handle node failure and have difficulty with load balancing. To address these problems, a design method for implementing a parallel association rule algorithm on the Hadoop platform is proposed: the traditional Apriori algorithm is improved, and the execution flow of the improved algorithm under Hadoop's MapReduce programming model is given. The improved algorithm is tested on the Hadoop platform in both single-machine and cluster settings. The experimental results show that the improved algorithm has high execution efficiency, good speedup, and good portability.

Key words: Hadoop, association rule algorithm, parallel computing, Apriori
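
For readers unfamiliar with how Apriori maps onto the MapReduce model, the following is a minimal illustrative sketch in Python. It is not the improved algorithm from the paper, whose details are not given on this page, and it does not use the Hadoop API: the map and reduce steps are simulated in a single process. All function names (map_phase, reduce_phase, next_candidates, apriori) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Emit (candidate, 1) for every candidate itemset contained in one transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs, min_support):
    """Sum the counts per candidate and keep those meeting the support threshold."""
    counts = defaultdict(int)
    for candidate, one in pairs:
        counts[candidate] += one
    return {c: n for c, n in counts.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (the Apriori property)."""
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def apriori(transactions, min_support):
    """Run one simulated map/reduce pass per itemset size until nothing is frequent."""
    # Assumption: min_support is an absolute transaction count, not a ratio.
    candidates = {frozenset([i]) for t in transactions for i in t}
    result, k = {}, 1
    while candidates:
        # On a real cluster, each mapper would process its own split of the data.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

Calling apriori(transactions, min_support) on a list of item lists returns a dictionary from each frequent itemset (a frozenset) to its support count. In this sketch each itemset size costs one full pass over the transactions, which is the per-iteration cost that a parallel implementation spreads across nodes.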