Research and Implementation of Parallel Apriori Algorithm on Hadoop Platform

Computer and Modernization ›› 2013, Vol. 1 ›› Issue (3): 1-4,8.

• Algorithm Analysis and Design •

HAO Xiao-fei1, TAN Yue-sheng2, WANG Jing-yu2

1. College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China； 2. Network Center, Inner Mongolia University of Science and Technology, Baotou 014010, China

Received:2012-11-08 Revised:1900-01-01 Online:2013-04-03 Published:2013-04-03

Abstract

Abstract: This paper analyzes the computation performed by the traditional serial Apriori association rule algorithm and its shortcomings. The serial algorithm executes inefficiently and has high time complexity, while traditional parallel computing models cannot tolerate node failures and handle load balancing poorly. To address these problems, the paper proposes a design for a parallel association rule algorithm on the Hadoop platform, improves the traditional Apriori algorithm, and gives the execution flow of the improved algorithm under the MapReduce programming model. The improved algorithm was tested on Hadoop both on a single machine and on a cluster. The experimental results show that it achieves higher execution efficiency, good speedup, and good portability.

Key words: Hadoop, association rule algorithm, parallel computing, Apriori
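
The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea it names: counting itemset support in a map step and a reduce step. It is written in plain Python and simulates the two phases in one process rather than calling Hadoop. The function names, the sample transactions, and the `min_support` threshold are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, k):
    """Mapper: emit (itemset, 1) for every k-item subset of each transaction."""
    for transaction in transactions:
        # Sort and deduplicate so the same itemset always yields the same key.
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def frequent_itemsets(splits, k, min_support):
    """Run one counting pass; each split stands in for one mapper's input."""
    emitted = (pair for split in splits for pair in map_phase(split, k))
    return reduce_phase(emitted, min_support)


# Hypothetical data: two input splits of two transactions each.
splits = [
    [["bread", "milk"], ["bread", "beer", "eggs"]],
    [["milk", "beer", "bread"], ["bread", "milk", "beer"]],
]
result = frequent_itemsets(splits, 2, min_support=3)
```

This sketch covers only the support-counting pass for a fixed itemset size `k`. A full Apriori run would repeat the pass for increasing `k` and restrict the mapper to candidates built from the frequent itemsets of size `k - 1`. On a real cluster the framework, not the `frequent_itemsets` helper, would route mapper output to reducers.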

Cite this article: HAO Xiao-fei, TAN Yue-sheng, WANG Jing-yu. Research and Implementation of Parallel Apriori Algorithm on Hadoop Platform[J]. Computer and Modernization, 2013, 1(3): 1-4,8.
