An Optimized Data Mining Algorithm Based on Apriori Algorithm

doi:10.3969/j.issn.1006-2475.2016.09.001

Computer and Modernization, 2016, Vol. 0, Issue (9): 1-5.

• Database and Data Mining •

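(College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China)

Received: 2016-03-03 Online: 2016-09-12 Published: 2016-09-13
About the authors: CHEN Zhi-fei (陈志飞, b. 1993), female, from Yancheng, Jiangsu; master's student at the College of Computer and Information, Hohai University; research interest: data mining. FENG Jun (冯钧, b. 1969), female, professor, Ph.D.; research interests: spatio-temporal data management, intelligent data processing and data mining, and water-conservancy informatization.
Funding:
National Natural Science Foundation of China, General Program (61370091); National Science and Technology Support Program of China (2015BAB07B00)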

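Abstract

Abstract (translated from the Chinese):

Through an analysis of the basic problem of association rule mining, this paper identifies three shortcomings of the classic Apriori mining algorithm and improves on each: 1) the database mapping method is changed to avoid repeated scans of the database; 2) non-frequent itemsets are identified and excluded from joins with other items, which avoids generating large numbers of candidates; 3) an intersection operation is used to address the excessive time spent matching candidate itemsets against transaction patterns. To verify the effectiveness of the improved algorithm, experiments are carried out on historical hydrological data. The results show that, across different support and confidence values, the proposed improved algorithm, IMApriori, runs in less time and is more efficient.

Keywords: data mining, association rules, IMApriori algorithm, improvement, hydrological data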
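
Improvements 1) and 3) can be illustrated together. The abstract does not spell out the mapping, so the sketch below assumes a vertical layout: a single scan maps each item to the set of IDs of the transactions that contain it, and the support count of an itemset is then the size of the intersection of those sets. The function names and the toy transactions are hypothetical, and this is not the paper's IMApriori implementation.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """One pass over the database: map each item to the set of
    transaction IDs (list positions) in which it occurs."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def support_count(itemset, tidsets):
    """Support count of an itemset, obtained by intersecting the
    TID sets of its items instead of rescanning the transactions."""
    tid_sets = [tidsets.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


# Hypothetical toy database; the item labels carry no meaning.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c", "e"},
]
tidsets = build_tidsets(transactions)
print(support_count({"a", "c"}, tidsets))  # 2 (transactions 0 and 1)
print(support_count({"a", "e"}, tidsets))  # 0 (never occur together)
```

After the initial scan, every support query touches only the TID sets, which is one plausible reading of how a changed mapping could avoid repeated database scans.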

Abstract:

This paper studies the fundamental problems of mining association rules. Based on a summary of classical mining algorithms and of the inherent defects of the Apriori algorithm, several improvements are proposed. First, to avoid scanning the database repeatedly, the paper proposes a new database mapping method. Second, once the support of the candidate itemsets has been obtained, each candidate is checked against the prior knowledge of the Apriori algorithm to determine whether it can be a frequent itemset. If the candidate itemsets generated from an element of the existing frequent itemsets are certain not to be frequent, that element need not be joined with the others; this avoids producing large numbers of candidates and yields an optimized join step. Last, an intersection operation is introduced to address the excessive time that Apriori spends matching candidate itemsets against transaction patterns. To verify its effectiveness, the optimized algorithm is applied to historical hydrological data. The experimental results show that it achieves shorter execution times under different support and confidence levels, and thus higher efficiency.

Key words: data mining, association rules, IM-Apriori algorithm, improvement, hydrological data
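
The optimized join step can be sketched as follows. The abstract does not give the exact rule used to decide that an element cannot yield frequent candidates, so this example assumes one well-known criterion that follows from the Apriori property: an item contained in a frequent (k+1)-itemset must occur in at least k frequent k-itemsets, so k-itemsets containing a rarer item can be skipped before joining. The function names and the sample input are hypothetical.

```python
from collections import Counter
from itertools import combinations


def prune_before_join(frequent_k, k):
    """Drop frequent k-itemsets that contain an item occurring in
    fewer than k frequent k-itemsets (assumed pruning criterion)."""
    counts = Counter(item for itemset in frequent_k for item in itemset)
    return {s for s in frequent_k if all(counts[i] >= k for i in s)}


def generate_candidates(frequent_k, k):
    """Join the surviving k-itemsets into (k+1)-candidates and keep
    only those whose every k-subset is frequent (Apriori property)."""
    frequent = set(frequent_k)
    joinable = prune_before_join(frequent, k)
    candidates = set()
    for a, b in combinations(joinable, 2):
        union = a | b
        if len(union) == k + 1 and all(
            frozenset(sub) in frequent for sub in combinations(union, k)
        ):
            candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets; item "d" occurs only once,
# so {"c", "d"} is excluded from the join.
L2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
C3 = generate_candidates(L2, 2)
print(C3)
```

With this input the only candidate produced is {a, b, c}; the pairs that would have involved {c, d} are never formed, which is the kind of saving the abstract attributes to the optimized join.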

CLC number: TP301.6

CHEN Zhi-fei, FENG Jun. An Optimized Data Mining Algorithm Based on Apriori Algorithm[J]. Computer and Modernization, 2016, 0(9): 1-5.
