Computer and Modernization

• Database and Data Mining •

An Optimized Mining Algorithm Based on the Apriori Algorithm

  

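  1. (College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China)
  • Received: 2016-03-03 Online: 2016-09-12 Published: 2016-09-13
  • About the authors: CHEN Zhifei (1993-), female, born in Yancheng, Jiangsu, master's student at the College of Computer and Information, Hohai University; research interest: data mining. FENG Jun (1969-), female, professor, Ph.D.; research interests: spatio-temporal data management, intelligent data processing and data mining, and water-resources informatization.
  • Supported by:
    General Program of the National Natural Science Foundation of China (61370091); National Key Technology R&D Program of China (2015BAB07B00)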

An Optimized Data Mining Algorithm Based on Apriori Algorithm

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Received: 2016-03-03 Online: 2016-09-12 Published: 2016-09-13

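Abstract:

By analyzing the basic problem of association rule mining, this paper identifies three shortcomings of the classical Apriori mining algorithm and proposes a corresponding improvement for each: 1) change the database mapping method to avoid repeated scans of the database; 2) identify infrequent itemsets and ensure that they are not joined with other items, which avoids generating a large number of candidates; 3) use an intersection operation to reduce the time spent matching candidate itemsets against transaction patterns. To verify the effectiveness of the improved algorithm, experiments are carried out on historical hydrological data. The results show that, for different values of support and confidence, the proposed improved algorithm, IMApriori, has a shorter execution time and higher efficiency.

Keywords: data mining, association rules, IMApriori algorithm, improvement, hydrological data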
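Improvements 1) and 3) in the abstract can be illustrated with a small sketch. The abstract does not describe the paper's actual mapping, so this example assumes the widely used vertical layout, in which each item is mapped to the set of transaction IDs that contain it. The function names `build_tidsets` and `support_count` and the toy transactions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs containing it.

    The database is read exactly once; later support counts use only
    this mapping (an assumed vertical layout, not the paper's own).
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def support_count(itemset, tidsets):
    """Support count = size of the intersection of the items' TID sets."""
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        return 0  # convention chosen here for the empty itemset
    return len(set.intersection(*sets))


# Toy data, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
tidsets = build_tidsets(transactions)
print(support_count({"a", "c"}, tidsets))
```

With this layout, counting the support of a candidate needs no further pass over the transactions: it reduces to set intersections on the precomputed mapping.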

Abstract:

This paper studies the fundamental problems of association rule mining. After reviewing classical mining algorithms and the inherent defects of the Apriori algorithm, it proposes three improvements. First, to avoid scanning the database repeatedly, it changes the way the database is mapped. Second, once the supports of the candidate itemsets have been obtained, each candidate is classified as frequent or infrequent using the prior knowledge that Apriori relies on; if the candidate itemsets generated from an element of the existing frequent itemsets are certain to be infrequent, that element is not joined with the others, which avoids producing a large number of candidates and yields an optimized join step. Third, an intersection operation is introduced to reduce the time Apriori spends matching candidate itemsets against transaction patterns. To verify its effectiveness, the optimized algorithm is applied to historical hydrological data. The experimental results show that it has a shorter execution time, and hence higher efficiency, under different support and confidence levels.

Key words: data mining, association rules, IM-Apriori algorithm, improvement, hydrological data
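The optimized join step can be sketched as follows. The abstract does not state the exact test the paper uses to decide that an element cannot yield frequent candidates. This example therefore assumes one common criterion: an item that occurs in fewer than k-1 of the frequent (k-1)-itemsets cannot belong to any frequent k-itemset, so itemsets containing it are skipped before joining. The names `prune_before_join` and `generate_candidates` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def prune_before_join(frequent_prev, k):
    """Drop (k-1)-itemsets that contain a too-rare item.

    A frequent k-itemset containing item x has exactly k-1 subsets of
    size k-1 that contain x, and all of them must be frequent. So x must
    occur in at least k-1 members of frequent_prev (assumed criterion).
    """
    counts = Counter(item for itemset in frequent_prev for item in itemset)
    return {s for s in frequent_prev if all(counts[i] >= k - 1 for i in s)}


def generate_candidates(frequent_prev, k):
    """Join the surviving (k-1)-itemsets into candidate k-itemsets.

    A candidate is kept only if every (k-1)-subset is frequent, which is
    the standard Apriori pruning property.
    """
    survivors = prune_before_join(frequent_prev, k)
    candidates = set()
    for a, b in combinations(survivors, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in frequent_prev
            for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


# Toy frequent 2-itemsets, invented for illustration only.
frequent_2 = {
    frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
}
print(generate_candidates(frequent_2, 3))
```

In the toy input, item `d` occurs in only one frequent 2-itemset, so `{c, d}` is excluded before the join and never paired with the other itemsets. The resulting candidate set is the same one standard Apriori would produce; only the number of joins attempted changes.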

CLC Number: