Computer and Modernization ›› 2013, Vol. 218 ›› Issue (10): 1-5. doi: 10.3969/j.issn.1006-2475.2013.10.001

• Algorithm Design and Analysis •

Apriori-based Web Traversal Pattern Mining Algorithm

LIU Mei-ling 1,2, SU Yi-juan 2,3

  1. College of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China; 2. Key Laboratory of Science Computing and Intelligent Information Processing in Universities of Guangxi, Guangxi Teachers Education University, Nanning 530023, China; 3. College of Computer and Information Engineering, Guangxi Teachers Education University, Nanning 530023, China
  • Received: 2013-05-24 Online: 2013-10-26 Published: 2013-10-26

Abstract: This paper briefly introduces the Apriori algorithm and the directed-graph representation of Web traversal paths, then proposes an Apriori-based algorithm for mining frequent traversal patterns from Web log files. The algorithm uses the ordered nature of traversal path sequences as its pruning strategy for the candidate set, which reduces the number of candidates generated and improves efficiency. Experiments on both real and simulated datasets show that the algorithm is effective and adapts well.

Key words: WFTP algorithm, Web log file, data mining, frequent traversal path, ordered traversal path
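
The abstract gives only the general idea (Apriori-style level-wise search, with path ordering used to prune candidates), so the following is a minimal illustrative sketch of that idea and not the authors' WFTP algorithm. It assumes each user session is already an ordered list of page identifiers, that a "path" means a contiguous run of pages, and that support is the number of sessions containing the path. The function names `count_support` and `frequent_paths` and the parameter `min_support` are hypothetical.

```python
from collections import defaultdict


def count_support(sessions, k, candidates=None):
    """Count how many sessions contain each contiguous path of length k.

    If `candidates` is given, only paths in that set are counted.
    """
    counts = defaultdict(int)
    for session in sessions:
        # Use a set so a path is counted at most once per session.
        seen = {tuple(session[i:i + k]) for i in range(len(session) - k + 1)}
        for path in seen:
            if candidates is None or path in candidates:
                counts[path] += 1
    return counts


def frequent_paths(sessions, min_support):
    """Level-wise (Apriori-style) search for frequent contiguous paths.

    Assumption for this sketch: a length-(k+1) candidate is built only by
    joining two frequent length-k paths p and q whose pages overlap in
    order (p without its first page equals q without its last page).
    Both length-k sub-paths of the candidate are then frequent by
    construction, so unordered combinations are never generated.
    """
    frequent = {}
    k = 1
    level = {p: c for p, c in count_support(sessions, k).items()
             if c >= min_support}
    while level:
        frequent.update(level)
        # Ordered join: extend p with the last page of q.
        candidates = {p + (q[-1],)
                      for p in level for q in level
                      if p[1:] == q[:-1]}
        k += 1
        counts = count_support(sessions, k, candidates)
        level = {p: c for p, c in counts.items() if c >= min_support}
    return frequent
```

Calling `frequent_paths(sessions, min_support)` returns a dictionary that maps each frequent path (a tuple of page identifiers) to its support count. Because candidates come only from order-preserving joins, the candidate set stays far smaller than in itemset-style Apriori, where every combination of frequent items would be considered.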
