Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis


  

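  • About the authors: Yang Haomin (杨浩敏, b. 1990), male, from Jieyang, Guangdong, master's student at the College of Computer Science, Chongqing University; research interests: data mining and big data processing. Ma Chao (马超, b. 1991), male, master's student; research interests: data mining and big data processing. Wu Haiyan (吴海燕, b. 1989), female, master's student; research interest: data mining.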

An Improved Distributed Lazy Associative Classification Algorithm

  1. College of Computer Science, Chongqing University, Chongqing 400044, China
  • Received: 2015-03-16  Online: 2015-08-08  Published: 2015-08-19



Abstract: A distributed lazy associative classification (DLAC) algorithm is a lazy associative classification algorithm that uses a distributed association rule mining algorithm. Existing DLAC algorithms have two main problems: classifying multiple test samples is inefficient, and the projection operation is not distributed. To address both problems, this paper proposes an improved distributed lazy associative classification algorithm, PDLAC. First, the test samples are clustered with KMeans. Second, each cluster is checked against an aggregation condition: if the condition holds, the test samples in the cluster are aggregated; otherwise, each test sample in the cluster becomes a cluster of its own. Third, a distributed projection is performed and association rules are mined with the C-DMA algorithm. Finally, a classifier is built to classify the one or more test samples in each cluster. In experiments with the degree of parallelism set to 15, PDLAC took far less time than DLAC, and the performance gain grew as the number of test samples increased. The experimental results show that PDLAC is a good solution to both problems.
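The steps in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation, and it rests on several assumptions: the KMeans step is taken as already done (the `groups` argument stands in for its output), the aggregation condition `can_aggregate` is a hypothetical pairwise Jaccard test, an aggregated group is projected onto the union of its members' items, and a brute-force rule miner stands in for C-DMA. All function and variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project(train, items):
    """Lazy projection: keep only the training items that occur in `items`."""
    return [(row & items, label) for row, label in train if row & items]

def mine_rules(projected, min_support=2, min_conf=0.6, max_len=2):
    """Brute-force class association rules (itemset -> label); stand-in for C-DMA."""
    itemset_count, rule_count = Counter(), Counter()
    for row, label in projected:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(row), k):
                itemset_count[combo] += 1
                rule_count[(combo, label)] += 1
    rules = []
    for (combo, label), n in rule_count.items():
        conf = n / itemset_count[combo]
        if n >= min_support and conf >= min_conf:
            rules.append((frozenset(combo), label, conf))
    return rules

def classify(sample, rules, default):
    """Pick the label with the largest summed confidence over matching rules."""
    score = defaultdict(float)
    for antecedent, label, conf in rules:
        if antecedent <= sample:
            score[label] += conf
    return max(score, key=score.get) if score else default

def can_aggregate(group, min_jaccard=0.3):
    """Hypothetical aggregation condition: every pair of samples overlaps enough."""
    return all(len(a & b) / len(a | b) >= min_jaccard
               for a, b in combinations(group, 2))

def classify_groups(train, groups, default):
    """`groups` stands in for the clusters that a KMeans step would produce."""
    predictions = {}
    for group in groups:
        # Aggregated group: one projection and one mining pass for all members.
        # Otherwise each sample is processed on its own.
        batches = [group] if can_aggregate(group) else [[s] for s in group]
        for batch in batches:
            items = frozenset().union(*batch)  # assumed: union of member items
            rules = mine_rules(project(train, items))
            for sample in batch:
                predictions[sample] = classify(sample, rules, default)
    return predictions

# Toy data: each row is a set of attribute=value items plus a class label.
S, R = "outlook=sunny", "outlook=rain"
CALM, WINDY = "windy=no", "windy=yes"
train = [
    (frozenset({S, CALM}), "play"), (frozenset({S, CALM}), "play"),
    (frozenset({S, WINDY}), "play"), (frozenset({R, WINDY}), "stay"),
    (frozenset({R, WINDY}), "stay"), (frozenset({R, CALM}), "stay"),
]
sunny_calm, sunny_windy = frozenset({S, CALM}), frozenset({S, WINDY})
rain_windy, rain_calm = frozenset({R, WINDY}), frozenset({R, CALM})
groups = [[sunny_calm, sunny_windy], [rain_windy, rain_calm]]
preds = classify_groups(train, groups, default="play")
```

In this toy run each group passes the assumed aggregation test, so rules are mined once per group rather than once per sample, which is the source of the time saving the abstract reports for multiple test samples. A distributed version would partition `train` across workers for both the projection and the mining step.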

Key words: aggregation method, distributed projection, distributed association rule mining, lazy method, associative classification

CLC Number: