Computer and Modernization


Distributed Stage Adaptive Association Rules Mining Algorithm Based on Spark

  

  1. Department of Information Engineering, Shanwei Polytechnic, Shanwei 516600, China;
  2. Huawei Technologies Co., Ltd., Shenzhen 518129, China
  • Received: 2019-05-08 Online: 2019-12-11 Published: 2019-12-11

Abstract: Meeting the growing demand for mining massive data sets requires a distributed association rule mining algorithm that can run across multiple machines. Apriori is a highly iterative algorithm; when it runs on the Hadoop platform, each iteration performs a large number of disk I/O operations, which severely limits its efficiency. This paper takes advantage of Spark's support for distributed in-memory computation to design and implement a distributed association rule mining algorithm, called Staged Adaptive Apriori, on the Spark platform. The algorithm uses an adaptive partial data set processing strategy to mine frequent itemsets efficiently. Before each iteration, it estimates the execution time and adopts the more appropriate method, reducing both time and space complexity. It is therefore an adaptive association rule mining algorithm that responds to the nature of the data set. Experimental results demonstrate the effectiveness of the algorithm.

Key words: association rule mining, Apriori, MapReduce, Spark
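
The idea of choosing a method before each iteration can be illustrated with a small single-machine sketch. It is not the paper's implementation: it does not use Spark, and the two counting strategies, the cost estimates, and every function name below are assumptions made for illustration only. Before each Apriori pass, the sketch compares a rough cost estimate for two ways of counting candidate support and runs the cheaper one.

```python
from itertools import combinations
from math import comb


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, pruning by the Apriori property."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def count_by_candidates(transactions, candidates):
    """Hypothetical strategy A: test every candidate against every transaction."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def count_by_subsets(transactions, candidates, k):
    """Hypothetical strategy B: enumerate each transaction's k-subsets and look them up."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for sub in combinations(t, k):
            key = frozenset(sub)
            if key in counts:
                counts[key] += 1
    return counts


def staged_adaptive_apriori(transactions, min_support):
    """Return (frequent itemsets with support counts, strategy chosen per pass)."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}

    result = dict(frequent)
    chosen = []
    k = 2
    while frequent:
        candidates = generate_candidates(set(frequent), k)
        if not candidates:
            break
        # Assumed cost model: number of basic checks each strategy would perform.
        cost_a = len(transactions) * len(candidates)
        cost_b = sum(comb(len(t), k) for t in transactions)
        if cost_a <= cost_b:
            chosen.append("candidates")
            counts = count_by_candidates(transactions, candidates)
        else:
            chosen.append("subsets")
            counts = count_by_subsets(transactions, candidates, k)
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result, chosen


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    itemsets, strategies = staged_adaptive_apriori(data, min_support=3)
    for itemset, support in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
    print(strategies)
```

Both strategies produce the same support counts; only the amount of work differs, which is why the choice can be made per pass from an estimate. In a Spark setting the counting step would run over partitions of an in-memory data set, and the cost estimate would need to reflect the actual execution time of each method.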
