Computer and Modernization

Improved K-means Clustering Algorithm Based on MapReduce Framework

  

  1. (School of Science, Shenyang University of Technology, Shenyang 110870, China)
  • Received: 2018-12-02  Online: 2019-08-15  Published: 2019-08-16

Abstract: To improve the clustering quality and speed of the K-means algorithm on massive data, a distributed parallel K-means clustering model based on the MapReduce framework is proposed. First, to address the sensitivity of K-means to initialization, a new dissimilarity function is defined; the value of k is determined from the degree of dissimilarity between data points, and points with smaller dissimilarity are selected as the initial cluster centers. The K-means algorithm is then deployed on the MapReduce programming model, and an improvement to that model speeds up the processing of massive data. Experiments show that, compared with the traditional K-means algorithm, the improved algorithm under MapReduce achieves better accuracy and shorter convergence time, and that the parallel clustering model scales well across different data sizes and numbers of compute nodes.
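For orientation, one K-means iteration maps naturally onto a map step (assign each point to its nearest center) and a reduce step (average the points assigned to each center). The following is a minimal single-process sketch of that generic pattern, not the paper's implementation: the abstract does not specify the dissimilarity function or the modification to MapReduce, so the initial centers are simply passed in, Euclidean distance is assumed, and all function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average all points that share one center index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centers):
    """One map -> shuffle -> reduce round; returns the updated centers."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point in data:
        key, value = map_point(point, centers)
        groups[key].append(value)
    # Assumption: a center that receives no points keeps its old position.
    return [reduce_cluster(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))]


def kmeans(data, centers, max_iter=100, tol=1e-9):
    """Repeat iterations until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(data, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

In a real MapReduce deployment the map calls would run in parallel across data splits and the grouping by key would be done by the framework's shuffle; here a dictionary plays that role so the sketch stays runnable on one machine.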

Key words: K-means algorithm, dissimilarity function, MapReduce model
