Computer and Modernization

 A Parallel Clustering Model Based on MapReduce

  

  1. Key Laboratory of Integrated Exploitation of Bayan Obo Multi-Metal Resources, IMUST, Baotou 014010, China;
  2. Information Center, IMUST, Baotou 014010, China
  • Received: 2013-08-30; Online: 2014-01-20; Published: 2014-02-10

Abstract: When clustering large-scale data, the traditional serial model is limited and cannot produce satisfactory results within a reasonable time. This paper proposes a parallel clustering model based on the MapReduce architecture on the Hadoop platform. The experimental results show that the model achieves linear speedup and outperforms the traditional clustering model, especially on massive data sets.

Key words: data mining, algorithm, cloud computing, MapReduce, Hadoop