Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis •

K-medoids Algorithm Based on Optimized Initial Cluster Centers

  

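  1. (1. Department of Computer Science, Guangdong Songshan Polytechnic College, Shaoguan 512126, Guangdong, China; 2. Department of Electrical Engineering, Guangdong Songshan Polytechnic College, Shaoguan 512126, Guangdong, China)
  • Received: 2018-09-27 Online: 2019-04-26 Published: 2019-04-30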
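  • About the authors: Duan Guiqin (段桂芹, born 1979), female, from Gongzhuling, Jilin, lecturer, master's degree, research interest: data mining, E-mail: 190306077@qq.com; Zou Chensong (邹臣嵩, born 1980), male, lecturer, master's degree, research interests: data mining and network security; Liu Feng (刘锋, born 1987), male, master's degree, research interests: big data and cloud computing.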
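  • Supported by:
    Provincial Major Scientific Research Project of Guangdong Universities (2017GkQNCX033); Shaoguan Science and Technology Plan Project (2017CX/K055); Key Science and Technology Project of Guangdong Songshan Polytechnic College (2018KJZD001); Special Fund for the Cultivation of Science and Technology Innovation among Guangdong University Students (pdjh2015a0715)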

An Improved K-medoids Algorithm Based on Optimal Initial Cluster Center

  1. (1. Department of Computer Science, Guangdong Songshan Polytechnic College, Shaoguan 512126, China;
    2. Department of Electrical Engineering, Guangdong Songshan Polytechnic College, Shaoguan 512126, China)
  • Received:2018-09-27 Online:2019-04-26 Published:2019-04-30

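Abstract: The initial cluster centers of the K-medoids algorithm may lie too close together, represent the data poorly, and lead to unstable results. To address these problems, an improved K-medoids algorithm is proposed. The ratio of the average distance over the sample set to the average distance of a sample is used as that sample's density parameter, which reduces the number of candidate representative points in the high-density point set. The maximum distance product method then selects K samples that have high density and lie far apart as the initial cluster centers, balancing the representativeness and the dispersion of the centers. Experimental results on UCI data sets show that, compared with the traditional K-medoids algorithm and two other improved clustering algorithms, the proposed algorithm yields more accurate clustering results, converges faster, and is more stable.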

Key words: density, initial cluster center, K-medoids, absolute error

Abstract: To address the problems that the initial cluster centers of the K-medoids algorithm may be too close together, insufficiently representative, and unstable, an improved K-medoids algorithm is proposed. The ratio of the average distance of the sample set to the average distance of an individual sample is taken as the sample's density parameter, which reduces the number of candidate representative points in the high-density point set. The maximum distance product method is then used to choose K samples with high density and large mutual distance as the initial cluster centers, so that both the representativeness and the dispersion of the centers are taken into account. Experimental results on UCI data sets show that, compared with the traditional K-medoids algorithm and two other improved clustering algorithms, the new algorithm not only produces more accurate clustering results but also converges faster and is more stable.

Key words: density, initial cluster center, K-medoids, absolute error
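
The abstract does not give formulas, so the following is only a minimal Python sketch of how the described initialization might look, not the authors' published definitions. It assumes Euclidean distance, a density parameter equal to the set-wide mean pairwise distance divided by a sample's mean distance to all other samples, a high-density cut-off of 1.0, and the densest sample as the first center. The function names `density_parameters` and `init_medoids` are hypothetical.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def density_parameters(dist):
    """Assumed density: set-wide mean distance / sample's own mean distance."""
    n = len(dist)
    # Mean distance from each sample to the other n - 1 samples (diagonal is 0).
    sample_mean = [sum(row) / (n - 1) for row in dist]
    # Averaging the per-sample means gives the mean over all pairs.
    set_mean = sum(sample_mean) / n
    return [set_mean / m if m > 0 else float("inf") for m in sample_mean]


def init_medoids(points, k):
    """Pick k initial medoid indices that are dense and mutually far apart."""
    dist = pairwise_distances(points)
    dens = density_parameters(dist)
    # Assumed cut-off: keep samples whose density is at least 1.0.
    candidates = [i for i, d in enumerate(dens) if d >= 1.0]
    if len(candidates) < k:
        # Fallback (assumption): consider every sample if too few are dense.
        candidates = list(range(len(points)))
    # Assumed rule: the first center is the densest candidate.
    centers = [max(candidates, key=lambda i: dens[i])]
    while len(centers) < k:
        remaining = [i for i in candidates if i not in centers]
        # Maximum distance product: maximize the product of distances
        # to all centers chosen so far.
        nxt = max(remaining, key=lambda i: math.prod(dist[i][c] for c in centers))
        centers.append(nxt)
    return centers
```

Calling `init_medoids(points, k)` on a list of coordinate tuples returns `k` sample indices that can seed a standard K-medoids iteration. Using the product of distances rather than their sum penalizes a candidate that lies close to any one existing center, which is what keeps the selected centers dispersed.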

CLC Number: