Computer and Modernization

• Algorithm Design and Analysis •

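Global Center Fast Update Clustering Algorithm Based on Spectral Clustering

  1. Department of Electrical Engineering, Guangdong Songshan Polytechnic College, Shaoguan 512126, Guangdong, China
  2. Department of Mechanical Engineering, Guangdong Songshan Polytechnic College, Shaoguan 512126, Guangdong, China
  • Received: 2018-03-23 Online: 2018-10-26 Published: 2018-10-26
  • About the authors: Zou Chenhao (邹臣蒿, born 1980), male, from Baishan, Jilin; lecturer in the Department of Electrical Engineering, Guangdong Songshan Polytechnic College; master's degree; research interests: data mining and network security. Liu Song (刘松, born 1982), male; lecturer in the Department of Mechanical Engineering, Guangdong Songshan Polytechnic College; master's degree; research interests: mechanical design and machine learning.
  • Funding:
    Special Fund Project for Science and Technology Development of the Department of Science and Technology of Guangdong Province (2017A070712006); Shaoguan Science and Technology Plan Project (2017CX/K055); Guangdong Special Fund Project for Cultivating Scientific and Technological Innovation among University Students (pdjh2015a0715)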

Abstract: To address the large number of iterations and the long running time incurred when clustering high-dimensional data, an improved clustering algorithm is proposed. The algorithm first uses spectral clustering to reduce the dimensionality of the samples, then selects k data objects that are connected end to end and have the largest product of distances as the initial cluster centers. In each center-update step, the data object nearest to the cluster mean is chosen as the new cluster center, and the remaining data objects are assigned to the cluster whose center is nearest. These steps are repeated until convergence. Experimental results show that the new algorithm outperforms K-means and three other improved clustering algorithms on all reported clustering metrics, including the Rand index, the Jaccard coefficient, and the Adjusted Rand Index. In terms of running efficiency, the new algorithm also needs less clustering time and fewer iterations.
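The center initialization and update steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the samples have already been reduced in dimension (the spectral step is omitted) and that distances are Euclidean. It also reads "end to end with the largest distance product" as a greedy rule: start from the farthest pair, then repeatedly add the point whose product of distances to the centers chosen so far is largest. All function names are hypothetical.

```python
import math


def init_centers(points, k):
    """Pick k point indices as initial centers (assumed greedy reading).

    Starts from the farthest pair, then adds the point that maximizes the
    product of its distances to all centers chosen so far.
    Requires len(points) >= 2 and 1 <= k <= len(points).
    """
    n = len(points)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: math.dist(points[p[0]], points[p[1]]))
    chosen = [first, second]
    while len(chosen) < k:
        rest = [i for i in range(n) if i not in chosen]
        nxt = max(rest, key=lambda i: math.prod(
            math.dist(points[i], points[c]) for c in chosen))
        chosen.append(nxt)
    return chosen[:k]


def assign(points, centers):
    """Label each point with the position of its nearest center."""
    return [min(range(len(centers)),
                key=lambda c: math.dist(p, points[centers[c]]))
            for p in points]


def update_centers(points, labels, centers):
    """Replace each center by the cluster member nearest to the cluster mean."""
    dim = len(points[0])
    new_centers = []
    for c, old in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:  # can only happen with duplicate points; keep old center
            new_centers.append(old)
            continue
        mean = [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]
        new_centers.append(min(members, key=lambda i: math.dist(points[i], mean)))
    return new_centers


def cluster(points, k, max_iter=100):
    """Iterate assignment and center update until the centers stop changing."""
    centers = init_centers(points, k)
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update_centers(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    return labels, centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(cluster(data, 2))
```

Because every center in this sketch is an actual data object rather than a computed mean, convergence can be tested by comparing index lists exactly, with no floating-point tolerance.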

Key words: global center, mean nearest point, spectral clustering, clustering evaluation index, clustering algorithm
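For reference, the three evaluation metrics named in the abstract can be computed from pair counts, as in the Python sketch below. It uses the standard textbook definitions of the Rand index, Jaccard coefficient, and Adjusted Rand Index; the function names are hypothetical and the code is not taken from the paper. It assumes at least two samples, and the values returned for degenerate inputs are conventions chosen here.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(truth, pred):
    """Count sample pairs by whether they share a label in truth and in pred."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = pred[i] == pred[j]
        if same_t and same_p:
            ss += 1          # together in both partitions
        elif same_t:
            sd += 1          # together in truth only
        elif same_p:
            ds += 1          # together in pred only
        else:
            dd += 1          # separated in both partitions
    return ss, sd, ds, dd


def rand_index(truth, pred):
    ss, sd, ds, dd = pair_counts(truth, pred)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_coefficient(truth, pred):
    ss, sd, ds, _ = pair_counts(truth, pred)
    denom = ss + sd + ds
    # Convention assumed here: return 1.0 when no pair is together anywhere.
    return ss / denom if denom else 1.0


def adjusted_rand_index(truth, pred):
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:  # degenerate case, treated as perfect agreement
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

All three scores depend only on which samples are grouped together, so relabeling the clusters (for example swapping labels 0 and 1) leaves them unchanged.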

CLC number: