An Improved K-medoids Algorithm Based on Optimal Initial Cluster Center

doi:10.3969/j.issn.1006-2475.2019.04.001

Computer and Modernization ›› 2019, Vol. 0 ›› Issue (04): 1-.

• Algorithm Design and Analysis •

(1. Department of Computer Science, Guangdong Songshan Polytechnic College, Shaoguan 512126, China;
2. Department of Electrical Engineering, Guangdong Songshan Polytechnic College, Shaoguan 512126, China)

Received: 2018-09-27 Online: 2019-04-26 Published: 2019-04-30
About the authors: DUAN Gui-qin (1979-), female, born in Gongzhuling, Jilin, lecturer, master's degree, research interest: data mining, E-mail: 190306077@qq.com; ZOU Chen-song (1980-), male, lecturer, master's degree, research interests: data mining, network security; LIU Feng (1987-), male, master's degree, research interests: big data and cloud computing.
Funding:
Guangdong Provincial Major Scientific Research Project for Universities (2017GkQNCX033); Shaoguan Science and Technology Plan Project (2017CX/K055); Key Science and Technology Project of Guangdong Songshan Polytechnic College (2018KJZD001); Special Fund for Science and Technology Innovation Cultivation of Guangdong University Students (pdjh2015a0715)

Abstract

Abstract: The initial cluster centers of the K-medoids algorithm may lie too close together, represent the data poorly, and lead to unstable results. To address these problems, an improved K-medoids algorithm is proposed. The ratio of the average distance over the sample set to the average distance of an individual sample is used as that sample's density parameter, which reduces the number of candidate representative points in the high-density point set. The maximum distance product method is then used to select K samples that are both dense and far apart as the initial cluster centers, so that the centers are both representative and well dispersed. Experimental results on UCI data sets show that, compared with the traditional K-medoids algorithm and two other improved clustering algorithms, the proposed algorithm produces more accurate clustering results, converges faster, and is more stable.

Key words: density, initial cluster center, K-medoids, absolute error
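The abstract describes the initialization only in words, so the following Python sketch is one possible reading of it, not the authors' implementation. Several details are assumptions made for illustration: the set-level average is taken as the mean over all sample pairs, a sample counts as high-density when its density parameter exceeds 1, the first center is the densest candidate, and each later center is the candidate with the largest product of distances to the centers already chosen. The function name `select_initial_medoids` is hypothetical.

```python
import numpy as np


def select_initial_medoids(X, k):
    """Return indices of k initial medoids that are dense and far apart.

    Illustrative sketch only; the formulas below are assumptions, since
    the abstract gives no explicit definitions.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 samples and 1 <= k <= n")

    # Pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Assumed per-sample average distance to the other n - 1 samples.
    sample_avg = dist.sum(axis=1) / (n - 1)
    # Assumed set-level average distance over all ordered pairs.
    set_avg = dist.sum() / (n * (n - 1))

    # Density parameter: larger means the sample is closer to the rest
    # of the data than average (assumes the samples are not all identical).
    density = set_avg / sample_avg

    # Assumed high-density candidate set: density above 1.
    candidates = np.flatnonzero(density > 1.0)
    if len(candidates) < k:
        # Fallback (assumption): take the k densest samples instead.
        candidates = np.argsort(-density)[:k]

    # First medoid: the densest candidate.
    medoids = [int(candidates[np.argmax(density[candidates])])]

    # Remaining medoids: maximize the product of distances to the
    # medoids chosen so far (maximum distance product rule).
    while len(medoids) < k:
        prod = dist[np.ix_(candidates, medoids)].prod(axis=1)
        medoids.append(int(candidates[np.argmax(prod)]))
    return medoids
```

Calling `select_initial_medoids(X, k)` on an array of shape `(n, d)` returns `k` row indices that could seed a standard K-medoids iteration. An already chosen medoid has distance product 0, so it is not picked again as long as the candidate points are distinct.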

CLC number: TP301.6

DUAN Gui-qin1, ZOU Chen-song2, LIU Feng2. An Improved K-medoids Algorithm Based on Optimal Initial Cluster Center[J]. Computer and Modernization, 2019, 0(04): 1-.

