Improved K-medoids Algorithm Based on Similarity Calculation Formula

doi:10.3969/j.issn.1006-2475.2019.05.021

Computer and Modernization, 2019, Vol. 0, Issue (05): 113-.

(School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China)

Received: 2018-11-28 Online: 2019-05-14 Published: 2019-05-14
About the authors: HAN Bing (born 1993), female, from Laiwu, Shandong, master's student, research interest: data mining, E-mail: binghan93@126.com; JIANG He (born 1964), male, professor, master's degree, research interests: data mining, databases, data warehouses.
Funding: Natural Science Foundation of Shandong Province (ZR2012FM032)

Abstract

Abstract: In the traditional K-medoids clustering algorithm, similarity is generally measured by distance alone. Such a measure assumes that the attributes of data objects are independent and identically distributed (IID), yet the attributes of most real-world data objects are correlated. This paper therefore replaces the traditional distance-based similarity with a non-IID similarity formula. Because that formula is computed from the frequencies of attribute values, and numerical data are not sensitive to frequency, the numerical data are first clustered and replaced column by column (one attribute at a time) before the formula is applied. Experimental results show that the proposed method improves the clustering accuracy of the algorithm.

Key words: clustering, PAM algorithm, similarity
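
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that this page does not confirm: each numerical column is clustered with a plain 1-D k-means and every value is replaced by its cluster label; a simple frequency-based similarity stands in for the paper's non-IID formula, which is not given here; and the K-medoids step is a basic PAM-style swap search with a naive start. All function names (`discretize_column`, `frequency_dissimilarity_matrix`, `k_medoids`) are hypothetical.

```python
import math
from collections import Counter


def discretize_column(values, k, iters=100):
    """Replace each number in one attribute column by a 1-D k-means label.

    Assumption: the page does not say how columns are clustered.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: k centers spread evenly over the sorted values.
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def frequency_dissimilarity_matrix(rows):
    """Pairwise dissimilarity from a frequency-based similarity.

    Stand-in only: equal values score 1; unequal values score lower when
    either value is rare in its column. This is NOT the paper's formula.
    """
    n, m = len(rows), len(rows[0])
    freqs = [Counter(row[j] for row in rows) for j in range(m)]

    def sim(j, x, y):
        if x == y:
            return 1.0
        return 1.0 / (1.0 + math.log(n / freqs[j][x]) * math.log(n / freqs[j][y]))

    return [[sum(1.0 - sim(j, a[j], b[j]) for j in range(m)) / m for b in rows]
            for a in rows]


def k_medoids(dist, k):
    """PAM-style swap search on a precomputed dissimilarity matrix."""
    n = len(dist)

    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    medoids = list(range(k))  # naive start: the first k objects
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        best_swap = None
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best:  # keep the best strictly improving swap
                    best, best_swap = c, trial
        if best_swap is not None:
            medoids, improved = best_swap, True
    labels = [min(range(k), key=lambda p: dist[i][medoids[p]]) for i in range(n)]
    return medoids, labels


# Toy mixed data: one numerical attribute and one nominal attribute.
numeric = [1.0, 1.2, 0.9, 10.0, 10.5, 9.8]
nominal = ["a", "a", "a", "b", "b", "b"]
rows = list(zip(discretize_column(numeric, k=2), nominal))
medoids, labels = k_medoids(frequency_dissimilarity_matrix(rows), k=2)
print(medoids, labels)
```

On this toy data the first three objects fall into one cluster and the last three into the other. Replacing numerical values with cluster labels first is what lets a frequency-based measure treat 1.0 and 1.2 as the same attribute value rather than as two distinct values that each occur once.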

CLC number: TP301.6

Cite this article: HAN Bing, JIANG He. Improved K-medoids Algorithm Based on Similarity Calculation Formula[J]. Computer and Modernization, 2019, 0(05): 113-.

