Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

Improved K-medoids Algorithm Based on Similarity Calculation Formula
(Chinese title: 基于相似度计算公式改进的K-中心点算法)

  1. (School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong, China)
  • Received: 2018-11-28 Online: 2019-05-14 Published: 2019-05-14
  • About the authors: HAN Bing (b. 1993), female, from Laiwu, Shandong; master's student; research interest: data mining; E-mail: binghan93@126.com. JIANG He (b. 1964), male, professor, master's degree; research interests: data mining, databases, data warehouses.
  • Funding:
    Natural Science Foundation of Shandong Province (ZR2012FM032)

Abstract: In the traditional K-medoids clustering algorithm, similarity is generally measured by distance alone. Distance-based measures assume that the attributes of data objects are independent and identically distributed (IID), yet the attributes of most real-world data objects are correlated. This paper therefore replaces the traditional distance-based similarity computation with a non-IID similarity formula. Because that formula is computed from the frequencies of attribute values, and numerical data are not sensitive to frequency, the numerical data are first clustered and replaced attribute column by attribute column before the formula is applied. Experimental results show that the proposed method improves the clustering accuracy of the algorithm.

Key words: clustering, PAM algorithm, similarity

CLC number:
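
The pipeline summarized in the abstract can be sketched roughly as follows: cluster each numerical column and replace its values with cluster labels, then run K-medoids (PAM) with a frequency-based dissimilarity instead of a distance. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not give the non-IID formula, so `make_frequency_dissim` is a hypothetical stand-in (a simple frequency-weighted mismatch that ignores coupling between attributes). The 1-D k-means used for discretization, the greedy swap search, and all function names are likewise assumptions made for this example.

```python
from collections import Counter


def discretize_column(values, k=2, iters=20):
    """Cluster one numeric column with 1-D k-means; return a cluster label per value."""
    lo, hi = min(values), max(values)
    # Evenly spaced initial centers keep the sketch deterministic.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def make_frequency_dissim(data):
    """Stand-in frequency-based dissimilarity (NOT the paper's non-IID formula)."""
    n, n_attrs = len(data), len(data[0])
    freqs = [Counter(row[j] for row in data) for j in range(n_attrs)]

    def dissim(a, b):
        total = 0.0
        for j in range(n_attrs):
            if a[j] != b[j]:
                # A mismatch between rarer values costs more.
                total += 1.0 - (freqs[j][a[j]] / n) * (freqs[j][b[j]] / n)
        return total / n_attrs

    return dissim


def total_cost(data, medoids, dissim):
    """Sum of each object's dissimilarity to its nearest medoid."""
    return sum(min(dissim(x, data[m]) for m in medoids) for x in data)


def k_medoids(data, k, dissim):
    """Greedy PAM-style swap search with a pluggable dissimilarity function."""
    medoids = list(range(k))  # deterministic start: the first k rows
    best = total_cost(data, medoids, dissim)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(data)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(data, trial, dissim)
                if cost < best:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dissim(x, data[medoids[c]])) for x in data]
    return medoids, labels


# Toy mixed data: one numerical attribute and one categorical attribute.
rows = [(1.0, "red"), (1.2, "red"), (0.9, "red"),
        (5.0, "blue"), (5.3, "blue"), (4.8, "blue")]
bins = discretize_column([r[0] for r in rows], k=2)
data = [(b, r[1]) for b, r in zip(bins, rows)]
medoids, labels = k_medoids(data, 2, make_frequency_dissim(data))
print(bins, labels)
```

Because `k_medoids` takes the dissimilarity as a parameter, the paper's actual formula could be substituted for the stand-in without changing the swap search.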