计算机与现代化

• 算法设计与分析 • 上一篇    下一篇

基于传递距离的谱聚类算法

  

  1. (1.中国电信股份有限公司扬州分公司,江苏扬州225002;2.扬州大学信息工程学院,江苏扬州225127)
  • 收稿日期:2018-05-22 出版日期:2019-01-03 发布日期:2019-01-04
  • 作者简介:戴天辰(1993-),男,江苏扬州人,中国电信股份有限公司扬州分公司员工,硕士,研究方向:模式识别; 顾正弘(1985-),男,江苏泰州人,扬州大学信息工程学院讲师,博士,研究方向:模式识别。

Spectral Clustering Algorithm Based on Transitive Distance

  1. (1. Yangzhou Branch, China Telecom, Yangzhou 225002, China;
    2.College of Information Engineering, Yangzhou University, Yangzhou 225127, China)
  • Received:2018-05-22 Online:2019-01-03 Published:2019-01-04

摘要: 谱聚类算法受到度量中尺度因子的影响,同时传统谱聚类算法通过欧氏距离度量样本间相似性也不准确。针对上述问题,提出一种基于传递距离的谱聚类算法。算法首先通过改进传统谱聚类中的度量方式,用基于传递距离的度量方式度量样本间相似性,并构建传递矩阵,接着用传递矩阵做相似度变换构建拉普拉斯矩阵,最终通过求特征值和特征向量完成聚类。基于传递距离的谱聚类算法在人工数据集及UCI数据集上均取得了良好的聚类结果,具有较好的鲁棒性和有效性。

关键词: 谱聚类, 尺度因子, 传递距离, 传递矩阵

Abstract: Spectral clustering algorithms are influenced by the mesoscale factors of metrics, and similarity measured by Euclidean distance is not always accurate. In view of this situation, a spectral clustering algorithm based on transitive distance is proposed. The main idea contains three steps. First, a minimum spanning tree is constructed to do a similarity transformation, as a result a transfer matrix is generated. Second, we construct a Laplacian matrix by the transfer matrix of the first step. Data is projected into the eigen-space of this Laplacian matrix. Lastly, the clustering in the space of the second step is done. The experimental results on artificial data sets and UCI data sets show that the spectral clustering algorithm based on the transitive distance has good robustness and effectiveness.

Key words: spectral clustering, mesoscale factors, transitive distance, transitive matrix

中图分类号: