计算机与现代化 ›› 2015, Vol. 0 ›› Issue (11): 1-5+11.doi: 10.3969/j.issn.1006-2475.2015.11.001

• 数据库与数据挖掘 •    下一篇

一种基于k维树的模糊C均值聚类算法

  

  1. (南京航空航天大学计算机科学与技术学院,江苏南京210016)
  • 收稿日期:2015-08-03 出版日期:2015-11-12 发布日期:2015-11-16
  • 作者简介:吴非(1991-),男,安徽池州人,南京航空航天大学计算机科学与技术学院硕士研究生,研究方向:数据库,数据挖掘; 毛宇光(1962-),男,副教授,博士,研究方向:数据库系统及理论,数据挖掘与数据仓库,多值逻辑。
  • 基金资助:
    2014年中央高校基本科研业务费专项资金资助项目; 南京航空航天大学研究生创新基地(实验室)开放基金资助项目(kfjj201460)

A Novel Fuzzy C-Means Algorithm Based on k-d Tree

  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
  • Received:2015-08-03 Online:2015-11-12 Published:2015-11-16

摘要: 初始聚类中心的选择极大地影响了模糊C均值聚类算法的性能,一个好的初始聚类中心能显著加快算法的收敛速度和减少算法的运行时间。本文提出一种新的基于k维树的模糊C均值聚类算法。通过使用k维树的方法分割原始数据集得到多个网格,并选取网格的加权中心作为新的数据点构成一个简化的数据集,在此基础上可快速查找一组距离实际聚类中心较近的初始聚类中心,显著减少模糊C聚类算法的迭代次数。通过在16个人工数据集和一组真实图像数据上的实验结果表明,数据集的数据量较大时,在不损失聚类精确度的情况下,本算法相对于普通的模糊C均值聚类算法,收敛速度提升了近2倍,算法的运行时间也缩短到经典FCM算法的一半以下。

关键词: 模糊C均值聚类算法, k维树, 初始聚类中心, 无监督学习

Abstract: Abstract: The performance of the Fuzzy C-Means(FCM) algorithm largely depends on the selection of the initial cluster center. A set of good initial cluster centers which are close to the actual final cluster center will not only speed up the rate of convergence but also reduce the overall processing time of the FCM dramatically. In this paper, we propose a novel FCM algorithm based on k-d tree which is a kind of multidimensional binary tree. The original data sets are split into several grids by using k-d tree. Then we form a new simplified data set by using the weighted centers of the grids, and based on which we find a good initial cluster center that is close to actual final cluster center so that reduce the iterations of FCM signally. The experiments on sixteen artificial datasets and real-life image data show that comparing with original FCM algorithm, the convergence rate of the proposed algorithm is advanced nearly twice as FCM algorithm, and the overall processing time reduce to less than half of the FCM.

Key words: Fuzzy C-Means cluster algorithm, k-d tree, initial cluster center, unsupervised learning

中图分类号: