Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

A Fast Clustering Algorithm Based on Density Peaks and Partitioning

  

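  1. (1. Rural Comprehensive Economic Information Center of Anhui Province, Hefei, Anhui 230001; 2. Anhui Agrometeorological Center, Hefei, Anhui 230001)
  • Received: 2018-06-25 Published: 2018-09-11 Released online: 2018-09-11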
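  • About the authors: Ju Shucun (琚书存, b. 1971), male, from Tongcheng, Anhui; senior engineer at the Rural Comprehensive Economic Information Center of Anhui Province and the Anhui Agrometeorological Center; master's degree; research interests: agricultural and rural informatization, data mining. Cheng Wenjie (程文杰, b. 1978), male, engineer; bachelor's degree; research interest: rural informatization. Xu Jianpeng (徐建鹏, b. 1977), male, senior engineer; bachelor's degree; research interest: data mining.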
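  • Funding:
    National Key Technology R&D Program of China (2014BAD10B05-02); National Spark Program of China (2014GA710001); Anhui Provincial Science and Technology Research Project (1804A07020124)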

A Fast Clustering Algorithm Based on Cluster-centers and Partition

  1. (1. Rural Comprehensive Economic Information Center of Anhui Province, Hefei 230001, China;
     2. Anhui Agrometeorological Center, Hefei 230001, China)
  • Received:2018-06-25 Online:2018-09-11 Published:2018-09-11

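Abstract: Traditional partition-based clustering algorithms require the number of clusters to be specified manually, and because they partition the data rigidly, they may split large or elongated clusters and produce incorrect results. Density peak clustering is a density-based clustering algorithm proposed in recent years; it does not require the number of clusters to be specified in advance and can discover non-spherical clusters. This paper introduces the density-peak idea into partition-based clustering and proposes a fast clustering algorithm based on density peaks and partitioning (DDBSCAN). The algorithm first obtains a set of core objects (density peaks) that describe the “skeleton” of each cluster, then assigns the surrounding points to their nearest core object, and finally merges clusters by examining the density at the partition boundaries. Experiments show that the algorithm adapts effectively to data sets of arbitrary shape and varying size, and that it converges faster than traditional density-based clustering algorithms.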

Keywords: density peak clustering, core object, partition-based, boundary density, arbitrary shape

Abstract: Traditional partition-based clustering algorithms need the number of clusters to be given manually, and their rigid partitioning may split large or extended clusters, leading to wrong clustering results. Clustering by density peaks is a density-based clustering algorithm proposed in recent years. It does not need the number of clusters to be specified in advance, and it can detect non-spherical clusters. A fast clustering algorithm based on density peaks and partition (DDBSCAN) is proposed in this paper. The algorithm first obtains a group of cluster centers (density peaks), which describe the “skeleton” of the clusters, then assigns the surrounding points to the nearest cluster center, and finally merges clusters by judging the density at the partition boundary. Experiments show that the algorithm can effectively adapt to data sets of arbitrary shape and size, and converges faster than traditional density-based clustering algorithms.

Key words: clustering by density peak, cluster center, partition-based, boundary density, irregular shape

CLC number:
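
The three stages named in the abstract (find core objects, assign points to the nearest core object, merge across dense boundaries) can be illustrated with a minimal Python sketch. It is not the authors' DDBSCAN implementation: the abstract gives no formulas, so the cutoff radius `d_c`, the number of core objects `n_cores`, the `rho * delta` ranking borrowed from standard density peak clustering, and the `merge_ratio` boundary test are all assumptions made for illustration.

```python
from math import dist


def density_peak_partition(points, d_c, n_cores, merge_ratio=0.5):
    """Toy sketch: density peaks -> nearest-core partition -> boundary merge.

    All parameters and thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]

    # Stage 1: local density rho = number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    # delta = distance to the nearest point ranked higher in density
    # (ties broken by index); the top-ranked point gets its largest distance.
    order = sorted(range(n), key=lambda i: -rho[i])  # sorted() is stable
    delta = [0.0] * n
    delta[order[0]] = max(d[order[0]])
    for k in range(1, n):
        i = order[k]
        delta[i] = min(d[i][j] for j in order[:k])
    # Core objects: the n_cores points with the largest rho * delta.
    cores = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_cores]

    # Stage 2: rigid partition, each point joins its nearest core object.
    part = [min(range(n_cores), key=lambda c: d[i][cores[c]]) for i in range(n)]

    # Stage 3: merge two parts when the points along their shared boundary
    # are dense relative to the sparser of the two core objects.
    parent = list(range(n_cores))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    members = [[i for i in range(n) if part[i] == c] for c in range(n_cores)]
    for a in range(n_cores):
        for b in range(a + 1, n_cores):
            border = [i for i in members[a] if any(d[i][j] < d_c for j in members[b])]
            border += [j for j in members[b] if any(d[i][j] < d_c for i in members[a])]
            if not border:
                continue  # the two parts do not touch
            border_rho = sum(rho[i] for i in border) / len(border)
            if border_rho >= merge_ratio * min(rho[cores[a]], rho[cores[b]]):
                parent[find(a)] = find(b)

    # One label per point: the merged group its part belongs to.
    return [find(c) for c in part]


# Two well-separated, line-shaped groups of 21 points each.
segment_a = [(0.1 * k, 0.0) for k in range(21)]
segment_b = [(10.0 + 0.1 * k, 0.0) for k in range(21)]
labels = density_peak_partition(segment_a + segment_b, d_c=0.35, n_cores=4)
```

On this toy input, the nearest-core assignment first cuts the two line-shaped groups into four pieces, and the boundary-density test is expected to rejoin the pieces of each segment, leaving two clusters. Deliberately choosing more core objects than clusters is what lets a rigid partition follow an elongated shape before the merge step.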