Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

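Outliers Detection Algorithm Based on Density Division

  1. School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
  2. School of Natural and Applied Sciences, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2014-12-01 Online: 2015-03-23 Published: 2015-03-26
  • About the authors: Wei Long (魏龙, b. 1983), male, from Qingyang, Gansu, is a master's student at the School of Computer Science, Northwestern Polytechnical University; research interests: data mining, artificial intelligence. Wang Yong (王勇, b. 1973), male, Ph.D., is an associate professor and master's supervisor at the School of Natural and Applied Sciences, Northwestern Polytechnical University; research interests: operations research, data mining, artificial intelligence.
  • Funding: Basic Research Fund of Northwestern Polytechnical University (JC201273)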

Abstract: Most existing outlier detection algorithms require manually specified parameters, cannot detect global and local outliers at the same time, and handle data of uneven density poorly. To address these problems, this paper proposes DD-DBSCAN, an outlier detection algorithm based on density division. Its main contributions are: 1) using a minimum spanning tree, it defines a new notion of cluster density and divides the input data into clusters of different densities, so that the algorithm can handle data with an uneven density distribution; 2) following a divide-and-conquer approach, it runs outlier detection separately on each part of the divided data set, so that both global and local outliers can be detected; 3) it computes the required parameter values adaptively within each cluster, so that parameters such as the clustering radius (Eps) no longer need to be supplied by hand. Experiments on 2D simulated data sets and the real-world Iris data set show that, compared with DBSCAN, the proposed algorithm achieves higher coverage and accuracy.

Keywords: data mining, clustering, outlier detection

CLC Number: