Computer and Modernization (计算机与现代化) ›› 2013, Vol. 1 ›› Issue (1): 47-52.

• Algorithm Design and Analysis •

Mining Outliers in High-Dimensional Space Based on Histogram and FP-Growth

LI Longjiao (李龙姣), CHENG Guoda (程国达)

  1. College of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210046, China
  • Received: 2012-08-31 Online: 2013-02-06 Published: 2013-02-06

Abstract:

Detecting and analyzing outliers in high-dimensional space is one of the difficult problems in data mining. To address the shortcomings of existing methods on high-dimensional data, this paper proposes an outlier mining method based on histograms and FP (Frequent-Pattern) growth. The method first computes the KNN (K-Nearest Neighbors) distance of every data point in each dimension and builds a histogram of these distances, which is used to identify the outliers in that dimension. The FP-growth algorithm then mines association rules among the frequent outlier dimensions, and these rules are used to explain how the outliers relate across their outlier dimensions. Experiments show that the proposed method is effective and of practical significance.

Key words: data mining, KNN distance, histogram, FP-growth, association of outlier dimensions
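
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the value of k, the number of histogram bins, or the rule that turns a histogram into outlier decisions. The sketch therefore assumes k = 1, ten equal-width bins, and a rule that flags points whose KNN distance lies beyond the first empty bin. All names (`knn_distance_1d`, `histogram_outliers`, `outlier_dimensions`, `fp_growth`, `association_rules`) are hypothetical, and the FP-growth routine is a plain textbook version without the single-path optimization.

```python
from collections import Counter, defaultdict

def knn_distance_1d(values, k):
    """Distance from each value to its k-th nearest neighbour within one dimension."""
    return [sorted(abs(v - w) for j, w in enumerate(values) if j != i)[k - 1]
            for i, v in enumerate(values)]

def histogram_outliers(dists, bins=10):
    """Assumed rule: flag points whose distance lies beyond the first empty bin."""
    lo, hi = min(dists), max(dists)
    if hi == lo:                      # all distances equal: nothing stands out
        return set()
    width = (hi - lo) / bins
    bin_of = [min(int((d - lo) / width), bins - 1) for d in dists]
    counts = Counter(bin_of)
    gap = next((b for b in range(bins) if counts[b] == 0), None)
    return set() if gap is None else {i for i, b in enumerate(bin_of) if b > gap}

def outlier_dimensions(rows, k=1, bins=10):
    """Map each flagged point index to the set of dimensions where it is an outlier."""
    flagged = defaultdict(set)
    for dim, column in enumerate(zip(*rows)):
        for i in histogram_outliers(knn_distance_1d(column, k), bins):
            flagged[i].add(dim)
    return dict(flagged)

class Node:
    """One FP-tree node: an item, a count, a parent link, and child nodes."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) pairs; transactions is a list of (items, count)."""
    counts = Counter()
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    rank = {item: r for r, item in enumerate(order)}
    root, links = Node(None, None), defaultdict(list)
    for items, n in transactions:     # insert each transaction in frequency order
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                links[item].append(child)
            child.count += n
            node = child
    for item in reversed(order):      # grow patterns from the least frequent item
        itemset = suffix | {item}
        yield itemset, frequent[item]
        base = []                     # conditional pattern base: prefix paths of item
        for node in links[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, itemset)

def association_rules(supports, min_conf):
    """Rules (antecedent, consequent, confidence) with a single-item consequent."""
    rules = []
    for itemset, support in supports.items():
        for item in itemset:
            antecedent = itemset - {item}
            if antecedent and support / supports[antecedent] >= min_conf:
                rules.append((antecedent, item, support / supports[antecedent]))
    return rules

# Toy data (made up for illustration): ten regular points plus two points
# that sit far from the rest in dimensions 0 and 1 but not in dimension 2.
rows = [(float(i), float(i), float(i)) for i in range(10)]
rows += [(100.0, 150.0, 10.0), (200.0, 300.0, 11.0)]
flagged = outlier_dimensions(rows, k=1)
supports = dict(fp_growth([(dims, 1) for dims in flagged.values()], min_support=2))
for antecedent, consequent, conf in association_rules(supports, min_conf=0.8):
    print(sorted(antecedent), "->", consequent, round(conf, 2))
```

Each flagged point contributes one transaction whose items are the dimensions in which it was flagged, so a frequent itemset is a group of dimensions that tend to be outlying together. The brute-force per-dimension distance computation is quadratic in the number of points and is meant only to keep the sketch short.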