Computer and Modernization ›› 2014, Vol. 0 ›› Issue (1): 93-95, 113.

• Algorithm Design and Analysis •

 An Anomaly Detection Technique Based on k-means Clustering

  

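  1. Department of Computer Science and Technology, Shanxi Police Academy, Taiyuan 030021, Shanxi, China
  • Received: 2013-08-13 Online: 2014-01-20 Published: 2014-02-10
  • About the author: Bai Ning (b. 1975), male, from Jincheng, Shanxi; lecturer in the Department of Computer Science and Technology, Shanxi Police Academy; master's degree; research interests: intelligent information processing and computer software.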

An Outlier Detection Method Based on k-means Clustering

  1. Department of Computer Science and Technology, Shanxi Police Academy, Taiyuan 030021, China
  • Received: 2013-08-13 Online: 2014-01-20 Published: 2014-02-10

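Abstract (translated from Chinese): Because user behavior patterns in real-world problems are diverse and unpredictable, it becomes very difficult for traditional anomaly detection methods to learn from normal or abnormal patterns defined in advance. To address this problem, this paper proposes an adaptive anomaly detection method based on k-means clustering, called the OD_KC method. The method runs k-means clustering on an unlabeled sample set with different numbers of clusters and constructs a measure function to evaluate the compactness and separation of each clustering result, thereby obtaining the best clustering; the samples assigned to very small clusters are then automatically taken as the anomalous-pattern samples. The k-means-based anomaly detection method is highly autonomous and adaptive; in particular, it achieves fairly good detection results even when the sample distribution is complex. Experimental results show that the anomaly detection technique based on k-means clustering obtains good detection results.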

Keywords (translated from Chinese): data mining, clustering results, measure function, adaptivity, OD_KC method

Abstract: Because users' behavior patterns are diverse and unpredictable, traditional outlier detection methods that learn from predefined normal or abnormal patterns become difficult to apply. To address this problem, this paper presents a self-adapting outlier detection method based on k-means clustering, called the OD_KC algorithm. The unlabeled training samples are clustered by k-means under different numbers of clusters, a measure function is constructed to evaluate each clustering result and select the optimal one, and the small clusters in that result are taken as the outlier pattern. The method is autonomous and adaptive; in particular, it can still obtain good results when the distribution of the training data is complex. Simulation results on standard datasets demonstrate that excellent detection results can be obtained by this method.

Key words: data mining, clustering results, measure function, adaptability, OD_KC algorithm
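
The procedure outlined in the abstract (cluster with several values of k, score each result, keep the best one, treat very small clusters as outliers) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the paper's OD_KC implementation: the abstract does not give the measure function, so `measure` below is a hypothetical stand-in (mean within-cluster distance divided by the smallest gap between centroids, lower is better). The deterministic farthest-first seeding, the candidate range `k_values`, and the `small_fraction` threshold are likewise assumptions made for the example.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def measure(points, centroids, labels):
    """Hypothetical measure (needs k >= 2): compactness / separation, lower is better."""
    compactness = sum(math.dist(p, centroids[lab])
                      for p, lab in zip(points, labels)) / len(points)
    separation = min(math.dist(a, b)
                     for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return compactness / separation if separation > 0 else math.inf


def detect_outliers(points, k_values=range(2, 6), small_fraction=0.1):
    """Pick the k with the lowest measure, then flag members of small clusters."""
    best = None
    for k in k_values:
        centroids, labels = kmeans(points, k)
        score = measure(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, k, labels)
    _, best_k, labels = best
    sizes = {lab: labels.count(lab) for lab in set(labels)}
    outliers = [i for i, lab in enumerate(labels)
                if sizes[lab] < small_fraction * len(points)]
    return best_k, outliers


# Toy data: two dense groups of 20 points each plus two distant points.
normal_a = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(4)]
normal_b = [(10 + 0.1 * x, 0.1 * y) for x in range(5) for y in range(4)]
points = normal_a + normal_b + [(5.0, 40.0), (5.2, 40.1)]
best_k, outliers = detect_outliers(points)
print(best_k, outliers)
```

On this toy data the two distant points end up in a cluster of their own, which falls below the size threshold and is reported by index. Dividing compactness by separation is one simple way to penalize both loose clusters and needlessly split ones; any other cluster-validity index could be substituted for `measure` without changing the rest of the sketch.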