Computer and Modernization ›› 2022, Vol. 0 ›› Issue (10): 19-23.

• Artificial Intelligence •

An Outlier Detection Algorithm Based on Neighborhood Granular Entropy

  

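  1. (College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
  • Online: 2022-10-20 Published: 2022-10-21
  • About the authors: DUAN Xun (段珣, born 1996), male, from Zibo, Shandong, master's student, research interests: machine learning, data mining, E-mail: duanxfxy@163.com; YANG Zhiyong (杨志勇, born 1996), male, from Jining, Shandong, master's student, research interest: data mining, E-mail: 584421472@qq.com; corresponding author: JIANG Feng (江峰, born 1978), male, from Pengze, Jiangxi, professor, CCF member, Ph.D., research interests: machine learning, data mining, E-mail: jiangfeng@qust.edu.cn.
  • Supported by:
    National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2018MF007)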


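Abstract: Outlier detection is an important research direction in data mining. Its goal is to find the small portion of objects in a data set that differ significantly from the rest. It has important applications in network intrusion detection, credit card fraud detection, medical diagnosis, and other fields. In recent years, rough set theory has been widely used for outlier detection; however, the classical rough set model cannot handle numerical data effectively. To address this, this paper uses the neighborhood rough set model to detect outliers and introduces a new information entropy model, neighborhood granular entropy, into neighborhood rough sets. Based on neighborhood granular entropy, a new outlier detection algorithm, OD_NGE, is proposed. Experimental results show that OD_NGE achieves better outlier detection performance than existing outlier detection algorithms.

Key words: outlier detection, neighborhood rough set, knowledge granularity, neighborhood granular entropy, numerical data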
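
The abstract does not define neighborhood granular entropy or the steps of OD_NGE, so the following is only a minimal sketch of the underlying idea from neighborhood rough sets: each object gets a δ-neighborhood over numerical attributes, and objects with unusually small neighborhoods are outlier candidates. The function names, the Euclidean distance, the min-max scaling, the `delta` parameter, the form of knowledge granularity used, and the sparsity score are all assumptions made for illustration. The score is a simple stand-in and is not the paper's OD_NGE measure.

```python
import math


def min_max_normalize(rows):
    """Scale each numeric column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def neighborhoods(rows, delta):
    """For each object, list the indices within Euclidean distance delta."""
    return [
        [j for j, other in enumerate(rows) if math.dist(row, other) <= delta]
        for row in rows
    ]


def knowledge_granularity(nbrs):
    """Assumed form: average neighborhood size divided by the universe size."""
    n = len(nbrs)
    return sum(len(nb) for nb in nbrs) / (n * n)


def sparsity_scores(nbrs):
    """Stand-in outlier score (not OD_NGE): 1 - |neighborhood| / n."""
    n = len(nbrs)
    return [1.0 - len(nb) / n for nb in nbrs]


# Four clustered objects and one object far from the rest (made-up data).
data = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.2), (5.0, 8.0)]
nbrs = neighborhoods(min_max_normalize(data), delta=0.2)
scores = sparsity_scores(nbrs)
print(nbrs)
print(knowledge_granularity(nbrs))
print(scores.index(max(scores)))  # index of the most isolated object
```

In this toy data the fifth object has only itself in its neighborhood, so it receives the highest score. The choice of `delta` controls how coarse the neighborhoods are, which is why a granularity or entropy measure over neighborhoods is a natural quantity to build an outlier score on.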
