Computer and Modernization

• Database and Data Mining •

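 An Outlier Detection Algorithm in Big Data Based on Improved KNN

  1. Smart Grid Research Institute, Electric Power Research Institute of China Southern Power Grid Co. Ltd, Guangzhou 510080, China
  • Received: 2016-09-22 Online: 2017-05-26 Published: 2017-05-31
  • About the authors: Huang Jianli (黄建理, b. 1990), male, from Guangzhou, Guangdong; engineer at the Smart Grid Research Institute, Electric Power Research Institute of China Southern Power Grid Co. Ltd; IEEE member; master's degree; research interests: smart grid information and communication technology, power information security. Du Jinran (杜金燃, b. 1988), male, from Guangzhou, Guangdong; master's degree; research interests: power system information and communication, information security technology.
  • Supported by:
     Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2013GZ0141)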

Abstract: The KNN algorithm has two shortcomings in big data outlier detection: it struggles with high-dimensional data, and its time complexity is too high. To address both, a classification method based on AOR (Attribute Overlapping Rate) is proposed and the KNN algorithm is improved. First, AOR-based dimensionality reduction is applied to the data, which greatly increases the number of dimensions that can be handled. Then the traditional KNN algorithm is improved by pruning, which eliminates a large number of unnecessary computations. Experimental results show that, on high-dimensional, large-volume big data samples, the proposed algorithm substantially improves running efficiency and accuracy.

Key words: big data; KNN; dimensionality reduction; attribute overlapping rate; pruning
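
The abstract does not spell out the pruning rule or the AOR computation, so the following is only an illustrative sketch of how pruning can cut work in KNN-based outlier detection, not the authors' algorithm. It assumes a common scoring choice (a point's outlier score is the distance to its k-th nearest neighbor) and a cutoff-based pruning rule: once a point's running k-th-neighbor distance drops below the weakest score in the current top-n list, the point cannot be a top-n outlier and its scan stops early. The function name `knn_outliers_pruned` and its parameters are hypothetical, and the AOR dimensionality-reduction step is omitted.

```python
import heapq
import math


def knn_outliers_pruned(points, k, n_outliers):
    """Return up to n_outliers (score, index) pairs, highest score first.

    Assumption: score = distance to the k-th nearest neighbor.
    """
    top = []       # min-heap of (score, index): best candidates found so far
    cutoff = 0.0   # weakest score in `top` once it holds n_outliers entries

    for i, p in enumerate(points):
        nearest = []   # negated distances: a max-heap of the k smallest so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The running k-th distance can only shrink as more points are
            # scanned, so falling below the cutoff rules p out for good.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue

        score = -nearest[0]
        if len(top) < n_outliers:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n_outliers:
            cutoff = top[0][0]

    return sorted(top, reverse=True)
```

For example, calling `knn_outliers_pruned([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, n_outliers=1)` returns a single entry for the isolated point at index 4. Pruning of this kind tends to pay off most when points are scanned in random order, because the cutoff then rises quickly and most inliers are abandoned after a few distance computations.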