Computer and Modernization

• Algorithm Design and Analysis

An Accelerated K-NN Classification Method Based on Testing Sample Labels

  

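  1. School of Information Technology and Engineering, Jinzhong University, Jinzhong, Shanxi 030619
  • Received: 2016-12-23 Publication date: 2017-09-20 Release date: 2017-09-19
  • About the authors: WANG Xiao (b. 1980), male, from Jinzhong, Shanxi; lecturer at the School of Information Technology and Engineering, Jinzhong University; master's degree; research interests: data mining and intelligent information processing. ZHAO Li (b. 1973), female, from Yuci, Shanxi; associate professor; master's degree; research interests: artificial intelligence and data mining.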

Speeding K-NN Classification Method Based on Testing Sample Label

  1. School of Information Technology & Engineering, Jinzhong University, Jinzhong 030619, China
  • Received: 2016-12-23 Online: 2017-09-20 Published: 2017-09-19

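Abstract: To address the low prediction efficiency of the traditional K-NN classification method, an accelerated K-NN classification method based on testing sample labels (Speeding K-NN Classification Based on Testing Sample Label, KNN_TSL) is proposed. The method first uses traditional K-NN classification to obtain the classes of a certain number of testing samples. For each testing sample that arrives afterwards, it computes the distance to the already labeled testing samples; if that distance is below a given threshold, the new sample receives the same class label, and otherwise it is classified again. For subsequent, easily classified testing samples, a class decision therefore requires only the distances to a small number of labeled testing samples, which are more representative than the original labeled samples, and only a few testing samples need to be reclassified. Because the labeled testing samples contain part of the class information, the method can greatly improve prediction efficiency while maintaining the model's generalization performance. Experimental results show that the proposed KNN_TSL method achieves high prediction speed and good prediction accuracy.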

Keywords: K-NN classification, testing sample labeling, KNN_TSL method

Abstract: To address the low prediction efficiency of traditional K-NN classification, this paper presents a speeding K-Nearest Neighbor (K-NN) classification method based on testing sample labels (KNN_TSL). First, the class labels of a certain number of testing samples are obtained with the traditional K-NN classification method. Then, for each testing sample that arrives later, its distance to the already labeled testing samples is calculated. If the distance is less than a given threshold, the new sample is assigned the same class label; otherwise, it is classified again with K-NN. In this way, most of the later, easily classified samples can be labeled by considering only their relationship to the labeled testing samples, and only a small number of samples need to be reclassified. Because the labeled testing samples carry part of the class information, the method can greatly improve prediction efficiency while preserving generalization performance. Experimental results show that the proposed KNN_TSL method achieves both high prediction speed and good prediction accuracy.
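The two-phase procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `KNNTSLSketch`, the parameters `threshold` and `warmup`, the use of Euclidean distance, a single global threshold, and the choice to stop growing the labeled testing pool after the warm-up phase are all hypothetical, since the abstract does not specify them.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-NN: majority vote among the k nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


class KNNTSLSketch:
    """Hypothetical sketch of the label-reuse shortcut from the abstract."""

    def __init__(self, train_X, train_y, k=3, threshold=0.5, warmup=10):
        self.train_X, self.train_y = train_X, train_y
        self.k = k
        self.threshold = threshold  # assumed: one global distance threshold
        self.warmup = warmup        # assumed: size of the labeled testing pool
        self.pool = []              # (features, label) of labeled testing samples
        self.shortcuts = 0          # predictions that skipped the full k-NN search

    def predict_one(self, x):
        # Phase 1: label the first `warmup` testing samples with plain k-NN.
        if len(self.pool) < self.warmup:
            label = knn_predict(self.train_X, self.train_y, x, self.k)
            self.pool.append((x, label))
            return label
        # Phase 2: reuse the label of the nearest pooled sample if it is close.
        nearest_x, nearest_label = min(self.pool, key=lambda p: euclidean(p[0], x))
        if euclidean(nearest_x, x) < self.threshold:
            self.shortcuts += 1
            return nearest_label
        # Otherwise fall back to a full k-NN search over the training set.
        return knn_predict(self.train_X, self.train_y, x, self.k)


# Toy usage: two well-separated classes and a stream of five testing samples.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
train_y = ["a", "a", "a", "b", "b", "b"]
clf = KNNTSLSketch(train_X, train_y, k=3, threshold=0.5, warmup=2)
stream = [(0.1, 0.1), (5.0, 5.1), (0.15, 0.1), (5.05, 5.0), (1.0, 1.0)]
labels = [clf.predict_one(x) for x in stream]
print(labels, clf.shortcuts)
```

In this toy run the first two samples fill the pool via plain k-NN, the next two lie within the threshold of a pooled sample and reuse its label, and the last one is too far from the pool and falls back to k-NN. The speed-up in such a scheme comes from the pool being much smaller than the training set; how the threshold and pool size should be chosen is not stated in the abstract.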

Key words: K-NN classification, testing sample label, KNN_TSL algorithm
