Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

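  • About the authors: Liang Weichao (梁伟超, b. 1991), male, from Nanjing, Jiangsu; master's student at the School of Computer Science and Engineering, Nanjing University of Science and Technology; research interests: machine learning and data mining. Song Bin (宋斌, b. 1968), male, associate professor, M.S.; research interests: data mining and Web information processing.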

A Weighted kNN Classification Method for Partial Labeling

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2015-07-24 Online: 2015-12-23 Published: 2015-12-30

Abstract:

As one of the important weaklysupervised machine learning frameworks, partial label learning is different from traditional supervised learning. Under this framework,
an instance might be associated with a set of candidate labels among which only one is valid. The knearest neighbor method is simple but effective for classification. In this
paper, we propose a weighted kNN partial labeling classification method. Firstly, for an unseen instance, it will try to find k nearest neighbors of the unseen instance in
training set. Secondly, the weight of every nearest neighbor is determined by solving a quadratic programming problem. Lastly, the label of the unseen instance is decided in
accordance with the principle of decision by majority. Extensive experiments show that the proposed method can effectively improve the generalization performance of the learning
system.
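
The three steps named in the abstract can be sketched roughly as follows. The abstract does not state the quadratic program, so this sketch assumes one plausible form: choose non-negative weights summing to 1 that best reconstruct the unseen instance from its neighbors. The voting rule (each neighbor adds its weight to every label in its candidate set) is likewise an assumption, and the names `project_to_simplex`, `neighbor_weights`, and `predict` are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    theta, running = 0.0, 0.0
    for j, u in enumerate(sorted(v, reverse=True), start=1):
        running += u
        t = (running - 1.0) / j
        if u - t > 0:
            theta = t  # the last index passing this test fixes the shift
    return [max(x - theta, 0.0) for x in v]


def neighbor_weights(x, neighbors, steps=500):
    """Assumed QP (not given in the abstract): minimise
    ||x - sum_j w_j * n_j||^2 over the simplex, via projected gradient descent."""
    k, dim = len(neighbors), len(x)
    # Because the weights sum to 1, the residual equals sum_j w_j * (n_j - x).
    diffs = [[n[i] - x[i] for i in range(dim)] for n in neighbors]
    # Upper bound on the gradient's Lipschitz constant; its inverse is the step size.
    lipschitz = 2.0 * sum(d_i * d_i for d in diffs for d_i in d)
    w = [1.0 / k] * k
    if lipschitz == 0.0:  # every neighbour coincides with x
        return w
    for _ in range(steps):
        residual = [sum(w[j] * diffs[j][i] for j in range(k)) for i in range(dim)]
        grad = [2.0 * sum(diffs[j][i] * residual[i] for i in range(dim)) for j in range(k)]
        w = project_to_simplex([w[j] - grad[j] / lipschitz for j in range(k)])
    return w


def predict(x, train_x, train_candidates, k=3):
    """Assumed voting rule: each neighbour adds its weight to all its candidate labels."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))[:k]
    weights = neighbor_weights(x, [train_x[i] for i in order])
    votes = defaultdict(float)
    for w, i in zip(weights, order):
        for label in train_candidates[i]:
            votes[label] += w
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(votes), key=votes.get)


# Toy data (made up for illustration): 2-D points, each with a candidate label set.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_candidates = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}, {"b"}, {"a", "b"}]
label = predict((0.2, 0.2), train_x, train_candidates, k=3)
```

Projected gradient descent is used here only to keep the sketch free of external dependencies; any off-the-shelf QP solver could take its place.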

Key words: machine learning, data mining, partial label learning, k-nearest neighbor, weight estimation

CLC number: