Computer and Modernization

• Algorithm Design and Analysis

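Text Classification Model Based on Cooperation of Dual Features of Concept and Root

  1. College of Computer Science, Chongqing University, Chongqing 400030, China
  • Received: 2015-03-17 Online: 2015-08-08 Published: 2015-08-19
  • About the authors: Gu Ping (古平, b. 1976), male, from Chongqing, associate professor at the College of Computer Science, Chongqing University, Ph.D., research interests: data mining, machine learning; Wu Tingjun (吴庭君, b. 1988), male, from Anqing, Anhui, master's student, research interests: data mining, machine learning; Wen Jingyun (文静云, b. 1991), female, from Xinyang, Henan, master's student, research interest: outlier detection.
  • Funding:
    Natural Science Foundation of Chongqing (cstc2012jjA40002); Fundamental Research Funds for the Central Universities (0216005207016)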

Abstract: Most traditional semi-supervised text classification methods are built on root features and neglect the importance of semantic features, which leads to low classification accuracy. To account for the influence of semantics on classification, this paper proposes a text classification model that combines the dual features of concept and root. Using WordNet as the ontology library, the method builds, within the Co-training framework, two classifiers, one based on concepts and one based on roots, and trains them cooperatively. Experiments analyzing the accuracy and recall of the new model show that it improves on the old model in both respects, indicating that the new algorithm based on the cooperation of concept and root features is more effective.

Key words: semi-supervised; semantics; dual features; co-training
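The co-training scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base classifier or the selection rule, so the multinomial naive Bayes model, the confidence threshold, and the per-class selection below are assumptions. `TOY_CONCEPTS` is a hypothetical hand-written table standing in for a WordNet lookup, and the tokens are assumed to be already reduced to roots.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a WordNet lookup: root -> coarse concept.
TOY_CONCEPTS = {"dog": "animal", "cat": "animal", "horse": "animal",
                "car": "vehicle", "truck": "vehicle", "bus": "vehicle"}

def root_view(tokens):
    """Root view: the tokens themselves (assumed already stemmed)."""
    return list(tokens)

def concept_view(tokens):
    """Concept view: tokens mapped to concepts; unknown tokens are dropped."""
    return [TOY_CONCEPTS[t] for t in tokens if t in TOY_CONCEPTS]

VIEWS = (root_view, concept_view)

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed base learner)."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, doc):
        n = sum(self.prior.values())
        logp = {}
        for y in self.classes:
            total = sum(self.counts[y].values()) + len(self.vocab) + 1
            logp[y] = math.log(self.prior[y] / n) + sum(
                math.log((self.counts[y][w] + 1) / total) for w in doc)
        top = max(logp.values())
        unnorm = {y: math.exp(v - top) for y, v in logp.items()}
        z = sum(unnorm.values())
        return {y: v / z for y, v in unnorm.items()}

def fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    labels = [y for _, y in labeled]
    return [NaiveBayes().fit([view(d) for d, _ in labeled], labels)
            for view in VIEWS]

def co_train(labeled, unlabeled, rounds=5, threshold=0.6):
    """Each round, every view's classifier labels its most confident
    unlabeled document per class and adds it to the shared training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}  # pool index -> label assigned in this round
        for view, model in zip(VIEWS, fit_views(labeled)):
            best = {}  # class -> (confidence, pool index)
            for i, doc in enumerate(pool):
                if i in picked:
                    continue
                probs = model.predict_proba(view(doc))
                label = max(probs, key=probs.get)
                conf = probs[label]
                if conf >= threshold and conf > best.get(label, (0.0, -1))[0]:
                    best[label] = (conf, i)
            for label, (_, i) in best.items():
                picked[i] = label
        if not picked:
            break
        labeled += [(pool[i], y) for i, y in picked.items()]
        pool = [d for i, d in enumerate(pool) if i not in picked]
    return fit_views(labeled), labeled

def classify(models, tokens):
    """Combine both views by multiplying their class probabilities."""
    scores = {}
    for view, model in zip(VIEWS, models):
        for y, p in model.predict_proba(view(tokens)).items():
            scores[y] = scores.get(y, 1.0) * p
    return max(scores, key=scores.get)

# Toy usage: two labeled documents, four unlabeled ones.
seed = [(["dog", "bark"], "pets"), (["car", "engine"], "autos")]
unlabeled = [["cat", "bark"], ["truck", "engine"],
             ["horse", "saddle"], ["bus", "wheel"]]
models, final_labeled = co_train(seed, unlabeled)
```

In this toy run the root classifier cannot decide on `["horse", "saddle"]` because it shares no root with the seed documents, while the concept classifier labels it through the shared concept `animal`. Once that document joins the training set, the root classifier also learns `saddle`, which is the kind of mutual help between the two feature sets that the abstract describes.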
