Text Classification Model Based on Cooperation of Dual Features of Concept and Root

doi:10.3969/j.issn.1006-2475.2015.08.019

Computer and Modernization, 2015, Vol. 0, Issue (8): 93-97.

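About the authors: GU Ping (b. 1976), male, from Chongqing, associate professor at the College of Computer, Chongqing University, Ph.D.; research interests: data mining, machine learning. WU Ting-jun (b. 1988), male, from Anqing, Anhui, master's student; research interests: data mining, machine learning. WEN Jing-yun (b. 1991), female, from Xinyang, Henan, master's student; research interest: outlier detection.
Funding:
Natural Science Foundation of Chongqing (cstc2012jjA40002); Fundamental Research Funds for the Central Universities (0216005207016)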

Text Classification Model Based on Cooperation of Dual Features of Concept and Root

College of Computer, Chongqing University, Chongqing 400030, China

Received:2015-03-17 Online:2015-08-08 Published:2015-08-19

Abstract


Abstract: Most traditional semi-supervised text classification methods are built on word-root features and neglect semantic features, which limits classification accuracy. To account for the influence of semantics on classification, this paper proposes a text classification model that combines concept features and root features. Using WordNet as the ontology library, the method builds two classifiers within the Co-training framework, one on concept features and one on root features, and trains them cooperatively. Experiments measuring accuracy and recall show that the new model improves on the earlier model in both respects, indicating that the algorithm based on the cooperation of concept and root features is more effective.


Key words: semi-supervised; semantic; dual feature; cooperative training
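
The co-training scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: `TOY_CONCEPTS` is a hypothetical hand-written table standing in for a WordNet lookup, `root_view` uses crude suffix stripping in place of a real stemmer, the base learner is a small naive Bayes classifier (the abstract does not name one), and each view adds one pseudo-labeled document per class per round to keep the classes balanced, in the spirit of the original co-training procedure.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word-to-concept table standing in for a WordNet lookup.
TOY_CONCEPTS = {
    "striker": "athlete", "goalkeeper": "athlete", "goal": "sport", "match": "sport",
    "senator": "politician", "minister": "politician",
    "election": "politics_event", "vote": "politics_event",
}

def root_view(text):
    """Root features: crude suffix stripping, a placeholder for a real stemmer."""
    roots = []
    for word in text.lower().split():
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        roots.append(word)
    return roots

def concept_view(text):
    """Concept features: keep only the words that map to a concept."""
    return [TOY_CONCEPTS[w] for w in text.lower().split() if w in TOY_CONCEPTS]

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over feature lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict_proba(self, features):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_scores[label] = math.log(n_docs / total_docs) + sum(
                math.log((counts[f] + 1) / denom) for f in features if f in self.vocab
            )
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

def train(labeled, view):
    return NaiveBayes().fit([view(t) for t, _ in labeled], [y for _, y in labeled])

def co_train(labeled, unlabeled, views, rounds=3):
    """Each view in turn pseudo-labels its most confident document per class."""
    labeled, pool = list(labeled), list(unlabeled)
    classes = sorted({label for _, label in labeled})
    for _ in range(rounds):
        for view in views:
            clf = train(labeled, view)
            for label in classes:
                if not pool:
                    break
                # Move the document this view is most sure belongs to `label`.
                best = max(pool, key=lambda t: clf.predict_proba(view(t))[label])
                pool.remove(best)
                labeled.append((best, label))
    return [train(labeled, view) for view in views], labeled

def predict(classifiers, views, text):
    """Combine the two views by summing their class probabilities."""
    votes = Counter()
    for clf, view in zip(classifiers, views):
        votes.update(clf.predict_proba(view(text)))
    return max(votes, key=votes.get)

labeled = [("striker scored goal", "sports"), ("senator won election", "politics")]
unlabeled = ["goalkeeper saved goal", "minister lost election",
             "goalkeeper played match", "minister called vote"]
views = [root_view, concept_view]
classifiers, pseudo_labeled = co_train(labeled, unlabeled, views)
for text, label in pseudo_labeled[len(labeled):]:
    print(f"{label:8s} <- {text}")
print(predict(classifiers, views, "goalkeeper scored"))
```

In this toy data, "goalkeeper played match" shares no root with the two initially labeled documents, so the root classifier alone has nothing to go on; the concept view can still place it because "goalkeeper" and "match" map to concepts already seen under the sports label. That complementary behavior is the motivation for pairing the two feature sets.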

CLC Number: TP301.6

GU Ping, WU Ting-jun, WEN Jing-yun. Text Classification Model Based on Cooperation of Dual Features of Concept and Root[J]. Computer and Modernization, 2015, 0(8): 93-97.

