计算机与现代化 (Computer and Modernization)

• 算法设计与分析 (Algorithm Design and Analysis)

基于多层类别主题图模型的教育文本分类方法

  

  1. 湖北师范学院教育信息与技术学院,湖北黄石435002
  • 收稿日期:2016-01-18 出版日期:2016-07-21 发布日期:2016-07-22
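  • About the author: LI Quan (李全, born 1982), male, from Huangpi, Hubei; lecturer in the Department of Educational Information and Technology, Hubei Normal University; master's degree; research interests: information retrieval, data mining.
  • Supported by:
     Hubei Province Education Science "Twelfth Five-Year" Plan Project (2011B130); Hubei Provincial Program for Outstanding Young and Middle-aged Science and Technology Innovation Teams in Colleges and Universities (T201515)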

 Classification Method of Education Text Based on Hierarchical Class Topic Graph Model

  1. Department of Educational Information and Technology, Hubei Normal University, Huangshi 435002, China
  • Received:2016-01-18 Online:2016-07-21 Published:2016-07-22

摘要:   在互联网大数据时代,网络教育资源信息以爆炸式增长。层次分类能满足大规模教育文本多类别、多层次的分类要求,但传统层次分类的类别表示模型存在向量高维稀疏、缺乏语义理解等问题。针对以上问题,提出一种基于多层类别主题图模型的教育文本分类方法。该方法通过多层类别主题图模型对文本集进行建模,得到文本的多层类别-词项概率矩阵;利用3种特征提取方法的互补性进行组合特征提取,进一步提高特征词和主题类别关联度;利用多层SVM分类器进行分类。实验结果表明,该方法在性能上与传统的多层文本分类方法相比,宏平均MacroP、MacroR和MacroF1等评估值都有一定的提高,具有较好的网络教育文本分类效果和应用前景。

关键词: 教育资源, 层次分类, 文本分类, 主题图模型, 概率矩阵, 支持向量机

Abstract: In the era of Internet big data, online educational resources are growing explosively. Hierarchical classification can meet the multi-class, multi-level classification requirements of large-scale education texts. However, the class representation models used in traditional hierarchical classification suffer from high-dimensional, sparse vectors and a lack of semantic understanding. To address these problems, a classification method for education text based on a hierarchical class topic graph model is proposed. The text collection is modelled with the hierarchical class topic graph model to obtain the hierarchical class-word probability matrices of the texts. To further improve the correlation between feature words and topic classes, combined features are extracted by exploiting the complementarity of three feature extraction methods. Finally, the texts are classified with a hierarchical SVM classifier. Experimental results show that, compared with traditional hierarchical text classification methods, the proposed method improves the macro-averaged measures MacroP, MacroR, and MacroF1 to some extent, and that it offers good classification performance and application prospects for online education texts.

Key words: education resource, hierarchical classification, text classification, topic graph model, probability matrix, support vector machine (SVM)
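
The abstract reports its results as the macro-averaged measures MacroP, MacroR, and MacroF1. As a reading aid, the following is a minimal sketch of how such measures are commonly computed for single-label classification. It is not the authors' evaluation code. The function name `macro_scores` and the toy labels are hypothetical, and the sketch assumes MacroF1 is the unweighted mean of the per-class F1 values; the paper may instead derive it from MacroP and MacroR.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (MacroP, MacroR, MacroF1), giving every class equal weight.

    Assumption: MacroF1 is the mean of per-class F1 scores.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with hypothetical category labels.
y_true = ["math", "math", "physics", "physics", "history"]
y_pred = ["math", "physics", "physics", "physics", "history"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"MacroP={macro_p:.4f} MacroR={macro_r:.4f} MacroF1={macro_f1:.4f}")
```

Because each class contributes equally regardless of its size, macro-averaging is sensitive to performance on small categories, which is relevant when a class hierarchy has many sparsely populated leaves.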