Computer and Modernization (计算机与现代化)

• Network and Communication

A Short-Text Context Recognition Model Based on Syntactic Decision Trees and SVM

  

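  1. Nanjing R&D Department, FiberHome Telecommunication Technologies Co., Ltd., Nanjing, Jiangsu 210019; 2. Wuhan Research Institute of Posts and Telecommunications, Wuhan, Hubei 430074
  • Received: 2016-08-30 Online: 2017-03-29 Published: 2017-03-30
  • About the authors: WANG Zheng (b. 1977), male, from Xuzhou, Jiangsu, senior engineer at the Nanjing R&D Department of FiberHome Telecommunication Technologies Co., Ltd.; research interests: massive data analysis and network behavior analysis. LIU Shipei (b. 1992), male, from Qianjiang, Hubei, master's student at Wuhan Research Institute of Posts and Telecommunications; research interests: data analysis and natural language processing. PENG Yanbing (b. 1975), male, Ph.D.; research interest: network security.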

An Essay Context Recognition Model Based on Syntax Decision Tree and SVM Algorithm

  1. Nanjing R&D, FiberHome Telecommunication Technologies Co., Ltd., Nanjing 210019, China;
    2. Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, China
  • Received: 2016-08-30 Online: 2017-03-29 Published: 2017-03-30

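Abstract (translated from the Chinese):

As social life increasingly moves online, many research and commercial fields have run into the problem of processing Chinese text. Ever-deepening research on text classification needs to analyze textual information from every aspect of a text. Semantic parsing is a key technique in text mining, and context recognition can be applied in many text mining techniques, such as sentiment analysis and public opinion analysis. Based on a feature extraction method that uses syntactic decision trees and the Ngram model, together with an SVM classifier, this paper proposes a context classification model that resolves the ambiguity of words across different contexts. The model has good generalization ability, works well across the board in batch processing, and can fairly effectively solve the problem of context recognition in text mining.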

Keywords: Chinese text processing, context recognition, decision tree, Ngram model, SVM classifier

Abstract:

As social life moves increasingly online, many research and commercial fields face the problem of processing Chinese text. Deeper research on text classification requires
parsing text from multiple aspects. Semantic parsing is a key technique in text mining, and context recognition can be applied to many text mining problems, such as
sentiment analysis and public opinion analysis. This paper proposes a context classification model that combines syntactic decision trees, Ngram feature extraction, and an
SVM classifier to distinguish the different meanings a word takes in different short-text contexts. The results show that the proposed model generalizes well and
performs consistently in batch processing, indicating that it can address the context recognition problem in text mining fairly effectively.

Key words: Chinese text processing; context identification; decision tree; Ngram model; SVM classifier
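
To make the Ngram feature extraction step named in the abstract concrete, here is a minimal sketch in Python. The abstract does not say how the n-grams are built, so character-level n-grams, the chosen sizes, and the function names `char_ngrams` and `ngram_features` are all assumptions made for illustration, not the authors' method. The syntactic decision tree and the SVM training step are not shown.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, ignoring whitespace.

    Character-level n-grams are an assumption; the abstract does not
    specify whether the model works on characters or segmented words.
    """
    chars = [c for c in text if not c.isspace()]
    # An input shorter than n yields an empty list.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count the n-grams of every size in n_values.

    The resulting Counter is a sparse bag-of-n-grams vector that a
    classifier such as an SVM could consume after vectorization.
    """
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text, n))
    return counts


# Hypothetical short text, used only to exercise the functions.
features = ngram_features("苹果手机", n_values=(1, 2))
print(sorted(features.items()))
```

Under these assumptions, each short text becomes a sparse count vector, which would still need to be mapped to a fixed vocabulary before being passed to a classifier.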

CLC number: