计算机与现代化 (Computer and Modernization) ›› 2012, Vol. 208 ›› Issue (12): 127-130. doi: 10.3969/j.issn.1006-2475.2012.12.033

• Applications and Development •

Subjectivity Sentence Identification Based on Topic Model

WU Chao-rong, LIAO Xiang-wen

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
  • Received:2012-08-06 Revised:1900-01-01 Online:2012-12-22 Published:2012-12-22

Abstract: Subjective sentence identification aims to find the opinionated sentences in a text collection. Based on a probabilistic topic model, this paper proposes a subjective sentence identification model that incorporates topics. By taking topic factors into account, the model identifies sentence subjectivity while simultaneously mining the latent subjective topics in the collection. The proposed model is a weakly supervised generative model: it does not require a large labeled corpus for training and needs only a small domain-independent subjectivity lexicon to modify the model's priors. Experimental results show that the model effectively improves the recall and F-value of subjective sentence identification, and that the extracted subjective topics are semantically informative.

Key words: subjectivity sentence identification, opinion mining, probabilistic topic model, weakly-supervised
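
The abstract says only that a small domain-independent subjectivity lexicon is used to modify the model's prior; it does not give the mechanism. One common way to do this in topic models is to make the word-level Dirichlet prior asymmetric for lexicon words. The sketch below illustrates that general idea only and is not the authors' implementation: the function name `build_word_priors`, the two labels, and the values of `base`, `boost`, and `damp` are all hypothetical.

```python
def build_word_priors(vocab, subjective_lexicon, base=0.01, boost=0.9, damp=0.001):
    """Build per-label Dirichlet pseudo-counts over the vocabulary.

    Assumption (not from the paper): lexicon words get a large pseudo-count
    under the "subjective" label and a small one under "objective";
    all other words keep the same symmetric value under both labels.
    """
    objective = []
    subjective = []
    for word in vocab:
        if word in subjective_lexicon:
            objective.append(damp)    # discourage the objective label
            subjective.append(boost)  # favour the subjective label
        else:
            objective.append(base)    # uninformative, symmetric prior
            subjective.append(base)
    return {"objective": objective, "subjective": subjective}


# Toy vocabulary and lexicon, purely illustrative.
vocab = ["great", "table", "terrible", "price"]
lexicon = {"great", "terrible"}
priors = build_word_priors(vocab, lexicon)
```

In a full generative model, such pseudo-counts would be added to the word counts during inference, so that lexicon words pull the sentences containing them toward the subjective label without any sentence-level annotation.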
