Computer and Modernization

• Chinese Information Processing Technology •

A Sentiment Analysis Method Based on Sentiment Word Extraction and LDA Feature Representation

  

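  1. College of Computer and Electronics Information, Guangxi University, Nanning, Guangxi 530004, China
  • Received: 2014-02-17 Online: 2014-05-28 Published: 2014-05-30
  • About the authors: ZHANG Jianhua (1986-), female, born in Xinxiang, Henan; master's student at the College of Computer and Electronics Information, Guangxi University; research interests: data mining, text classification. LIANG Zhengyou, male, professor, CCF member, Ph.D.; research interests: information retrieval, parallel and distributed computing.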

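Abstract: As an emerging area of text mining, sentiment analysis can be used to classify and summarize the product reviews that users post online. This helps merchants improve their services and product quality, and it also supports other consumers' purchasing decisions. This paper proposes a sentiment analysis method based on sentiment word extraction and LDA feature representation for binary (positive/negative) classification of product reviews. In the sentiment word extraction step, a manually constructed sentiment dictionary is used to extract sentiment words from the preprocessed text. An LDA model is then used to build the topic distribution of each document, and the review-topic distribution serves as the feature vector for an SVM classifier. Experimental results show that the proposed method performs well at classifying reviews as positive or negative.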

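Key words: sentiment analysis, sentiment word extraction, latent Dirichlet allocation (LDA), topic model, support vector machine (SVM)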
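The extraction step in the abstract reduces each preprocessed review to the words that appear in the sentiment dictionary. The following is a minimal sketch under stated assumptions, not the authors' code. The names `SENTIMENT_LEXICON` and `extract_sentiment_words` are hypothetical, and the small toy lexicon stands in for the paper's manually built dictionary, whose contents are not given. The reviews are assumed to be already segmented into tokens, and English toy tokens are used for readability.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical toy lexicon standing in for the paper's manually built
# sentiment dictionary (its real contents are not given in the abstract).
SENTIMENT_LEXICON: frozenset[str] = frozenset({
    "great", "satisfied", "smooth", "clear",      # positive entries
    "poor", "disappointed", "laggy", "blurry",    # negative entries
})


def extract_sentiment_words(tokens: Iterable[str],
                            lexicon: frozenset[str] = SENTIMENT_LEXICON) -> list[str]:
    """Keep only the tokens found in the lexicon, preserving order and repeats.

    Repeats are kept so that a downstream topic model still sees word counts.
    """
    return [tok for tok in tokens if tok in lexicon]


# Reviews assumed to be already preprocessed (segmented, stop words removed).
reviews = [
    ["screen", "clear", "runs", "smooth", "very", "satisfied"],
    ["battery", "poor", "often", "laggy", "very", "disappointed"],
]
extracted = [extract_sentiment_words(r) for r in reviews]
print(extracted)  # [['clear', 'smooth', 'satisfied'], ['poor', 'laggy', 'disappointed']]
```

Each filtered token list then becomes one "document" for the topic model in the next step.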

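The abstract does not say how the LDA model is fitted. One common choice is collapsed Gibbs sampling, and the sketch below assumes it. The function name `lda_doc_topics` and the hyperparameters (`num_topics`, `alpha`, `beta`, `iters`, `seed`) are illustrative assumptions, not values from the paper. For each review, the function returns the smoothed topic proportions θ_d, which play the role of the review-topic distribution used as features.

```python
import random


def lda_doc_topics(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic-proportion vector per document.

    Hypothetical helper: hyperparameter defaults are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]        # tokens of document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # occurrences of word w assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k
    z = []                                # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z_i = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Review-topic distribution: theta_dk = (n_dk + alpha) / (N_d + K * alpha).
    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


# Toy documents of extracted sentiment words (hypothetical, not the paper's data).
docs = [
    ["great", "smooth", "great", "satisfied"],
    ["smooth", "satisfied", "clear"],
    ["poor", "laggy", "disappointed"],
    ["laggy", "poor", "blurry", "poor"],
]
theta = lda_doc_topics(docs, num_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

On a corpus this small, the inferred topics can vary with `seed`. Each row of `theta` always sums to 1, so it can be passed directly to a classifier as a fixed-length feature vector.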

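For the final step, the abstract specifies an SVM classifier but not its implementation or kernel. As a dependency-free stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method, a swapped-in technique that is not necessarily what the authors used. The names `train_linear_svm` and `predict`, the regularization weight `lam`, and the six toy topic vectors are all hypothetical. Labels are +1 for positive reviews and -1 for negative reviews.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors (here, review-topic distributions) and
    y holds labels in {+1, -1} (positive / negative review).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]   # constant feature acts as a (regularized) bias
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]   # step on the L2 term
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Return +1 (positive) or -1 (negative) from the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical topic-distribution features for six labelled reviews.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
     [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A library implementation such as LIBSVM would usually replace this in practice. Pegasos is used here only to keep the example self-contained.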