Computer and Modernization (计算机与现代化)

• Artificial Intelligence

Sentiment Analysis of Chinese Microblogs Based on an Extended Dictionary and Semantic Rules

  

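  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044)
  • Received: 2017-06-02 Online: 2018-03-08 Published: 2018-03-09
  • About the authors: LI Jidong (李继东, 1992-), male, born in Xinyang, Henan, master's student at the School of Computer and Information Technology, Beijing Jiaotong University; research interests: mobile and Internet technologies. WANG Yizhi (王移芝, 1953-), female, professor; research interests: mobile and Internet technologies, computer networks, and databases.
  • Funding:
    National Natural Science Foundation of China, General Program (K13A300050)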

Sentiment Analysis of Chinese Microblog Based on Expand-dictionary and Semantic Rule

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2017-06-02 Online: 2018-03-08 Published: 2018-03-09

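Abstract: This paper first analyzes the patterns by which new words appear in microblog text and uses degree words to discover new microblog words. An extended PMI algorithm then computes the pointwise mutual information between each new word and a set of sentiment benchmark words; based on this value, the new words are classified as positive or negative and added to a microblog domain dictionary. Next, a basic sentiment dictionary is built and, given the distinctive features of microblog text and of the Chinese language, a microblog emoticon dictionary, a negation dictionary, a degree-word dictionary, and a conjunction dictionary are constructed. Finally, the sentiment dictionaries are combined with semantic rules, and the resulting sentiment value is weighted together with that of the microblog emoticons to analyze the sentiment of Chinese microblog posts. Tests on a crawled microblog data set verify the effectiveness of the proposed analysis strategy.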
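
To make the new-word classification step concrete, the following is a minimal sketch of the standard SO-PMI score (the difference between a candidate's PMI with positive seed words and its PMI with negative seed words). It is not the extended PMI algorithm of the paper, whose details the abstract does not give. The function names, the toy corpus, the seed words, and the additive smoothing are all assumptions made for illustration.

```python
import math

def so_pmi(candidate, docs, pos_seeds, neg_seeds, smoothing=1.0):
    """Standard SO-PMI over document-level co-occurrence.

    docs is a list of token lists (one list per microblog post).
    The smoothing scheme is an illustrative assumption, not the paper's.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of posts that contain every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    def pmi(w1, w2):
        # Crude additive smoothing so the logarithm stays finite
        # for pairs that never co-occur.
        joint = (doc_freq(w1, w2) + smoothing) / (n_docs + smoothing)
        p1 = (doc_freq(w1) + smoothing) / (n_docs + smoothing)
        p2 = (doc_freq(w2) + smoothing) / (n_docs + smoothing)
        return math.log2(joint / (p1 * p2))

    positive = sum(pmi(candidate, s) for s in pos_seeds)
    negative = sum(pmi(candidate, s) for s in neg_seeds)
    return positive - negative

def classify_new_word(candidate, docs, pos_seeds, neg_seeds):
    """Label a candidate by the sign of its SO-PMI score."""
    score = so_pmi(candidate, docs, pos_seeds, neg_seeds)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical, pre-tokenized toy corpus.
toy_docs = [
    ["这", "电影", "给力", "好"],
    ["给力", "开心", "好"],
    ["坑爹", "差"],
    ["服务", "坑爹", "差", "糟糕"],
]
toy_pos_seeds = ["好"]
toy_neg_seeds = ["差"]

for word in ("给力", "坑爹"):
    print(word, classify_new_word(word, toy_docs, toy_pos_seeds, toy_neg_seeds))
```

In this sketch a word that co-occurs mostly with the positive seeds gets a positive score and would be added to the positive side of the domain dictionary; a real system would need a much larger corpus and a threshold around zero rather than a bare sign test.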

Keywords: microblog, sentiment analysis, sentiment dictionary, semantic rules

Abstract: This paper first examines how new words arise in microblog texts and finds new microblog words through degree adverbs. It then calculates the SO-PMI between the new words and sentiment benchmark words with an extended PMI algorithm, on the basis of which the new words are divided into commendatory and derogatory categories and added to a microblog domain dictionary. Second, a basic sentiment dictionary is constructed; considering the uniqueness of microblog text and the characteristics of the Chinese language, a microblog emoticon dictionary, a negation-word dictionary, a degree-adverb dictionary, and a conjunction dictionary are also constructed. Finally, the sentiment dictionaries are combined with semantic rules, and sentiment analysis is performed on Chinese microblogs by weighting the resulting sentiment value with that of the microblog emoticons. Tests on a collected microblog data set verify the validity of the proposed analysis strategy.
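
The final step, combining dictionaries with semantic rules and weighting the result against emoticons, might look roughly like the sketch below. The abstract does not specify the actual rules or weights, so every dictionary entry, the handling of negation, degree words, and adversative conjunctions, and the mixing weight `alpha` are hypothetical choices made only to illustrate the idea.

```python
# All dictionary contents and weights below are illustrative assumptions.
SENTIMENT = {"喜欢": 1.0, "给力": 1.0, "失望": -1.0, "坑爹": -1.0}
NEGATIONS = {"不", "没"}
DEGREES = {"非常": 2.0, "很": 1.5, "有点": 0.5}
ADVERSATIVES = {"但是", "可是"}
EMOTICONS = {"[哈哈]": 1.0, "[泪]": -1.0}

def text_score(tokens, turn_weight=1.5):
    """Score the words of a post with simple semantic rules.

    Simplification: a negation or degree word modifies the next
    sentiment word, however far away it is.
    """
    score = 0.0
    modifier = 1.0       # pending product of negation and degree factors
    after_turn = False   # True once an adversative conjunction is seen
    for tok in tokens:
        if tok in ADVERSATIVES:
            after_turn = True
            modifier = 1.0
        elif tok in NEGATIONS:
            modifier *= -1.0
        elif tok in DEGREES:
            modifier *= DEGREES[tok]
        elif tok in SENTIMENT:
            value = SENTIMENT[tok] * modifier
            # Clauses after "but" are assumed to carry more weight.
            score += value * (turn_weight if after_turn else 1.0)
            modifier = 1.0
    return score

def emoticon_score(tokens):
    """Sum the sentiment values of the emoticons in a post."""
    return sum(EMOTICONS.get(tok, 0.0) for tok in tokens)

def microblog_score(tokens, alpha=0.7):
    """Weighted combination of the text score and the emoticon score."""
    return alpha * text_score(tokens) + (1.0 - alpha) * emoticon_score(tokens)

def polarity(tokens, alpha=0.7):
    score = microblog_score(tokens, alpha)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical, pre-tokenized posts.
post_a = ["我", "非常", "喜欢", "这部", "电影", "[哈哈]"]
post_b = ["剧情", "不", "给力", "[泪]"]
post_c = ["开头", "很", "给力", "但是", "结局", "很", "失望"]

for post in (post_a, post_b, post_c):
    print("".join(post), polarity(post))
```

Keeping the emoticon score separate from the text score makes the weighting explicit: setting `alpha` to 1.0 ignores emoticons entirely, which gives a simple baseline for checking how much the emoticon dictionary contributes.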

Key words: microblog, sentiment analysis, emotion dictionary, semantic rule

CLC number: