Computer and Modernization

• Applications and Development •

Emotion Analysis of Hotel Customer Reviews Based on SVM

  

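  1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, Jiangxi 330045;
  2. School of Software, Jiangxi Agricultural University, Nanchang, Jiangxi 330045;
  3. Key Laboratory of Agricultural Information Technology, Colleges and Universities of Jiangxi Province, Jiangxi Agricultural University, Nanchang, Jiangxi 330045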
  • Received: 2016-08-08 Online: 2017-03-29 Published: 2017-03-30
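  • About the authors: SHI Qiangqiang (b. 1991), male, from Danyang, Jiangsu, master's student at the School of Computer and Information Engineering, Jiangxi Agricultural University; research interests: natural language processing, data mining, computer image processing. ZHAO Yingding (b. 1965), male, from Yugan, Jiangxi, professor at the School of Software, Jiangxi Agricultural University, and at the Key Laboratory of Agricultural Information Technology, Colleges and Universities of Jiangxi Province, Ph.D.; research interests: computer image processing, intelligent computer interface technology, computer networks and network security. YANG Hongyun (b. 1975), male, from Xingan, Jiangxi, associate professor, master's degree; research interests: agricultural information technology.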
  • Funding:
    National Natural Science Foundation of China (61562039, 61363041)

Emotion Analysis of Hotel Customer Reviews Based on SVM

  1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China;
  2. School of Software, Jiangxi Agricultural University, Nanchang 330045, China;
  3. Key Laboratory of Agricultural Information Technology, Colleges and Universities of Jiangxi Province, Jiangxi Agricultural University, Nanchang 330045, China
  • Received: 2016-08-08 Online: 2017-03-29 Published: 2017-03-30

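Abstract: Adding more types of emotion dictionaries improves the accuracy with which the system segments and analyzes Internet vocabulary and emoticons. Customer reviews of a hotel serve as the raw data. Text features such as the numbers of positive and negative emotion words, negation words, degree adverbs, and special symbols are extracted and then grouped into different feature combinations. K-fold cross-validation and a grid search algorithm are used to find the optimal parameter pair C and g for the SVM (support vector machine). Each feature combination is trained and tested with the SVM and its accuracy is analyzed, in order to identify the text features and feature combination best suited to emotion analysis of user reviews. The results show that, when each feature combination uses its own optimal C and g, the combination of positive and negative emotion words, negation words, emotion score, and degree adverbs achieves the highest test accuracy, 93.4%.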

Keywords: emotion analysis, support vector machine, K-fold cross-validation, grid search, feature combination
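The feature extraction step named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy English lexicons, the `ReviewFeatures` and `extract_features` names, and in particular the scoring rule (each emotion word counts as +1 or -1, scaled by any preceding degree adverb and flipped by any preceding negation word). The abstract does not say how the authors compute the emotion score, and their dictionaries are Chinese.

```python
from dataclasses import dataclass

# Hypothetical toy lexicons; the paper's actual dictionaries are not given.
POSITIVE = {"clean", "friendly", "great"}
NEGATIVE = {"dirty", "noisy", "rude"}
NEGATIONS = {"not", "never"}
DEGREE_ADVERBS = {"very": 2.0, "slightly": 0.5}
SPECIAL_SYMBOLS = {"!", "?", ":)", ":("}


@dataclass
class ReviewFeatures:
    positive: int         # number of positive emotion words
    negative: int         # number of negative emotion words
    negations: int        # number of negation words
    degree_adverbs: int   # number of degree adverbs
    special_symbols: int  # number of special symbols
    score: float          # emotion score (assumed scoring rule)


def extract_features(tokens):
    """Count lexicon hits in an already segmented review."""
    pos = neg = negations = degrees = symbols = 0
    score = 0.0
    weight, flip = 1.0, 1.0  # modifiers waiting for the next emotion word
    for tok in tokens:
        if tok in NEGATIONS:
            negations += 1
            flip = -flip
        elif tok in DEGREE_ADVERBS:
            degrees += 1
            weight *= DEGREE_ADVERBS[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            if tok in POSITIVE:
                pos += 1
                polarity = 1.0
            else:
                neg += 1
                polarity = -1.0
            score += flip * weight * polarity
            weight, flip = 1.0, 1.0  # modifiers are consumed
        elif tok in SPECIAL_SYMBOLS:
            symbols += 1
    return ReviewFeatures(pos, neg, negations, degrees, symbols, score)


tokens = ["room", "very", "clean", "but", "staff", "not", "friendly", "!"]
features = extract_features(tokens)
print(features)
```

A feature combination in the sense of the abstract would then be a chosen subset of these fields passed to the classifier as one input vector.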

Abstract: This paper improves the accuracy of word segmentation and emotion analysis for Internet vocabulary and emoticons by adding more types of emotion dictionaries. Customer reviews of a hotel are used as the original data. After extracting text features such as the numbers of positive and negative emotion words, negation words, degree adverbs, and special symbols, we form different feature combinations and find the optimal SVM parameters C and g through K-fold cross-validation and a grid search algorithm. We train and test each feature combination with SVM, analyze its accuracy, and identify the text features and feature combination most suitable for emotion analysis of user reviews. The results show that, when each feature combination uses its own optimal C and g, the combination of positive and negative emotion words, negation words, emotion score, and degree adverbs gives the highest test accuracy, reaching 93.4%.

Key words: emotion analysis, SVM, K-fold cross-validation, grid search, feature combination
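The parameter search described in the abstract can be sketched as follows, under several assumptions. The abstract does not name the kernel, the toolkit, the number of folds, or the search ranges. This sketch assumes that g is the gamma parameter of an RBF kernel, uses scikit-learn's `GridSearchCV` as a stand-in for whatever tool the authors used, picks 5 folds and small exponential grids arbitrarily, and runs on synthetic data rather than the hotel reviews.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a review feature matrix (not the paper's data):
# 200 samples, 5 numeric features, binary positive/negative label.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Exponentially spaced candidates; these ranges are illustrative assumptions.
param_grid = {
    "C": [2.0 ** e for e in range(-2, 9, 2)],
    "gamma": [2.0 ** e for e in range(-8, 3, 2)],
}

# K-fold cross-validation with K = 5 (assumed), stratified by class label.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Every (C, gamma) pair is scored by its mean accuracy over the K folds.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["C"]
best_g = search.best_params_["gamma"]
print(f"best C={best_C}, best g={best_g}, "
      f"mean CV accuracy={search.best_score_:.3f}")
```

To compare feature combinations in the way the abstract describes, the same search would be repeated once per combination, so that each combination is evaluated at its own best C and g.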

CLC number: