Product Feature Extraction Based on Improved LDA Topic Model

doi:10.3969/j.issn.1006-2475.2016.11.001

Computer and Modernization, 2016, Vol. 0, Issue (11): 1-6,57. doi: 10.3969/j.issn.1006-2475.2016.11.001

• Artificial Intelligence •

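School of Computer, Chongqing University, Chongqing 400030, China

Received: 2016-05-04 Online: 2016-11-15 Published: 2016-11-23
About the authors: SHE Wei-jun (born 1991), male, from Nanchong, Sichuan, master's student at the School of Computer, Chongqing University, research interests: data mining and natural language processing; LIU Zi-ping (born 1990), male, from Qianjiang, Hubei, master's student, research interests: data mining and natural language processing; YANG Wei-fang (born 1989), female, from Shangqiu, Henan, master's student, research interests: data mining and recommender systems.
Funding:
National Natural Science Foundation of China (90818028)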

Abstract

To address the problems that arise when the LDA topic model is used for product feature extraction, this paper proposes SA-LDA, a method that combines syntactic analysis with a topic model. First, all reviews of the products in the target product's category are processed with syntactic analysis to extract explicit features, which are then clustered into a feature set and an opinion set; the corpus is built from these sets. Next, for each review of the product under analysis, the subjective (opinion) sentences are extracted and their topics are learned with the improved LDA model. Must-link and cannot-link constraints built from the corpus constrain and guide the topic updates, and each topic corresponds to one feature cluster. Experiments show that the method performs well on both explicit and implicit features and that, compared with traditional methods and other improved methods, it raises precision to some degree while maintaining recall.

Key words: latent Dirichlet allocation, topic model, syntactic analysis, feature extraction, constraint condition
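
The abstract does not say how the must-link and cannot-link constraints enter the topic update. The following is a minimal sketch of one possible reading, assuming a collapsed Gibbs sampler and a simple multiplicative bias; it is not the authors' SA-LDA implementation. The function name `constrained_lda`, the `boost` and `damp` multipliers, and the toy English tokens are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def constrained_lda(docs, num_topics, must_link=(), cannot_link=(),
                    alpha=0.1, beta=0.01, boost=2.0, damp=0.5,
                    iterations=100, seed=0):
    """Collapsed Gibbs sampling for LDA with a soft constraint bias.

    docs: list of token lists. must_link / cannot_link: word pairs.
    boost and damp are assumed multipliers, not values from the paper.
    Returns one {word: count} dict per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Symmetric partner lookup for both constraint types.
    ml, cl = defaultdict(set), defaultdict(set)
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                weights = []
                for k in range(num_topics):
                    # Standard collapsed Gibbs weight for LDA.
                    p = ((doc_topic[d][k] + alpha)
                         * (topic_word[k][w] + beta)
                         / (topic_total[k] + vocab_size * beta))
                    # Steer toward topics holding a must-link partner,
                    # away from topics holding a cannot-link partner.
                    if any(topic_word[k][u] > 0 for u in ml[w]):
                        p *= boost
                    if any(topic_word[k][u] > 0 for u in cl[w]):
                        p *= damp
                    weights.append(p)

                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return [{w: c for w, c in tw.items() if c > 0} for tw in topic_word]


# Toy opinion sentences, already tokenized (illustrative data only).
docs = [
    ["screen", "clear", "display", "bright"],
    ["battery", "lasts", "long", "power"],
    ["display", "sharp", "screen"],
    ["power", "drains", "battery"],
]
topics = constrained_lda(
    docs, num_topics=2,
    must_link=[("screen", "display"), ("battery", "power")],
    cannot_link=[("screen", "battery")],
)
for k, words in enumerate(topics):
    print(k, sorted(words, key=words.get, reverse=True))
```

In this sketch the constraints are soft: a must-link partner already in a topic raises the sampling weight for that topic and a cannot-link partner lowers it, so the data can still override a constraint. Each resulting topic is read as one candidate feature cluster.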

CLC number: TP391.1

SHE Wei-jun, LIU Zi-ping, YANG Wei-fang. Product Feature Extraction Based on Improved LDA Topic Model[J]. Computer and Modernization, 2016, 0(11): 1-6,57.

