Computer and Modernization (计算机与现代化)

• Artificial Intelligence •

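Product Feature Extraction Based on Improved LDA Topic Model

  1. School of Computer, Chongqing University, Chongqing 400030, China
  • Received: 2016-05-04 Online: 2016-11-15 Published: 2016-11-23
  • About the authors: She Weijun (佘维军, b. 1991), male, from Nanchong, Sichuan, master's student at the School of Computer, Chongqing University; research interests: data mining, natural language processing. Liu Ziping (刘子平, b. 1990), male, from Qianjiang, Hubei, master's student; research interests: data mining, natural language processing. Yang Weifang (杨卫芳, b. 1989), female, from Shangqiu, Henan, master's student; research interests: data mining, recommender systems.
  • Supported by:
    National Natural Science Foundation of China (90818028)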

Abstract: To address the problems that arise when the LDA topic model is applied to product feature extraction, this paper proposes SA-LDA, a method that combines syntactic analysis with a topic model. First, all product reviews in the target product's category are parsed to extract explicit features, which are then clustered into a feature set and an opinion set; these sets form the corpus. Next, subjective sentences are extracted from each review of the product under analysis, and an improved LDA model learns their topics. Must-link and cannot-link constraints built from the corpus constrain and guide each topic update, so that each topic corresponds to one feature class. Experiments show that the method performs well on both explicit and implicit features and that, compared with traditional methods and other improved methods, it improves precision to some extent while maintaining recall.

Key words: latent Dirichlet allocation, topic model, syntactic analysis, feature extraction, constraint condition
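
The idea of guiding topic updates with must-link and cannot-link constraints can be illustrated with a minimal sketch. This is not the SA-LDA implementation from the paper, whose update rule is not given on this page. It is a plain collapsed Gibbs sampler for LDA in which, as an assumption made here, a topic's sampling weight for a word is multiplied by `boost` when the topic already holds a must-link partner of that word and by `penalty` when it holds a cannot-link partner. The function name `constrained_lda` and all parameter names and default values are hypothetical.

```python
import random
from collections import defaultdict


def _pairs_to_lookup(pairs):
    """Turn (a, b) word pairs into a symmetric word -> partners mapping."""
    lookup = defaultdict(set)
    for a, b in pairs:
        lookup[a].add(b)
        lookup[b].add(a)
    return lookup


def constrained_lda(docs, num_topics, must_link=(), cannot_link=(),
                    iters=200, alpha=0.1, beta=0.01,
                    boost=5.0, penalty=0.2, seed=0):
    """Collapsed Gibbs LDA with soft pairwise word constraints (illustrative).

    docs is a list of token lists. Returns a dict mapping every word to the
    topic that holds most of its tokens after sampling.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    must = _pairs_to_lookup(must_link)
    cannot = _pairs_to_lookup(cannot_link)

    n_dk = [[0] * num_topics for _ in docs]                 # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]    # topic-word counts
    n_k = [0] * num_topics                                  # tokens per topic
    z = []                                                  # topic of each token

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)   # remove the token being resampled
                weights = []
                for k in range(num_topics):
                    # Standard collapsed Gibbs weight for topic k.
                    p = ((n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                         / (n_k[k] + vocab_size * beta))
                    # Assumed heuristic: reward topics that contain a
                    # must-link partner, discourage topics that contain a
                    # cannot-link partner.
                    if any(n_kw[k][u] > 0 for u in must[w]):
                        p *= boost
                    if any(n_kw[k][u] > 0 for u in cannot[w]):
                        p *= penalty
                    weights.append(p)
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)

    return {w: max(range(num_topics), key=lambda k: n_kw[k][w]) for w in vocab}
```

A call such as `constrained_lda(reviews, 2, must_link=[("screen", "display")], cannot_link=[("screen", "battery")])`, where `reviews` is a hypothetical list of tokenized subjective sentences, returns one topic index per word. Because the constraints are applied as soft multipliers rather than hard rules, linked words are encouraged, but not guaranteed, to end up in the same topic.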
