Computer and Modernization (计算机与现代化)

• Artificial Intelligence •

Commodity Attributes Extracting in Chinese Shopping Reviews Based on Bi-LSTM and CRF

  1. (School of Automation, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China)
  • Received: 2018-07-02 Online: 2019-02-25 Published: 2019-02-26
  • About the author: ZHANG Shilin (张诗林, born 1991), male, from Linshu, Shandong; master's student; research interests: machine learning and natural language processing. E-mail: 935832472@qq.com.

Abstract: As the review systems of e-commerce platforms mature, online shopping reviews have become an important guide for consumers' purchasing decisions. However, consumers cannot easily pick out, from a large volume of reviews, the commodity attributes they care about (for example, the attribute "battery" of a mobile phone) or the evaluations attached to those attributes (for example, "the battery capacity is large"). Knowledge-base construction and traditional machine learning methods require complex features and rules to be summarized by hand in order to extract attributes and attribute evaluations. This paper instead applies a method that combines word embeddings with a bi-directional long short-term memory network (Bi-LSTM) and a conditional random field (CRF). Because attributes in reviews are mostly nouns and attribute evaluations are mostly adjectives, part-of-speech (POS) features are incorporated into the Bi-LSTM+CRF model. The method extracts commodity attributes and attribute evaluations from reviews automatically, avoids hand-written rules, and is more broadly applicable across domains. Tests on three commodity domains (cameras, menswear, and child safety seats) yield a macro precision of 86.74% and a macro recall of 85.89%.

Key words: Bi-LSTM, CRF, Chinese shopping reviews, POS features
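
To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over a tag sequence, not the authors' implementation. It assumes a BIO tag set (`ATTR` for a commodity attribute, `EVAL` for an attribute evaluation), which the abstract does not specify. The per-token emission scores stand in for what a Bi-LSTM fed with word embeddings and POS features would produce; here they are hand-picked toy numbers, and the transition scores are likewise illustrative rather than learned.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set (an assumption; the abstract does not give one).
TAGS = ["O", "B-ATTR", "I-ATTR", "B-EVAL", "I-EVAL"]


def build_transitions() -> Dict[Tuple[str, str], float]:
    """Toy transition scores: reward B-X/I-X -> I-X, penalize any other -> I-X."""
    trans = {}
    for prev in TAGS:
        for cur in TAGS:
            if cur.startswith("I-"):
                kind = cur[2:]
                valid = prev in ("B-" + kind, "I-" + kind)
                trans[(prev, cur)] = 1.0 if valid else -10.0
    return trans  # pairs not listed are treated as 0.0


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the per-token score (missing tags count as 0.0);
    transitions[(prev, cur)] is the tag-to-tag score.
    """
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0].get(tag, 0.0) for tag in TAGS}]
    back = []  # back[t-1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev] + transitions.get((prev, cur), 0.0)
                           + emissions[t].get(cur, 0.0))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy emission scores for the segmented review "电池 / 容量 / 很 / 大"
# ("battery / capacity / very / large"); the numbers are made up.
tokens = ["电池", "容量", "很", "大"]
emissions = [
    {"B-ATTR": 2.0, "O": 0.5},
    {"B-ATTR": 1.2, "I-ATTR": 1.0, "O": 0.3},
    {"O": 1.0, "B-EVAL": 0.8},
    {"B-EVAL": 1.0, "I-EVAL": 1.5},
]
predicted = viterbi_decode(emissions, build_transitions())
print(list(zip(tokens, predicted)))
```

In this toy case the emission scores alone would tag the second token as the start of a new attribute and the third as `O`; the transition scores are what pull the sequence toward one attribute span followed by one evaluation span, which is the kind of sequence-level consistency a CRF layer adds on top of per-token predictions.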

CLC Number: