Computer and Modernization (计算机与现代化)

• Artificial Intelligence

GRU and LDA Based Group Chat Topic Mining

  (1. Wuhan Research Institute of Posts and Telecommunications, Wuhan 430074, Hubei, China;
    2. Fiber Home Starry Sky Co. Ltd., Nanjing 210000, Jiangsu, China;
    3. Fiber Home World Communication Technology Co. Ltd., Nanjing 210019, Jiangsu, China)
  • Received: 2018-07-11 Online: 2019-01-03 Published: 2019-01-04
  • About the authors: Tang Kun (born 1979), male, master's degree, senior engineer at Wuhan Research Institute of Posts and Telecommunications and Fiber Home Starry Sky Co. Ltd.; research interest: big data analysis. Chen Sisi (born 1993), female, from Xiaogan, Hubei, master's student at Wuhan Research Institute of Posts and Telecommunications and Fiber Home World Communication Technology Co. Ltd.; research interests: machine learning and data mining.

Abstract: With the rapid development of social networks, instant messaging systems have become an essential communication tool in daily life. Online group chat lets people quickly exchange information about life, technology, and work, but because group chat messages are updated so quickly, the sheer volume of messages makes it difficult to keep up with the topics under discussion. Traditional topic mining models are not well suited to group chat text. Based on an analysis of the characteristics of group chat text, a GRU and LDA Based Group Chat Topic Mining (GLB-GCTM) model is proposed, which addresses the word-order problem that traditional topic models cannot handle. First, each document is assumed to have a topic vector drawn from a Gaussian distribution. Then the hidden state of each word is generated by a GRU, and a Bernoulli distribution conditioned on the hidden state of the current word determines whether that word is a stop word, which in turn decides which language model is used. The method is validated on a dataset of group chat messages collected over the most recent three months from ten QQ groups that the authors had joined. Measured against the evaluation criteria of the comparative experiments, the model effectively identifies the topics in group chat text.

Key words: topic mining, group chat text, deep learning, GRU, LDA
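
The generative process outlined in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function and parameter names (`generate`, `V`, `B`, `gamma`), the random untrained weights, the toy dimensions, and in particular the choice to add topic scores to the GRU word scores only when the current word is not flagged as a stop word. It should be read as one plausible reading of the description, not as the authors' implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gru_step(p, x, h):
    """One GRU update with update gate z, reset gate r and candidate state c."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Convention used here: z weights the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

def generate(vocab_size=8, hidden=4, topics=3, length=6, seed=0):
    """Sample one toy document; all weights are random (untrained)."""
    rng = random.Random(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rand_matrix(rng, hidden, vocab_size)  # input weights
        p["U" + gate] = rand_matrix(rng, hidden, hidden)      # recurrent weights
    V = rand_matrix(rng, vocab_size, hidden)   # hidden state -> word scores
    B = rand_matrix(rng, vocab_size, topics)   # topic vector -> word scores
    gamma = [rng.gauss(0.0, 0.1) for _ in range(hidden)]  # stop-word logit weights

    # Step 1: a Gaussian topic vector for the whole document.
    theta = [rng.gauss(0.0, 1.0) for _ in range(topics)]

    h = [0.0] * hidden
    prev = 0  # assumed start-token id
    words, stop_flags = [], []
    for _ in range(length):
        # Step 2: the GRU produces the hidden state for the current position.
        x = [1.0 if i == prev else 0.0 for i in range(vocab_size)]
        h = gru_step(p, x, h)
        # Step 3: a Bernoulli draw on the hidden state flags stop words.
        p_stop = sigmoid(sum(g * hi for g, hi in zip(gamma, h)))
        is_stop = rng.random() < p_stop
        # Step 4: the flag selects the language model (assumed additive form).
        scores = matvec(V, h)
        if not is_stop:
            scores = [s + t for s, t in zip(scores, matvec(B, theta))]
        prev = rng.choices(range(vocab_size), weights=softmax(scores))[0]
        words.append(prev)
        stop_flags.append(is_stop)
    return theta, words, stop_flags
```

Calling `generate(seed=1)` returns the sampled topic vector, a list of word ids, and the per-word stop-word flags. In a trained model the weights would be learned from the chat corpus and the topic vector inferred per document; the sketch only shows how word order (through the GRU state) and document-level topics (through `theta`) could both influence each word.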

CLC number: