Computer and Modernization (计算机与现代化), 2012, Vol. 1, Issue (1): 6-9, 13. doi: 10.3969/j.issn.1006-2475.2012.01.002

• Artificial Intelligence •

New Event Detection Based on LDA and Correlation of Topic Terms

HUANG Ying

  1. School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, Jiangxi, China
  • Received: 2011-09-08 Revised: 1900-01-01 Online: 2012-01-10 Published: 2012-01-10

Abstract: Topic detection and tracking (TDT) is now widely used. New event detection, one of the research tasks in TDT, supplies the prior knowledge needed to track how a topic subsequently develops, and is therefore of considerable theoretical interest in the field. The LDA topic model cannot identify new events automatically: its number of topics must be set by hand or through repeated experiments, which makes detection inefficient. This paper proposes a new event detection algorithm based on LDA and the correlation between topic terms. Combined with the time at which reports were published, it determines a reasonable number of topics and thereby detects new events. Experimental results show that, compared with the traditional LDA algorithm and the Gibbs LDA algorithm, the method has certain advantages and is more sensitive to new events.

Key words: latent Dirichlet allocation (LDA), topic detection, new event detection, correlation of topic terms
