Computer and Modernization (计算机与现代化)

• Artificial Intelligence

一种基于语义分析的热点新闻发现方法

  

  1. (南京理工大学计算机科学与工程学院,江苏 南京 210094)
  • 收稿日期:2016-11-16 出版日期:2017-06-23 发布日期:2017-06-23
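  • About the author: CAO Tong (曹通, born 1990), male, from Suqian, Jiangsu; master's student at the School of Computer Science and Engineering, Nanjing University of Science and Technology; research interests: natural language processing and data mining.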

A News Hot Spot Detection Method Based on Semantic Analysis

  1. (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China)
  • Received: 2016-11-16 Online: 2017-06-23 Published: 2017-06-23

摘要: 随着互联网的发展和普及,互联网新闻报道已是人们获取社会信息的主要手段,如何快速准确地获取互联网新闻热点话题是一个急需解决的问题。本文使用LDA(Latent Dirichlet Allocation)和BTM(Biterm Topic Model)主题模型,充分考虑新闻标题和新闻正文对新闻热点检测影响的不同,分别对新闻的正文和标题进行语义分析,新闻标题使用BTM模型,新闻正文使用LDA模型,提取主题特征向量,并将2种语义特征进行融合,形成全文的语义特征,然后通过改进的聚类算法,进行聚类,在此基础上引入新闻热度的定义,通过热度公式计算新闻的热度,利用计算出的热度值排序得到最近一段时间的热点新闻。通过在爬取的新闻数据上的实验,验证了本文方法的有效性和实用性。

关键词: 隐含语义分析, 新闻热度, 话题检测, LDA与BTM模型

Abstract: With the development and popularization of the Internet, online news reports have become the main means by which people obtain information about society, and identifying hot news topics quickly and accurately is a pressing problem. This paper uses two topic models, LDA (Latent Dirichlet Allocation) and BTM (Biterm Topic Model), and takes into account that news headlines and news bodies influence hot spot detection differently. The headline and the body of each news item are analyzed semantically and separately: the BTM model is applied to headlines and the LDA model to bodies to extract topic feature vectors, and the two kinds of semantic features are fused into a semantic feature for the full text. The news items are then clustered with an improved clustering algorithm. On this basis, a definition of news heat is introduced, the heat of the news is computed with a heat formula, and the computed heat values are sorted to obtain the hot news of the most recent period. Experiments on crawled news data verify the effectiveness and practicality of the method.

Key words: latent semantic analysis, news heat, topic detection, LDA and BTM models
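
The pipeline summarized in the abstract (fuse headline and body topic vectors, cluster, rank by heat) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the fusion rule, the improved clustering algorithm, nor the heat formula, so the weighted concatenation in `fuse`, the single-pass threshold clustering in `single_pass_cluster`, and the cluster-size heat in `rank_by_heat` are all hypothetical stand-ins, as are the parameter names `title_weight` and `threshold`. The headline vectors are assumed to come from an already trained BTM model and the body vectors from an already trained LDA model; the toy numbers below are made up.

```python
import math


def fuse(title_vec, body_vec, title_weight=0.5):
    """Hypothetical fusion: weighted concatenation of the headline topic
    vector (assumed from BTM) and the body topic vector (assumed from LDA)."""
    weighted_title = [title_weight * x for x in title_vec]
    weighted_body = [(1.0 - title_weight) * x for x in body_vec]
    return weighted_title + weighted_body


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors, threshold=0.8):
    """Stand-in clustering: put each vector in the most similar existing
    cluster if the similarity reaches the threshold, else open a new one."""
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = -1, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best < 0:
            centroids.append(list(vec))
            members.append([i])
        else:
            members[best].append(i)
            n = len(members[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [
                (old * (n - 1) + x) / n for old, x in zip(centroids[best], vec)
            ]
    return members


def rank_by_heat(members):
    """Placeholder heat: the number of news items in a cluster.
    Returns the clusters sorted from hottest to coldest."""
    return sorted(members, key=len, reverse=True)


# Toy topic vectors for three news items (invented for illustration).
title_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
body_vecs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]

fused = [fuse(t, b) for t, b in zip(title_vecs, body_vecs)]
clusters = single_pass_cluster(fused)
ranked = rank_by_heat(clusters)
print(ranked)  # [[0, 1], [2]]
```

In this toy run the first two items share a dominant topic in both headline and body, so they fall into one cluster, which ranks first under the placeholder heat. A heat definition that also uses, for example, recency would only change `rank_by_heat`.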

CLC Number: