Computer and Modernization ›› 2021, Vol. 0 ›› Issue (12): 110-115.

Microblog Hot Topic Discovery Based on Text Dual Representation Model

  

  1. (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
  • Online: 2021-12-24 Published: 2021-12-24

Abstract: Microblogs are an important platform for information dissemination in contemporary life, and mining hot topics from microblogs has become an important research direction. Traditional hot topic discovery methods handle microblog text poorly: the text representation lacks semantic information, and the mined hot topics are of low quality. To address these problems, this paper proposes a text dual representation model based on frequent word sets and BERT semantics (FWS-BERT). The model computes a weighted text similarity, which is used to perform spectral clustering on microblog text. Microblog topic mining is then carried out with an affinity propagation (AP) clustering algorithm that uses an improved similarity measure. Finally, a topic heat evaluation method is proposed by introducing the H index from bibliometrics. Experiments show that the proposed method achieves higher silhouette coefficient and Calinski-Harabasz (CH) index values than the single text representation method based on frequent word sets and the K-means method, and that it can accurately represent topics and evaluate the popularity of microblog data.

Key words: microblog, frequent word sets, BERT, clustering, hot topics
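
Illustrative sketch: the abstract names two computations without giving formulas, a weighted similarity over the dual representation and an H-index-based topic heat score. The Python below is a minimal sketch under stated assumptions and is not the paper's implementation. The linear mix with weight `alpha`, the use of Jaccard overlap for frequent word sets, the use of cosine similarity for BERT vectors, and the choice of per-post engagement counts (for example, reposts) as input to the H index are all hypothetical choices made for illustration.

```python
import math


def jaccard(words_a, words_b):
    """Jaccard overlap of two word sets; 0.0 when both are empty."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_similarity(words_a, words_b, vec_a, vec_b, alpha=0.5):
    """Assumed linear mix of lexical and semantic similarity.

    words_*: frequent words found in each post (hypothetical input).
    vec_*:   sentence embeddings, e.g. from BERT (hypothetical input).
    alpha:   hypothetical weight on the frequent-word-set term.
    """
    lexical = jaccard(words_a, words_b)
    semantic = cosine(vec_a, vec_b)
    return alpha * lexical + (1 - alpha) * semantic


def h_index(counts):
    """Standard H index: largest h such that h items have count >= h.

    Here `counts` is assumed to hold one engagement count per post in a topic.
    """
    h = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    sim = weighted_similarity(["flood", "rescue"], ["rescue", "city"],
                              [1.0, 0.0], [1.0, 0.0], alpha=0.5)
    print(round(sim, 4))
    print(h_index([10, 8, 5, 4, 3]))
```

In a pipeline of the kind the abstract describes, the pairwise values from `weighted_similarity` would fill a precomputed affinity matrix passed to the clustering step, and `h_index` would be evaluated once per discovered topic.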