Computer and Modernization ›› 2019, Vol. 0 ›› Issue (10): 7-.doi: 10.3969/j.issn.1006-2475.2019.10.002


CNN Text Classification Based on Topic Model Word Vectors

  

  1. (College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
  • Received: 2019-05-09 Online: 2019-10-28 Published: 2019-10-29

Abstract: Mining information from Weibo text is of great significance to applied research such as automatic question answering and public opinion analysis, and text classification is the basis of text mining. This paper proposes feeding the text representations produced by Word2vec and LDA (Latent Dirichlet Allocation) simultaneously into a convolutional neural network for high-level semantic feature abstraction and classification learning, so that the input word vectors represent both the semantic relations between words and the topic of the text. First, word vectors are obtained separately from the Word2vec model and from LDA. Then, the word vectors generated by each model are concatenated to obtain the corresponding text matrix representations. Finally, the two text matrices are fed into the convolutional neural network simultaneously as two channels to classify the texts. The effectiveness of the method is verified by experiments on Weibo data.
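
The two-channel input described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy vectors in `w2v` and `lda`, the equal vector dimension for both channels, zero-padding of short texts, and the helper names `text_matrix` and `conv_max_pool` are all assumptions made for illustration. In practice the vectors would come from trained Word2vec and LDA models, and the filter weights would be learned.

```python
from typing import Dict, List

Matrix = List[List[float]]


def text_matrix(tokens: List[str], table: Dict[str, List[float]],
                dim: int, max_len: int) -> Matrix:
    """Stack one vector per token into a (max_len x dim) text matrix.

    Unknown tokens and padding positions become zero vectors (an assumption).
    """
    rows = [list(table.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def conv_max_pool(channels: List[Matrix], kernel: List[Matrix],
                  bias: float = 0.0) -> float:
    """Slide one filter of height h over all channels, apply ReLU,
    then take the maximum over all window positions."""
    h = len(kernel[0])
    seq_len = len(channels[0])
    best = 0.0  # a ReLU output is never below zero
    for start in range(seq_len - h + 1):
        total = bias
        # Sum the filter response over both channels for this window.
        for chan, kern in zip(channels, kernel):
            for i in range(h):
                total += sum(x * w for x, w in zip(chan[start + i], kern[i]))
        best = max(best, total)
    return best


# Hypothetical toy vectors standing in for trained Word2vec and LDA outputs.
w2v = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
lda = {"good": [0.5, 0.5], "movie": [0.2, 0.8]}

tokens = ["good", "movie"]
channels = [
    text_matrix(tokens, w2v, dim=2, max_len=3),  # channel 1: Word2vec
    text_matrix(tokens, lda, dim=2, max_len=3),  # channel 2: LDA
]

# One filter covering 2 consecutive words in each of the 2 channels.
kernel = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
feature = conv_max_pool(channels, kernel)
print(feature)
```

A full classifier would apply many such filters of several heights and pass the pooled features to a softmax layer. The sketch only shows how the two text matrices share one convolution window.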

Key words: Word2Vec, LDA, text classification, convolutional neural network
