Computer and Modernization


Chinese Short Text Classification Based on Sentence-LDA Topic Model

  

  (1. Wuhan Research Institute of Posts and Telecommunications, Wuhan 430000, China;
    2. Nanjing Fiberhome World Communication Technology Co., Ltd., Nanjing 210000, China)
  • Received: 2018-09-05 Online: 2019-04-08 Published: 2019-04-10

Abstract: Short texts have sparse features and depend strongly on context, so traditional long text classification techniques cannot be applied to them directly. To address the feature sparseness of short texts, a short text classification method based on the Sentence-LDA topic model is proposed. Sentence-LDA is an extension of the LDA (Latent Dirichlet Allocation) model that assumes that a sentence produces only one topic distribution. The trained Sentence-LDA model is used to predict the topic distribution of each original short text, and the resulting topic words are added to the original short text features, completing the feature expansion. Finally, an SVM (Support Vector Machine) classifies the expanded short texts. Experiments show that, compared with the traditional method of directly representing short texts with the VSM (Vector Space Model), the proposed method effectively improves the accuracy of short text classification.
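
The feature expansion step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a trained topic model has already produced a topic distribution for the short text and a ranked word list for each topic. The function name `expand_short_text`, the parameters `top_topics` and `words_per_topic`, and the toy data are all hypothetical and are not taken from the paper, which does not state these details in the abstract.

```python
from typing import Dict, List, Sequence


def expand_short_text(
    tokens: Sequence[str],
    topic_dist: Sequence[float],
    topic_words: Dict[int, List[str]],
    top_topics: int = 1,
    words_per_topic: int = 3,
) -> List[str]:
    """Append the top words of the most probable topics to the token list.

    Hypothetical helper: the selection rule (how many topics and words
    to use) is an assumption made for illustration only.
    """
    # Rank topic ids by predicted probability, highest first.
    ranked = sorted(
        range(len(topic_dist)), key=lambda k: topic_dist[k], reverse=True
    )
    expanded = list(tokens)
    for k in ranked[:top_topics]:
        for word in topic_words.get(k, [])[:words_per_topic]:
            # Skip words that are already present in the text.
            if word not in expanded:
                expanded.append(word)
    return expanded


# Toy inputs (invented for illustration, not experimental data).
tokens = ["stock", "rises"]
topic_dist = [0.1, 0.7, 0.2]  # assumed output of a trained topic model
topic_words = {
    0: ["match", "team", "goal"],
    1: ["market", "stock", "shares", "investor"],
    2: ["film", "actor"],
}

expanded = expand_short_text(tokens, topic_dist, topic_words)
print(expanded)
```

In a full pipeline, the expanded token list would then be converted to a feature vector and passed to an SVM classifier; that stage is omitted here because the abstract gives no implementation details for it.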

Key words: short text classification, Sentence-LDA, topic model, feature extension, SVM

CLC Number: