Computer and Modernization ›› 2022, Vol. 0 ›› Issue (02): 92-96.


Short Text Classification Method Based on Support Vector Machine

  

  1. (Information Center, Beijing Jiaotong University, Beijing 100044, China)
  • Online: 2022-03-31 Published: 2022-03-31

Abstract: This paper proposes an effective short text classification method based on the support vector machine, aimed at short texts with sparse, non-standard features and unclear topics. Because Chinese dependency grammar analysis has low accuracy and poor time efficiency, and in view of the characteristics of client text consultations, this paper does not analyze the dependency grammar of sentences but relies mainly on syntactic features instead. Two syntactic features are extracted: substrings and subsequences of sentences. Three feature measures (information gain, mutual information, and the chi-square statistic) are then used for feature selection. Finally, a support vector machine is used to classify the text. The proposed model is applied to a set of real data, and the experimental results show that the average accuracy reaches 84.19%, verifying the robustness and effectiveness of the classification method.
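
The feature extraction and chi-square scoring steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`substrings`, `subsequences`, `chi_square`), the length limits, and the toy English tokens standing in for segmented Chinese words are all assumptions made for illustration. The sketch reads a substring as a contiguous run of tokens and a subsequence as an ordered but possibly non-contiguous selection of tokens, and it uses the standard 2x2 contingency form of the chi-square statistic.

```python
from collections import Counter
from itertools import combinations


def substrings(tokens, max_len=2):
    """Contiguous token runs of length 1..max_len (hypothetical helper)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def subsequences(tokens, length=2):
    """Ordered, possibly non-contiguous token tuples (hypothetical helper)."""
    return set(combinations(tokens, length))


def chi_square(docs, labels, target):
    """Chi-square score of every feature with respect to class `target`.

    docs is a list of feature sets, labels the matching class labels.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df, df_pos = Counter(), Counter()
    for feats, y in zip(docs, labels):
        for f in feats:
            df[f] += 1
            if y == target:
                df_pos[f] += 1
    scores = {}
    for f, total in df.items():
        a = df_pos[f]        # feature present, class is target
        b = total - a        # feature present, other class
        c = n_pos - a        # feature absent, class is target
        d = n - n_pos - b    # feature absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data: assumed English stand-ins for segmented consultation texts.
texts = [
    ["reset", "password"],
    ["forgot", "password"],
    ["network", "down"],
    ["network", "slow"],
]
labels = ["account", "account", "network", "network"]

docs = [substrings(t) | subsequences(t) for t in texts]
scores = chi_square(docs, labels, target="account")

# Keep the highest-scoring features; k is an arbitrary illustrative choice.
k = 3
selected = sorted(scores, key=scores.get, reverse=True)[:k]
```

In a full pipeline the selected features would be turned into vectors and passed to a support vector machine classifier. Information gain and mutual information could be computed from the same four contingency counts.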

Key words: support vector machine, text categorization, semi-supervised, feature selection