Computer and Modernization ›› 2024, Vol. 0 ›› Issue (08): 120-126. doi: 10.3969/j.issn.1006-2475.2024.08.019


News Long Text Classification Model Based on Improved TF-IDF and AGLCNN

  

  1. (School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China)
  • Online: 2024-08-28  Published: 2024-08-29

Abstract:  Long news text classification is an important task in natural language processing, but traditional text representation methods suffer from sparse features and insufficient semantic information. In addition, long news texts contain a large amount of redundant information and may touch on other topics, both of which can lead to incomplete feature extraction. This article therefore proposes a long news text classification model based on an improved TF-IDF algorithm and AGLCNN. The model first improves the TF-IDF algorithm by using the inter-class and intra-class distribution and the position information of feature items, and combines the resulting weights with Word2Vec word vectors for text representation. An attention mechanism then highlights keyword information, and the weighted representation is fed into a Bi-LSTM to capture contextual features. A CNN is then used to extract the salient features of the news topic. Because long news texts may contain sentences that involve other topics, a gating mechanism is introduced to fuse the output features of the Bi-LSTM and the CNN into the final text feature representation. Finally, the feature vector is fed into a Softmax layer for news classification. Comparative experiments on the THUCNews and Sohu News datasets show that the proposed model achieves recall rates of 0.985 and 0.976, respectively, outperforming the other classification models compared.
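
The gated fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the gate equation, so the form below is an assumption: an element-wise gate g = sigmoid(w_lstm * h + w_cnn * c + b) and a fused output g * h + (1 - g) * c, a common way to blend two feature vectors. The function name gated_fusion, the parameter names, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_lstm, h_cnn, w_lstm, w_cnn, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (not taken from the paper):
        g_i   = sigmoid(w_lstm_i * h_i + w_cnn_i * c_i + b_i)
        out_i = g_i * h_i + (1 - g_i) * c_i
    A gate near 1 keeps the Bi-LSTM feature, a gate near 0 keeps the CNN feature.
    """
    fused = []
    for h, c, wl, wc, b in zip(h_lstm, h_cnn, w_lstm, w_cnn, bias):
        g = sigmoid(wl * h + wc * c + b)
        fused.append(g * h + (1.0 - g) * c)
    return fused


# Toy feature vectors standing in for Bi-LSTM and CNN outputs (illustrative values only).
h_lstm = [0.2, -0.5, 0.9]
h_cnn = [0.8, 0.1, -0.3]
zeros = [0.0, 0.0, 0.0]

# With zero weights and zero bias every gate is 0.5, so the result is the plain average.
fused_even = gated_fusion(h_lstm, h_cnn, zeros, zeros, zeros)

# A large positive bias saturates the gate, so the Bi-LSTM features pass through.
fused_lstm = gated_fusion(h_lstm, h_cnn, zeros, zeros, [50.0, 50.0, 50.0])
```

In a trained model the gate parameters would be learned jointly with the rest of the network, and the gate would normally use full weight matrices rather than the per-element weights assumed here for brevity.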

Key words:  text classification, TF-IDF, attention mechanism, convolutional neural network, characteristic item
