Short Text Classification Method Based on Improved Adversarial Learning and Fusion Features
NING Zhaoyang1, 2, SHEN Qing2, 3, HAO Xiulan1, 2, ZHAO Kang1, 2
(1. Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou 313000, China; 2. Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China; 3. School of Science and Engineering, Huzhou College, Huzhou 313000, China)
Citation: NING Zhaoyang, SHEN Qing, HAO Xiulan, ZHAO Kang. Short Text Classification Method Based on Improved Adversarial Learning and Fusion Features[J]. Computer and Modernization, 2024, 0(04): 66-76.
[1] MAAS A L, DALY R E, PHAM P T, et al. Learning word vectors for sentiment analysis[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. ACL, 2011:142-150.
[2] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Curran Associates Inc, 2013:3111-3119.
[3] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
[4] JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. ACL, 2017:427-431.
[5] PENNINGTON J, SOCHER R, MANNING C D. GloVe: Global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2014:1532-1543.
[6] DEVLIN J, CHANG M W, LEE K T, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL, 2019:4171-4186.
[7] LAN Z Z, CHEN M D, GOODMAN S, et al. ALBERT: A lite BERT for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942, 2019.
[8] HE P C, LIU X D, GAO J F, et al. DeBERTa: Decoding-enhanced BERT with disentangled attention[J]. arXiv preprint arXiv:2006.03654, 2020.
[9] KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2014:1746-1751.
[10] MIKOLOV T, KOMBRINK S, BURGET L, et al. Extensions of recurrent neural network language model[C]// Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011:5528-5531.
[11] SOCHER R, LIN C C Y, NG A Y, et al. Parsing natural scenes and natural language with recursive neural networks[C]// Proceedings of the 28th International Conference on Machine Learning. Omnipress, 2011:129-136.
[12] HE Li, ZHENG Zaoxian, XIANG Fengtao, et al. Research progress of text classification technology based on deep learning[J]. Computer Engineering, 2021,47(2):1-11.
[13] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2014:1724-1734.
[14] LAI S W, XU L H, LIU K, et al. Recurrent convolutional neural networks for text classification[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. AAAI Press, 2015:2267-2273.
[15] DEY R, SALEM F M. Gate-variants of gated recurrent unit (GRU) neural networks[C]// Proceedings of the 60th IEEE International Midwest Symposium on Circuits and Systems. IEEE, 2017:1597-1600.
[16] YAO L, MAO C S, LUO Y. Graph convolutional networks for text classification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. AAAI Press, 2019:7370-7377.
[17] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
[18] QIN Q, HU W P, LIU B. Feature projection for improved text classification[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020:8161-8171.
[19] CHENG Yan, YAO Leibo, ZHANG Guanghe, et al. Text sentiment orientation analysis of multi-channel CNN and BiGRU based on attention mechanism[J]. Journal of Computer Research and Development, 2020,57(12):2583-2595.
[20] ZHOU P, SHI W, TIAN J, et al. Attention-based bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. ACL, 2016:207-212.
[21] LI Bohan, XIANG Yuxuan, FENG Ding, et al. Short text classification model integrating knowledge awareness and dual attention[J]. Journal of Software, 2022,33(10):3565-3581.
[22] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.
[23] MIYATO T, DAI A M, GOODFELLOW I. Adversarial training methods for semi-supervised text classification[J]. arXiv preprint arXiv:1605.07725, 2016.
[24] HE K M, FAN H Q, WU Y X, et al. Momentum contrast for unsupervised visual representation learning[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020:9726-9735.
[25] YE M, ZHANG X, YUEN P C, et al. Unsupervised embedding learning via invariant and spreading instance feature[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019:6203-6213.
[26] GAO T Y, YAO X C, CHEN D Q. SimCSE: Simple contrastive learning of sentence embeddings[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2021:6894-6910.
[27] LIANG X B, WU L J, LI J T, et al. R-Drop: Regularized dropout for neural networks[J]. arXiv preprint arXiv:2106.14448, 2021.
[28] WU D X, XIA S T, WANG Y S. Adversarial weight perturbation helps robust generalization[J]. arXiv preprint arXiv:2004.05884, 2020.
[29] YU F, KOLTUN V. Multi-Scale context aggregation by dilated convolutions[J]. arXiv preprint arXiv:1511.07122, 2015.
[30] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems 30 (NIPS 2017). Curran Associates Inc, 2017:6000-6010.
[31] KINGMA D P, BA J. Adam: A method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[32] JOHNSON R, ZHANG T. Deep pyramid convolutional neural networks for text categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. ACL, 2017:562-570.
[33] ZHANG Haifeng, ZENG Cheng, PAN Lie, et al. News topic text classification method combining BERT and feature projection network[J]. Journal of Computer Applications, 2022,42(4):1116-1124.
[34] LIU Shuo, WANG Gengrun, PENG Jianhua, et al. Chinese short text classification algorithm based on hybrid character-word features[J]. Computer Science, 2022,49(4):282-287.
[35] LI Qihang, LIAO Wei, MENG Jingwen. Dual-channel DAC-RNN text classification model based on attention mechanism[J]. Computer Engineering and Applications, 2021,58(16):157-163.
[36] FAN Hao, HE Hao. Research on news headline classification integrating contextual features and BERT word embedding[J]. Information Science, 2022,40(6):90-97.