Computer and Modernization ›› 2022, Vol. 0 ›› Issue (06): 1-7.


A Text Classification Model Based on BERT and Pooling Operation

  

  (1. School of Software, East China University of Technology, Nanchang 330013, China;
    2. School of Information Engineering, East China University of Technology, Nanchang 330013, China)
  • Online: 2022-06-23 Published: 2022-06-23

Abstract: Fine-tuning pre-trained language models has achieved good results in many natural language processing tasks, of which text classification is a representative example; the BERT model, built on the Transformer architecture, is a typical such model. However, BERT directly uses the vector corresponding to the [CLS] token as the text representation and does not consider the local and global features of the text, which limits the model's classification performance. Therefore, this paper proposes a text classification model that introduces a pooling operation, using pooling methods such as average pooling, max pooling, and K-MaxPooling to extract the text representation vector from BERT's output matrix. Experimental results show that, compared with the original BERT model, the proposed text classification model with pooling operations performs better: on every text classification task in the experiments, its accuracy and F1-score exceed those of the BERT model.
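The pooling strategies described in the abstract can be illustrated with a short sketch. The following is a hypothetical PyTorch implementation (not the authors' code): it takes a tensor standing in for BERT's last-layer output matrix, masks out padding tokens, and produces the mean-pooled, max-pooled, and K-MaxPooled text representation vectors. The function name, the choice of k, and the random demo tensors are all illustrative assumptions.

```python
import torch

def pool_bert_outputs(hidden_states, attention_mask, k=3):
    """Illustrative pooling over a BERT output matrix (not the paper's code).

    hidden_states: (batch, seq_len, hidden) -- last-layer token vectors
    attention_mask: (batch, seq_len) -- 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq_len, 1)

    # Mean pooling: average the non-padding token vectors
    mean_pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    # Max pooling: elementwise max over tokens, padding excluded via -inf
    masked = hidden_states.masked_fill(mask == 0, float("-inf"))
    max_pooled = masked.max(dim=1).values

    # K-MaxPooling: the k largest values per hidden dimension, concatenated
    kmax = masked.topk(k, dim=1).values                  # (batch, k, hidden)
    kmax_pooled = kmax.reshape(hidden_states.size(0), -1)  # (batch, k*hidden)

    return mean_pooled, max_pooled, kmax_pooled

# Demo with random tensors standing in for real BERT output
batch, seq_len, hidden = 2, 8, 768
h = torch.randn(batch, seq_len, hidden)
m = torch.ones(batch, seq_len, dtype=torch.long)
m[1, 5:] = 0  # second sequence has three padding positions
mean_p, max_p, kmax_p = pool_bert_outputs(h, m, k=3)
print(mean_p.shape, max_p.shape, kmax_p.shape)
# torch.Size([2, 768]) torch.Size([2, 768]) torch.Size([2, 2304])
```

In practice such a pooled vector would replace the [CLS] vector as input to the classification head; K-MaxPooling keeps the k strongest activations per dimension, preserving more local feature information than a single max.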

Key words: text classification, classification model, BERT, mean-pooling, max-pooling, K-MaxPooling