Cross-language Multi-label Sentiment Classification Based on Stacked Denoising AutoEncoder

doi:10.3969/j.issn.1006-2475.2023.11.002

Computer and Modernization, 2023, Issue (11): 6-12.

(1. Guangzhou Key Laboratory of Big Data and Intelligent Education, Guangzhou 510631, China;
2. School of Computer Science, South China Normal University, Guangzhou 510631, China)

Online: 2023-11-29  Published: 2023-11-29

About the authors:
TANG Shi-qi (1998-), female, from Zhanjiang, Guangdong, master's student; research interests: natural language processing, sentiment classification; E-mail: 2532855590@qq.com.
ZHOU Rui-ping (1998-), female, from Guang'an, Sichuan, master's student; research interest: database technology; E-mail: 1658330923@qq.com.
XIE Shi-bin (1997-), male, from Shantou, Guangdong, master's student; research interests: educational big data, knowledge tracing; E-mail: 784995152@qq.com.
LIU Meng-chi (1962-), male, professor; research interests: big data systems, intelligent information systems; E-mail: liumengchi@scnu.edu.cn.
XIAO Wen (1998-), female, from Huizhou, Guangdong, master's student; research interest: natural language processing; E-mail: 2532855590@qq.com.

Funding:
National Natural Science Foundation of China (61672389); Guangzhou Key Laboratory of Big Data and Intelligent Education Project (201905010009)

Abstract: The multi-label sentiment classification task deals with instances that may be associated with several sentiment labels at once. Most existing multi-label sentiment classification models are designed for complete data, so their performance and semantic representations are easily degraded by incompleteness in the data itself. To address this problem, a cross-language multi-label sentiment classification model based on a stacked denoising autoencoder (SDAE) is proposed, and a label-aware loss function is introduced to compensate for the loss incurred during training. The model uses the SDAE to denoise word vectors and build low-dimensional features of the original data, which reduces noise interference in the feature space and provides effective feature representations for downstream tasks. In multi-label sentiment classification experiments on the SemEval2018 datasets in three languages (English, Arabic and Spanish), the model improves all three metrics on the test sets (micro_F1, macro_F1 and Jaccard); macro_F1 rises by about 0.82, 1.45 and 1.83 percentage points, respectively.
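The abstract states that the model denoises word vectors with a stacked denoising autoencoder to obtain low-dimensional features. The NumPy sketch below illustrates that general technique (greedy layer-wise training of denoising autoencoders), not the authors' model. The names `DenoisingAutoencoder`, `train_sdae` and `sdae_features` are hypothetical, and the masking-noise corruption, sigmoid encoder with linear decoder, squared-error loss, layer sizes and learning rate are all assumptions; random vectors stand in for real word vectors (the key words suggest BERT embeddings).

```python
# Hypothetical sketch of a generic stacked denoising autoencoder (SDAE);
# all design choices here are assumptions, not the paper's implementation.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One DAE layer: sigmoid encoder, linear decoder, squared reconstruction error."""

    def __init__(self, n_in, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)
        self.W_dec = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_in))
        self.b_dec = np.zeros(n_in)
        self.losses = []

    def encode(self, X):
        return sigmoid(X @ self.W + self.b)

    def decode(self, H):
        return H @ self.W_dec + self.b_dec

    def corrupt(self, X, drop_prob):
        # Masking noise: each component is zeroed independently with probability drop_prob.
        keep = self.rng.random(X.shape) >= drop_prob
        return X * keep

    def fit(self, X, drop_prob=0.2, lr=0.1, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            X_noisy = self.corrupt(X, drop_prob)
            H = self.encode(X_noisy)
            err = self.decode(H) - X  # the target is the CLEAN input, not the noisy one
            self.losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
            # Full-batch gradient descent on the mean squared reconstruction error.
            g_out = 2.0 * err / n
            g_W_dec = H.T @ g_out
            g_b_dec = g_out.sum(axis=0)
            g_pre = (g_out @ self.W_dec.T) * H * (1.0 - H)  # back through the sigmoid
            g_W = X_noisy.T @ g_pre
            g_b = g_pre.sum(axis=0)
            self.W_dec -= lr * g_W_dec
            self.b_dec -= lr * g_b_dec
            self.W -= lr * g_W
            self.b -= lr * g_b
        return self


def train_sdae(X, layer_sizes, rng, **fit_kwargs):
    """Greedy layer-wise training: each layer learns to reconstruct its clean input."""
    layers, inputs = [], X
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(inputs.shape[1], n_hidden, rng).fit(inputs, **fit_kwargs)
        layers.append(dae)
        inputs = dae.encode(inputs)  # clean (uncorrupted) codes feed the next layer
    return layers


def sdae_features(layers, X):
    """Pass clean inputs through every trained encoder to get low-dimensional features."""
    for dae in layers:
        X = dae.encode(X)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    word_vectors = rng.normal(size=(64, 32))  # stand-in: 64 vectors of dimension 32
    layers = train_sdae(word_vectors, [16, 8], rng)
    features = sdae_features(layers, word_vectors)
    print(features.shape)  # (64, 8)
    print(layers[0].losses[0], "->", layers[0].losses[-1])
```

Each layer is trained to reconstruct its clean input from a corrupted copy, and the clean codes of one layer become the training input of the next. A complete pipeline would usually also fine-tune the stack together with the downstream multi-label classifier; this sketch omits that step.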

Key words: multi-label classification, sentiment classification, incomplete data, BERT, stacked denoising autoencoder (SDAE)
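The abstract reports micro_F1, macro_F1 and Jaccard on the SemEval2018 test sets. As a reading aid, here is a minimal pure-Python sketch of the standard definitions of these multi-label metrics. It is not the official SemEval-2018 scorer; the function names and the toy three-label data are illustrative, and the zero-denominator conventions (F1 of 0 for a label that never occurs, Jaccard of 1 for a sample with no gold and no predicted labels) are assumptions.

```python
# Illustrative sketch of standard multi-label metrics; not the official SemEval-2018 scorer.
from typing import List, Sequence, Tuple

LabelMatrix = Sequence[Sequence[int]]  # one 0/1 indicator row per sample, one column per label


def per_label_counts(y_true: LabelMatrix, y_pred: LabelMatrix) -> Tuple[List[int], List[int], List[int]]:
    """True positives, false positives and false negatives for every label column."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is 0 (assumed convention).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    # Pool the counts over all labels, then compute a single F1.
    tp, fp, fn = per_label_counts(y_true, y_pred)
    return f1(sum(tp), sum(fp), sum(fn))


def macro_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    # Compute F1 separately for each label, then take the unweighted mean.
    tp, fp, fn = per_label_counts(y_true, y_pred)
    return sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / len(tp)


def jaccard(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    # Per-sample |gold ∩ pred| / |gold ∪ pred|, averaged over samples.
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t and p)
        union = sum(1 for t, p in zip(t_row, p_row) if t or p)
        scores.append(inter / union if union else 1.0)  # empty gold and pred: assumed 1.0
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
    print(round(micro_f1(gold, pred), 4))  # 0.8
    print(round(macro_f1(gold, pred), 4))  # 0.6667
    print(round(jaccard(gold, pred), 4))   # 0.6667
```

The toy example shows why the two F1 variants differ: micro_F1 pools the counts, so the two perfectly predicted labels dominate (8/10 = 0.8), while macro_F1 gives the completely missed third label equal weight with the others (2/3).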

CLC number: TP391

TANG Shi-qi, ZHOU Rui-ping, XIE Shi-bin, LIU Meng-chi, XIAO Wen. Cross-language Multi-label Sentiment Classification Based on Stacked Denoising AutoEncoder[J]. Computer and Modernization, 2023(11): 6-12.

