Computer and Modernization ›› 2024, Vol. 0 ›› Issue (10): 113-119. doi: 10.3969/j.issn.1006-2475.2024.10.018

• Multimedia Technology •

Sentiment Consistency Detection Based on Cross-Modal Attention Fusion and Information Perception

  1. (1. Guoneng Digital Intelligence Technology Development Co., Ltd., Beijing 100011, China;
     2. School of Management, Hefei University of Technology, Hefei 230009, China)
  • Online: 2024-10-29  Published: 2024-10-30
  • Supported by: National Natural Science Foundation of China (71671057)

Abstract: With the rapid development of information technology, massive amounts of image and text data are constantly generated and disseminated through various channels, and recognition and detection techniques for such multimodal data are widely used in fields such as e-commerce, healthcare, logistics, finance, and construction. Sentiment consistency detection aims to accurately determine whether the sentiments expressed by different modalities are consistent. Most existing sentiment consistency detection models adopt implicit fusion, neither explicitly aligning sentiment across modalities nor accounting for the important role of sentiment words in detection. Therefore, this paper proposes a sentiment consistency detection model based on cross-modal attention fusion and information perception. The model uses a BERT-based dual-channel module to capture the dynamic interaction between the image and text modalities, introduces external knowledge to enhance the text representation, aggregates image and text according to sentiment information, and builds a co-attention matrix to capture inconsistent features between text sentences and text labels, as well as between their sentiment vectors, thereby improving the accuracy of image-text sentiment consistency detection. Experimental results on a public multimodal dataset collected from X (formerly Twitter) demonstrate the superiority of the proposed model.

Key words: multimodality, image-text sentiment consistency detection, attention mechanism, knowledge enhancement
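The co-attention matrix mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bilinear affinity form `T W Vᵀ`, the feature dimensions, and the function names are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, image_feats, w):
    """Compute a co-attention (affinity) matrix between text tokens and
    image regions, then let each modality attend over the other.

    text_feats : (n_tokens, d)  token embeddings (e.g. from BERT)
    image_feats: (n_regions, d) image region embeddings
    w          : (d, d)         learnable bilinear weight
    """
    # Affinity of every text token to every image region
    affinity = text_feats @ w @ image_feats.T               # (n_tokens, n_regions)
    # Each token attends over image regions (rows sum to 1)
    text_to_image = softmax(affinity, axis=1) @ image_feats  # (n_tokens, d)
    # Each region attends over text tokens (columns sum to 1)
    image_to_text = softmax(affinity, axis=0).T @ text_feats # (n_regions, d)
    return affinity, text_to_image, image_to_text

rng = np.random.default_rng(0)
t = rng.normal(size=(5, 8))   # 5 text tokens, feature dim 8
v = rng.normal(size=(3, 8))   # 3 image regions, feature dim 8
w = rng.normal(size=(8, 8))
aff, t2i, i2t = co_attention(t, v, w)
print(aff.shape, t2i.shape, i2t.shape)  # (5, 3) (5, 8) (3, 8)
```

The resulting modality-crossed representations (`text_to_image`, `image_to_text`) are the kind of aligned features a downstream classifier could compare to decide whether image and text sentiments agree.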

CLC number: