计算机与现代化 (Computer and Modernization) ›› 2025, Vol. 0 ›› Issue (07): 83-89. doi: 10.3969/j.issn.1006-2475.2025.07.012

• Artificial Intelligence •

Multi-modal Emotion Recognition Based on Text Guidance

  1. (School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China)
  • Online: 2025-07-22  Published: 2025-07-22
  • About author: ZHAI Junlong (b. 2000), male, from Baoji, Shaanxi, is a master’s student; his research interests include emotion recognition and multi-modality (E-mail: z13379482979@163.com). Corresponding author: GU Lin (b. 1973), female, from Heze, Shandong, is an associate professor with a Ph.D.; her research interests include intelligent image information systems and garment digitalization engineering (E-mail: 396500021@qq.com).


Abstract: Multi-modal emotion recognition is widely applied in artificial intelligence, safe driving, and other fields; because multi-modal information carries rich modal representations, it enables more accurate emotion recognition. Text is the modality that conveys information most richly and accurately, so this paper proposes a multi-modal emotion recognition model guided and optimized by multi-scale text features: text features at different scales are aggregated to optimize the features of the other modalities. A text-guided multi-modal aggregation module (AGG) is designed, and the idea of contrastive learning is introduced into the design of the loss function to optimize the whole network. Experimental results show that the proposed model performs well on multi-modal emotion recognition, and comparative and ablation experiments further verify the rationality and effectiveness of the design.
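The abstract only sketches the method at a high level, so the following is a hypothetical PyTorch illustration of the two ideas it names: text features guiding the aggregation of another modality's features, and a contrastive term in the loss. The module name `TextGuidedAGG`, the cross-attention formulation, the dimensions, and the InfoNCE-style loss are all assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedAGG(nn.Module):
    """Aggregate another modality's features under text guidance (sketch)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Text tokens act as queries; the other modality supplies keys/values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, other_feats):
        # text_feats:  (B, T_t, D) multi-scale text tokens (scales concatenated)
        # other_feats: (B, T_o, D) audio or visual tokens
        guided, _ = self.attn(text_feats, other_feats, other_feats)
        return self.norm(text_feats + guided)  # residual aggregation

def contrastive_loss(z1, z2, tau: float = 0.07):
    """Symmetric InfoNCE-style loss that pulls paired embeddings together."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))          # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Toy forward pass over random features
B, D = 8, 64
agg = TextGuidedAGG(D)
text = torch.randn(B, 10, D)                   # multi-scale text tokens
audio = torch.randn(B, 20, D)                  # audio tokens
fused = agg(text, audio)                       # (B, 10, D)
loss = contrastive_loss(fused.mean(1), audio.mean(1))
print(fused.shape)
```

In this sketch the contrastive term aligns the pooled text-guided representation with the pooled audio representation of the same sample, which is one common way to realize "introducing contrastive learning into the loss function".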

Key words: emotion recognition, multi-modal, multi-scale text features, multi-modal fusion

CLC number: