Computer and Modernization ›› 2022, Vol. 0 ›› Issue (10): 82-87.

• Image Processing •

Multi-modal Disaster Analysis Based on Embracing Fusion

  MEI Xin, MIAO Zijing

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-10-20  Published: 2022-10-21
  • About the authors: MEI Xin (born 1996), male, from Fuzhou, Jiangxi, master's student, research interest: multi-modal learning, E-mail: mx13767661058@163.com; MIAO Zijing (born 1997), male, from Shanwei, Guangdong, master's student, research interest: natural language processing, E-mail: 1040058330@qq.com.
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2019B111101001)

Abstract: Fusing multi-modal information from texts and images can improve the accuracy of disaster event analysis compared with a single modality. However, most existing works simply merge text features and image features, which introduces feature redundancy during extraction and fusion, ignores the relationship between modalities, and leaves the correlation between image and text features unexploited. To this end, this paper analyzes the currently popular multi-modal fusion algorithms and proposes a multi-modal disaster event analysis algorithm based on embracing fusion. First, the text feature vectors and the image feature vectors are compared against each other so that the correlation between the two modalities is taken into account. Then, based on multinomial sampling, redundant features are eliminated and the text and image features are fused. Experimental results show that in the first experiment on the CrisisMMD2.0 dataset, embracing fusion achieves classification accuracies of 88.2% and 85.1% on the two tasks, respectively, clearly outperforming the other multi-modal fusion models and demonstrating the effectiveness of the model. The second experiment further verifies that the embracing model is applicable to different text and image deep learning models.
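
The abstract's description of embracing fusion (project each modality's features into a shared space, then use multinomial sampling to decide, coordinate by coordinate, which modality contributes, so that redundant co-activations are dropped) can be illustrated with a small sketch. The following PyTorch code is a minimal illustration assuming two modalities and equal sampling probabilities; the class and layer names (EmbraceFusion, dock_text, dock_image) and the 768/2048/512 feature dimensions are hypothetical choices for the example, not settings confirmed by the paper.

```python
import torch
import torch.nn as nn

class EmbraceFusion(nn.Module):
    """Minimal embracing-fusion sketch for two modalities (text, image)."""

    def __init__(self, text_dim: int, image_dim: int, fused_dim: int,
                 probs=(0.5, 0.5)):
        super().__init__()
        # "Docking" layers project each modality into a shared space.
        self.dock_text = nn.Linear(text_dim, fused_dim)
        self.dock_image = nn.Linear(image_dim, fused_dim)
        # Per-modality sampling probabilities for the multinomial draw.
        self.register_buffer("probs", torch.tensor(probs))

    def forward(self, text_feat, image_feat):
        # Dock both modalities: (batch, n_modalities, fused_dim).
        docked = torch.stack(
            [torch.relu(self.dock_text(text_feat)),
             torch.relu(self.dock_image(image_feat))], dim=1)
        batch, n_mod, d = docked.shape
        # For every fused coordinate, sample which modality supplies it:
        # a (batch, fused_dim) tensor of modality indices in {0, 1}.
        idx = torch.multinomial(self.probs.expand(batch, n_mod), d,
                                replacement=True)
        # One-hot mask over the modality axis, then select and sum.
        mask = torch.zeros_like(docked).scatter_(1, idx.unsqueeze(1), 1.0)
        return (docked * mask).sum(dim=1)   # (batch, fused_dim)


# Hypothetical usage with BERT-sized text features (768) and
# ResNet-sized image features (2048); dimensions are illustrative only.
fusion = EmbraceFusion(text_dim=768, image_dim=2048, fused_dim=512)
fused = fusion(torch.randn(4, 768), torch.randn(4, 2048))
print(fused.shape)  # torch.Size([4, 512])
```

Because each fused coordinate keeps exactly one modality's value, redundant co-activated features cannot both survive, which is one way to read the abstract's claim that multinomial sampling removes redundancy while still mixing information from both modalities.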

Key words: multi-modal fusion, embracing fusion, multinomial sampling, multi-modal disaster event, CrisisMMD2.0