Computer and Modernization ›› 2022, Vol. 0 ›› Issue (10): 82-87.

• Image Processing •

Multi-modal Disaster Analysis Based on Embracing Fusion

  MEI Xin, MIAO Zijing

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-10-20  Published: 2022-10-21
  • About the authors: MEI Xin (born 1996), male, from Fuzhou, Jiangxi, master's student, research interest: multi-modal learning, E-mail: mx13767661058@163.com; MIAO Zijing (born 1997), male, from Shanwei, Guangdong, master's student, research interest: natural language processing, E-mail: 1040058330@qq.com.
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2019B111101001)

Abstract: Fusing multi-modal information from texts and images can improve the accuracy of disaster event analysis compared with a single modality. However, most existing works simply merge text features and image features, which introduces feature redundancy during extraction and fusion, ignores the relationship between modalities, and leaves the correlation between image and text features unexploited. To this end, this paper analyzes the currently popular multi-modal fusion algorithms and proposes a multi-modal disaster event analysis algorithm based on embracing fusion. First, the text feature vectors and the image feature vectors are compared against each other so that the correlation between the two modalities is taken into account. Then, based on multinomial sampling, redundant features are eliminated and the text and image features are fused. Experimental results show that in the first experiment on the CrisisMMD2.0 dataset, embracing fusion achieves classification accuracies of 88.2% and 85.1% on the two tasks, respectively, clearly outperforming the other multi-modal fusion models and demonstrating the effectiveness of the model. The second experiment further verifies that the embracing model is applicable to different text and image deep learning models.
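
The abstract's description of embracing fusion (project each modality's features into a shared space, then use multinomial sampling to decide, coordinate by coordinate, which modality contributes, so that redundant co-activations are dropped) can be illustrated with a small sketch. The following PyTorch code is a minimal illustration assuming two modalities and equal sampling probabilities; the class and layer names (EmbraceFusion, dock_text, dock_image) and the 768/2048/512 feature dimensions are hypothetical choices for the example, not settings confirmed by the paper.

```python
import torch
import torch.nn as nn

class EmbraceFusion(nn.Module):
    """Minimal embracing-fusion sketch for two modalities (text, image)."""

    def __init__(self, text_dim: int, image_dim: int, fused_dim: int,
                 probs=(0.5, 0.5)):
        super().__init__()
        # "Docking" layers project each modality into a shared space.
        self.dock_text = nn.Linear(text_dim, fused_dim)
        self.dock_image = nn.Linear(image_dim, fused_dim)
        # Per-modality sampling probabilities for the multinomial draw.
        self.register_buffer("probs", torch.tensor(probs))

    def forward(self, text_feat, image_feat):
        # Dock both modalities: (batch, n_modalities, fused_dim).
        docked = torch.stack(
            [torch.relu(self.dock_text(text_feat)),
             torch.relu(self.dock_image(image_feat))], dim=1)
        batch, n_mod, d = docked.shape
        # For every fused coordinate, sample which modality supplies it:
        # a (batch, fused_dim) tensor of modality indices in {0, 1}.
        idx = torch.multinomial(self.probs.expand(batch, n_mod), d,
                                replacement=True)
        # One-hot mask over the modality axis, then select and sum.
        mask = torch.zeros_like(docked).scatter_(1, idx.unsqueeze(1), 1.0)
        return (docked * mask).sum(dim=1)   # (batch, fused_dim)


# Hypothetical usage with BERT-sized text features (768) and
# ResNet-sized image features (2048); dimensions are illustrative only.
fusion = EmbraceFusion(text_dim=768, image_dim=2048, fused_dim=512)
fused = fusion(torch.randn(4, 768), torch.randn(4, 2048))
print(fused.shape)  # torch.Size([4, 512])
```

Because each fused coordinate keeps exactly one modality's value, redundant co-activated features cannot both survive, which is one way to read the abstract's claim that multinomial sampling removes redundancy while still mixing information from both modalities.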

Key words: multi-modal fusion, embracing fusion, multinomial sampling, multi-modal disaster event, CrisisMMD2.0