[1] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions[J]. Information Fusion, 2023,91:424-444.
[2] SONG S M, WANG C, ZHAN D Y. Public sentiment mining from online social media during public health emergencies[J]. Management Review, 2024,36(3):246-257. (in Chinese)
[3] HONG A, LUNSCHER N, HU T H, et al. A multimodal emotional human-robot interaction architecture for social robots engaged in bidirectional communication[J]. IEEE Transactions on Cybernetics, 2021,51(12):5954-5968.
[4] ACOSTA J N, FALCONE G J, RAJPURKAR P, et al. Multimodal biomedical AI[J]. Nature Medicine, 2022,28:1773-1784.
[5] YIN M X, NI N, WEI H H, et al. Research progress on multimodal emotion recognition[J]. Journal of Biomedical Engineering Research, 2023,42(3):285-291. (in Chinese)
[6] DZEDZICKIS A, KAKLAUSKAS A, BUCINSKAS V. Human emotion recognition: Review of sensors and methods[J]. Sensors, 2020,20(3). DOI: 10.3390/s20030592.
[7] MIDDYA A I, NAG B, ROY S. Deep learning based multimodal emotion recognition using model-level fusion of audio-visual modalities[J]. Knowledge-based Systems, 2022,244:108580.
[8] HE J, ZHANG C Q, LI X Z, et al. A survey of multimodal fusion technology for deep learning[J]. Computer Engineering, 2020,46(5):1-11. (in Chinese)
[9] MICHELSANTI D, TAN Z H, ZHANG S X, et al. An overview of deep-learning-based audio-visual speech enhancement and separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021,29(1):1368-1396.
[10] PAN Z X, LUO Z J, YANG J C, et al. Multi-modal attention for speech emotion recognition[J]. arXiv preprint arXiv:2009.04107, 2020.
[11] ZHANG Y H, NIE Y M, SUN B, et al. Decision-level fusion of multimodal data for depression assessment based on deep forest[J]. Journal of Beijing Normal University (Natural Science), 2018,54(5):606-611. (in Chinese)
[12] PORIA S, CAMBRIA E, BAJPAI R, et al. A review of affective computing: From unimodal analysis to multimodal fusion[J]. Information Fusion, 2017,37(C):98-125.
[13] MA Y X, HAO Y X, CHEN M. Audio-visual emotion fusion (AVEF): A deep efficient weighted approach[J]. Information Fusion, 2019,46:184-192.
[14] ZHONG Z M, HUANG X B, XIONG Y L. Multimodal sentiment analysis of emergency events based on hybrid fusion[J]. Journal of Jiangsu Ocean University (Natural Science Edition), 2023,32(1):1-8. (in Chinese)
[15] TSAI Y H H, BAI S J, LIANG P P, et al. Multimodal Transformer for unaligned multimodal language sequences[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. ACL, 2019:6558-6569.
[16] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis[J]. arXiv preprint arXiv: 2005.03045, 2020.
[17] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. ACL, 2021:9180-9192.
[18] SUN L C, LIAN Z L, LIU B, et al. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2024,15(1):309-325.
[19] LI J N, LI D X, SAVARESE S, et al. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models[C]// Proceedings of the 40th International Conference on Machine Learning. PMLR, 2023:19730-19742.
[20] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. ACL, 2018:2236-2246.
[21] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv preprint arXiv:1606.06259, 2016.
[22] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]// European Conference on Computer Vision. Springer, 2020:213-229.
[23] CORDONNIER J B, LOUKAS A, JAGGI M. Multi-head attention: Collaborate instead of concatenate[J]. arXiv preprint arXiv:2006.16362, 2020.
[24] PRAVEEN G R, GRANGER E, CARDINAL P. Cross attentional audio-visual fusion for dimensional emotion recognition[C]// 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 2021. DOI: 10.1109/FG52635.2021.9667055.
[25] DOERSCH C, GUPTA A, ZISSERMAN A. CrossTransformers: Spatially-aware few-shot transfer[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates, 2020:21981-21993.
[26] ZHANG C S, CHEN J, LI Q L, et al. A survey of deep contrastive learning[J]. Acta Automatica Sinica, 2023,49(1):15-39. (in Chinese)
[27] VAN DEN OORD A, LI Y Z, VINYALS O. Representation learning with contrastive predictive coding[J]. arXiv preprint arXiv:1807.03748, 2018.
[28] WILLIAM I, SETIADI D R I M, RACHMAWANTO E H, et al. Face recognition using FaceNet (survey, performance test, and comparison)[C]// 2019 Fourth International Conference on Informatics and Computing. IEEE, 2019. DOI: 10.1109/ICIC47613.2019.8985786.
[29] TENNEY I, DAS D, PAVLICK E. BERT rediscovers the classical NLP pipeline[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. ACL, 2019:4593-4601.