Computer and Modernization ›› 2024, Vol. 0 ›› Issue (08): 98-107. doi: 10.3969/j.issn.1006-2475.2024.08.016
Online: 2024-08-28
Published: 2024-08-29
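Abstract: In recent years, image-based group-level emotion recognition has attracted wide attention. Its goal is to accurately determine the overall emotional state of a group across different scenes and different group sizes. The task is highly challenging because it requires analyzing and fusing multiple group-level emotional cues in an image, such as facial emotion features, scene features, and human pose features. The field currently lacks a survey that organizes existing research and helps guide the next stage of work. This paper sorts and classifies group-level emotion recognition models according to the emotional cues they use and the way they process those cues. It also reviews and analyzes the processing methods and characteristics of existing models, and summarizes models that use different fusion strategies along with the mainstream datasets in the field. Finally, it gives a brief summary of the field's development and an outlook on future work.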
GAO Shuaipeng (高帅鹏), WANG Yifan (王怡凡). Survey on Group-level Emotion Recognition in Images[J]. Computer and Modernization, 2024, 0(08): 98-107.