Computer and Modernization ›› 2024, Vol. 0 ›› Issue (07): 47-62. doi: 10.3969/j.issn.1006-2475.2024.07.008

Survey on Multimodal Information Processing and Fusion Based on Modal Categories

  

  1. (College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China)
  • Online: 2024-07-25 Published: 2024-08-08

Abstract: With the continuous advancement of artificial intelligence and deep learning, multimodal information processing and fusion has attracted widespread research attention. This paper reviews the development history and milestone works of multimodal information processing, together with the strategies and models used for multimodal fusion. Mainstream datasets for multimodal information processing and fusion are systematically classified and summarized by modality. Using modality type as the classification criterion, the paper then reviews research progress in the field, emphasizing the distinctions between modalities. Existing work is grouped into four categories: audio-visual, audio-text, visual-text, and visual-audio-text processing and fusion. For each category, the methods and models for processing and fusing the input modalities are examined in detail. Finally, the paper summarizes the field and offers an outlook on the future development of multimodal processing and fusion.

Key words: multimodal processing; multimodal information processing; multimodal fusion; deep learning
