Computer and Modernization ›› 2022, Vol. 0 ›› Issue (04): 103-109.


Audio-visual Eye Fixation Prediction Based on Audio-visual Consistency

  

  1. (College of Computer Science and Technology, Qingdao University, Qingdao 266071, China)
  • Online: 2022-05-07    Published: 2022-05-07

Abstract: Existing audio-visual eye fixation prediction algorithms use a two-stream structure to extract audio and visual features separately and then fuse them to obtain the final prediction map. However, the audio and visual information in a dataset are not always correlated, so when the audio and visual features are inconsistent, fusing them directly has a negative impact on the visual features. To address this problem, this paper proposes an audio-visual consistency network (AVCN) for eye fixation prediction based on audio-visual consistency. To verify its reliability, the proposed consistency module is added to an existing audio-visual eye fixation detection model. AVCN performs a binary consistency judgment on the extracted audio and visual features: when the two are consistent, the fused audio-visual features are output as the final prediction map; otherwise, the visually dominated features are output as the final result. The method is evaluated on six publicly available datasets, and the results show that the proposed AVCN model achieves better performance.
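The gating idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the use of cosine similarity, and the fixed threshold are all assumptions for illustration; the actual AVCN presumably learns the binary consistency judgment from data.

```python
import numpy as np

def consistency_gate(audio_feat, visual_feat, fused_feat, threshold=0.5):
    """Binary audio-visual consistency judgment (illustrative sketch).

    If the audio and visual feature vectors are judged consistent
    (here: cosine similarity >= `threshold`, a hypothetical criterion),
    return the fused audio-visual features; otherwise fall back to the
    visual features alone, so that inconsistent audio cannot degrade
    the final prediction.
    """
    cos = np.dot(audio_feat, visual_feat) / (
        np.linalg.norm(audio_feat) * np.linalg.norm(visual_feat) + 1e-8
    )
    return fused_feat if cos >= threshold else visual_feat

# Consistent case: aligned audio/visual features -> fused output is used.
a = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
f = np.array([2.0, 2.0])
print(consistency_gate(a, v, f))  # fused features

# Inconsistent case: orthogonal audio features -> visual-only fallback.
a_bad = np.array([0.0, 1.0])
print(consistency_gate(a_bad, v, f))  # visual features
```

In the paper the features would be deep feature maps rather than 2-D vectors, but the control flow, fuse when consistent, otherwise keep the visual branch, is the same.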

Key words: computer vision, eye fixation prediction, audio-visual consistency