Computer and Modernization ›› 2025, Vol. 0 ›› Issue (11): 71-79. doi: 10.3969/j.issn.1006-2475.2025.11.009

Improved SOD Algorithm with Cross Modal Interaction and Multi Scale Aggregation

  

  1. (Southwest Institute of Technical Physics, Chengdu 610095, China)
  • Online: 2025-11-20  Published: 2025-11-24

Abstract: Salient object detection (SOD) is an important research direction in computer vision that aims to identify and segment the most conspicuous objects in a scene. Single-modal SOD algorithms struggle to produce reliable detections when image information is degraded by illumination changes or defocus, while multi-modal detection algorithms suffer from large differences between modal features, ineffective cross-modal feature fusion, and low utilization of features across levels. To address these problems, this paper proposes an improved SOD algorithm based on cross-modal interaction and multi-scale aggregation. The algorithm adopts a dual-loop cross-modal interaction mechanism that fuses RGB image features and thermal infrared image features through cooperative incentive learning, and uses an information receptive field amplification mechanism to fuse spatial and channel information of different dimensions between the two modalities at the same level. The multi-scale aggregation mechanism mines features at different depths of the network, passes and connects them, and aggregates shallow fine-grained information with deep coarse-grained abstract information to obtain the final detection result. ResNet, VGGNet, and DenseNet are each used for feature extraction, and their detection performance is compared experimentally. Experiments on a variety of targets in outdoor scenes, analyzed both qualitatively and quantitatively, show that the proposed algorithm achieves good detection accuracy and that its overall performance is better than that of existing SOD models.
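The two ideas named in the abstract, two-way cross-modal fusion and aggregation of shallow and deep features, can be illustrated with a minimal sketch. This is not the paper's network: the abstract gives no layer definitions, so the gating rule (a sigmoid of the other modality's channel means), the nearest-neighbour upsampling, and every function name below are assumptions chosen only to show the data flow on tiny hand-made feature maps.

```python
import math

# A feature map is a list of channels; each channel is a 2-D list of floats.
# All functions here are hypothetical illustrations, not the paper's modules.

def channel_means(feat):
    """Global average pooling: one scalar per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale(feat, gates):
    """Multiply every channel by its gate value."""
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]

def add(a, b):
    """Element-wise sum of two feature maps of identical shape."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def cross_modal_fuse(rgb, thermal):
    """Two-way gating (assumed rule): each modality is reweighted by the
    other modality's channel statistics, then the two are summed."""
    rgb_gates = [sigmoid(m) for m in channel_means(thermal)]  # thermal -> RGB
    thr_gates = [sigmoid(m) for m in channel_means(rgb)]      # RGB -> thermal
    return add(scale(rgb, rgb_gates), scale(thermal, thr_gates))

def upsample2x(feat):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out

def aggregate(shallow, deep):
    """Merge a fine shallow map with a coarse deep map at the shallow resolution."""
    return add(shallow, upsample2x(deep))

# Toy inputs: one channel per modality, 4x4 at the shallow level, 2x2 at the deep level.
rgb_shallow = [[[1.0] * 4 for _ in range(4)]]
thr_shallow = [[[0.0] * 4 for _ in range(4)]]
rgb_deep = [[[2.0] * 2 for _ in range(2)]]
thr_deep = [[[0.0] * 2 for _ in range(2)]]

fused_shallow = cross_modal_fuse(rgb_shallow, thr_shallow)
fused_deep = cross_modal_fuse(rgb_deep, thr_deep)
saliency = aggregate(fused_shallow, fused_deep)
```

In a real model the gates would come from learned layers and the backbone (ResNet, VGGNet, or DenseNet, as the abstract mentions) would supply the per-level features; the sketch only fixes the order of operations: fuse the two modalities at each level, then combine levels from deep to shallow.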

Key words: cross-modal interaction, multi-scale aggregation, salient object detection, cooperative incentive learning, deep learning
