Improved SOD Algorithm with Cross Modal Interaction and Multi Scale Aggregation

doi:10.3969/j.issn.1006-2475.2025.11.009

Abstract

Abstract: Salient object detection (SOD) is an important research direction in computer vision that aims to identify and segment the most noteworthy objects in a scene. Single-modal SOD algorithms struggle to produce effective detection results when the image is degraded by poor illumination or defocus, while multi-modal detection algorithms suffer from large differences between the modalities' feature information, ineffective cross-modal feature fusion, and low utilization of features across network levels. To address these problems, this paper proposes an improved SOD algorithm based on cross-modal interaction and multi-scale aggregation. The algorithm adopts a dual-loop cross-modal interaction mechanism that fuses RGB image features and thermal infrared image features through cooperative incentive learning, and uses an information receptive field amplification mechanism to fuse spatial and channel information of different dimensions between the two modalities at the same level. The multi-scale aggregation mechanism mines features at different depths of the network, passes and connects them, and aggregates shallow fine-grained information with deep coarse-grained abstract information to obtain the final detection result. ResNet, VGGNet, and DenseNet are each used for feature extraction, and their detection performance is compared experimentally. Experiments on a variety of targets in outdoor scenes, with both qualitative and quantitative analysis, show that the proposed algorithm achieves good detection accuracy and outperforms existing SOD models overall.
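
The two ideas named in the abstract, cross-modal interaction and multi-scale aggregation, can be illustrated with a minimal NumPy sketch. This is not the paper's dual-loop mechanism or its receptive field amplification module: it assumes a much simpler stand-in in which each modality is re-weighted by a sigmoid channel gate computed from the other modality, and in which deep features are upsampled by nearest-neighbour repetition and concatenated with shallow ones. All function names (`channel_gate`, `cross_modal_fuse`, `upsample_nearest`, `aggregate`) and the tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def channel_gate(feat):
    """Squeeze a (C, H, W) feature map into per-channel weights in (0, 1)."""
    pooled = feat.mean(axis=(1, 2))        # global average pooling -> (C,)
    return 1.0 / (1.0 + np.exp(-pooled))   # sigmoid gate


def cross_modal_fuse(rgb, thermal):
    """Assumed stand-in for cross-modal interaction.

    Each modality is scaled by the other's channel gate, and the gated
    maps are added to the original maps (a residual-style sum).
    """
    rgb_excited = rgb * channel_gate(thermal)[:, None, None]
    thermal_excited = thermal * channel_gate(rgb)[:, None, None]
    return rgb + rgb_excited + thermal + thermal_excited


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate(shallow, deep):
    """Assumed stand-in for multi-scale aggregation.

    The deep (coarse) map is upsampled to the shallow (fine) resolution
    and the two are concatenated along the channel axis.
    """
    factor = shallow.shape[1] // deep.shape[1]
    return np.concatenate([shallow, upsample_nearest(deep, factor)], axis=0)


rng = np.random.default_rng(0)
# Hypothetical two-level features for each modality: (channels, height, width).
rgb_shallow, thermal_shallow = rng.normal(size=(2, 8, 16, 16))
rgb_deep, thermal_deep = rng.normal(size=(2, 16, 4, 4))

fused_shallow = cross_modal_fuse(rgb_shallow, thermal_shallow)
fused_deep = cross_modal_fuse(rgb_deep, thermal_deep)
aggregated = aggregate(fused_shallow, fused_deep)
print(aggregated.shape)
```

In a trained network the gates and the aggregation would be learned convolutional layers on top of a backbone such as ResNet, VGGNet, or DenseNet; the fixed arithmetic here only shows how information flows between modalities and between levels.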

Key words: cross modal interaction, multi scale aggregation, salient object detection, cooperative incentive learning, deep learning

CLC Number: TP391

WANG Jingpeng, CUI Yuyong, CAI Changlin, HE Ming’ao, LI Yinghao, TANG Zhonghe. Improved SOD Algorithm with Cross Modal Interaction and Multi Scale Aggregation[J]. Computer and Modernization, 2025, 0(11): 71-79.

