Computer and Modernization, 2025, Vol. 0, Issue (11): 71-79. doi: 10.3969/j.issn.1006-2475.2025.11.009

• Image Processing •

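Improved SOD Algorithm with Cross-Modal Interaction and Multi-Scale Aggregation

  1. (Southwest Institute of Technical Physics, Chengdu, Sichuan 610095, China)
  • Online: 2025-11-20 Published: 2025-11-24
  • About the authors: 王晶鹏 (born 1997), male, from Handan, Hebei, engineer, M.S., research interests: image processing, artificial intelligence, E-mail: 2267728437@qq.com; 崔雨勇 (born 1983), male, from Linfen, Shanxi, research fellow, Ph.D., research interest: image processing, E-mail: charleycui@qq.com; 蔡长霖 (born 1997), male, from Hegang, Heilongjiang, engineer, M.S., research interest: signal and information processing, E-mail: nuc15050143@163.com; 赫明骜 (born 1996), female, from Daqing, Heilongjiang, engineer, M.S., research interest: optical information processing, E-mail: 1249505683@qq.com; 李颖浩 (born 1994), male, from Neijiang, Sichuan, engineer, M.S., research interests: image and signal processing, E-mail: lee18383218831@163.com; 唐中和 (born 1987), male, from Guang'an, Sichuan, senior engineer, M.S., research interest: image processing.
  • Supported by:
    National Defense Science and Technology Key Laboratory Fund (6142401200302); Youth Innovation Fund of the Southwest Institute of Technical Physics (k230044-015)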

Abstract: Salient object detection (SOD) is an important research direction in computer vision that aims to identify and segment the objects in a scene that attract the most attention. Single-modal SOD algorithms struggle to produce effective detection results when the image is degraded by interference such as poor illumination or defocus, while multi-modal detection algorithms suffer from large differences between modal features, ineffective cross-modal feature fusion, and low utilization of features across levels. To address these problems, this paper proposes an improved SOD algorithm based on cross-modal interaction and multi-scale aggregation. The algorithm adopts a dual-loop cross-modal interaction mechanism that fuses RGB image features and thermal infrared image features through cooperative incentive learning. In addition, a receptive field amplification mechanism is applied to the two modalities at the same level to fuse spatial and channel information of different dimensions. The multi-scale aggregation mechanism mines features at different depths of the network, passes and connects them, and aggregates shallow fine-grained information with deep coarse-grained abstract information to produce the final detection result. ResNet, VGGNet, and DenseNet are each used for feature extraction, and their detection performance is compared experimentally. Experiments on a variety of targets in outdoor scenes verify the algorithm through qualitative and quantitative analysis; the results show that it achieves good detection accuracy and detection quality, and that its overall performance is better than that of existing SOD models.

Key words: cross-modal interaction, multi-scale aggregation, salient object detection, cooperative incentive learning, deep learning
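
The abstract describes cross-modal interaction and multi-scale aggregation only at a high level, so the following is a minimal sketch of the general ideas and not the authors' modules. It assumes feature maps stored as nested lists of shape [channels][height][width], a simple sigmoid channel gate as a stand-in for cooperative incentive learning, and nearest-neighbour upsampling with element-wise addition as a stand-in for the aggregation step. The dual-loop structure and the receptive field amplification mechanism are not reproduced, and all function names (`excite`, `cross_modal_fuse`, `upsample2x`, `aggregate`) are hypothetical.

```python
import math


def global_avg_pool(feat):
    """Return the mean of each channel; feat is a [C][H][W] nested list."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def excite(target, source):
    """Reweight each channel of `target` with a gate computed from `source`.

    Assumed form: out = target * (1 + sigmoid(mean(source channel))),
    i.e. a residual channel gate, chosen here only for illustration.
    """
    gates = [sigmoid(m) for m in global_avg_pool(source)]
    return [[[v * (1.0 + g) for v in row] for row in ch]
            for ch, g in zip(target, gates)]


def add_maps(a, b):
    """Element-wise sum of two [C][H][W] maps with identical shapes."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def cross_modal_fuse(rgb, thermal):
    """Each modality excites the other, then the two results are summed."""
    return add_maps(excite(rgb, thermal), excite(thermal, rgb))


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] map."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def aggregate(shallow, deep):
    """Upsample the coarse deep map and add it to the fine shallow map."""
    return add_maps(shallow, upsample2x(deep))


# Toy inputs: one channel, 2x2 at the shallow level and 1x1 at the deep level.
rgb = [[[0.0, 0.0], [0.0, 0.0]]]
thermal = [[[1.0, 1.0], [1.0, 1.0]]]
fused = cross_modal_fuse(rgb, thermal)
merged = aggregate(fused, [[[1.0]]])
```

In this toy case the RGB map carries no signal, so the fused map comes entirely from the thermal branch, which loosely mirrors the motivation of using thermal infrared features when the visible image is degraded. A practical model would replace these list operations with learned convolutional layers on top of a backbone such as ResNet, VGGNet, or DenseNet.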

CLC Number: