An RGB-D Indoor Scene Classification Method Based on Improved Convolutional Neural Network

Abstract

RGB-D indoor scene classification is a challenging task. Convolutional neural networks have achieved very good results in scene classification, but indoor scenes contain many objects in complex layouts, and scenes of different categories can resemble one another, so applying a conventional convolutional neural network directly to indoor scene classification runs into many problems. To address these problems, this paper proposes an improved RGB-D indoor scene classification method based on convolutional neural networks. The method has two branches: a global feature extraction branch based on ResNet-18, and a branch that fuses depth and semantic information. The features from the two branches are fused to classify the indoor scene. Experimental results on the SUN RGB-D dataset show that the proposed method outperforms the existing methods it is compared with.

Key words: convolutional neural network, scene classification, deep learning
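
The final step described in the abstract, fusing the features of the two branches and classifying the scene, can be illustrated with a minimal sketch. The abstract does not say how the features are fused, so plain concatenation followed by a linear layer and softmax is an assumption here, not the authors' design. The function names, the random weights, the 128-dimensional depth-semantic feature, and the class count of 19 are all hypothetical; only the 512-dimensional width of the pooled ResNet-18 feature comes from the standard architecture.

```python
import math
import random


def linear(weights, bias, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(global_feat, depth_sem_feat, weights, bias):
    """Fuse two branch features and return class probabilities.

    Fusion by concatenation is an assumption; the abstract only states
    that the two feature sets are fused.
    """
    fused = list(global_feat) + list(depth_sem_feat)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused feature length")
    return softmax(linear(weights, bias, fused))


# Illustrative sizes. 512 is the pooled feature width of ResNet-18;
# 128 and 19 are hypothetical choices, not values taken from the paper.
GLOBAL_DIM, DEPTH_SEM_DIM, NUM_CLASSES = 512, 128, 19

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random stand-ins for a trained classifier and for the branch outputs.
weights = [[rng.uniform(-0.01, 0.01) for _ in range(GLOBAL_DIM + DEPTH_SEM_DIM)]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
global_feat = [rng.random() for _ in range(GLOBAL_DIM)]
depth_sem_feat = [rng.random() for _ in range(DEPTH_SEM_DIM)]

probs = fuse_and_classify(global_feat, depth_sem_feat, weights, bias)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("predicted class index:", predicted)
```

In a real system the two feature vectors would come from the trained branches and the classifier weights would be learned; the sketch only shows how the two feature sets could be combined into one prediction.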

ZHU Yuan-ye, NI Jian-jun, TANG Guang-yi. An RGB-D Indoor Scene Classification Method Based on Improved Convolutional Neural Network[J]. Computer and Modernization, 2023(4): 73-77.

