[1] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[2] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[3] EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:2650-2658.
[4] LIU F, SHEN C, LIN G. Deep convolutional neural fields for depth estimation from a single image[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:5162-5170.
[5] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(4):640-651.
[6] NOH H, HONG S, HAN B. Learning deconvolution network for semantic segmentation[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:1520-1528.
[7] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015:234-241.
[8] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[C]// Proceedings of International Conference on Learning Representations. 2016.
[9] YU F, KOLTUN V, FUNKHOUSER T. Dilated residual networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017:636-644.
[10] ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation[J]. IEEE Transactions on Intelligent Transportation Systems, 2018,19(1):263-272.
[11] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(12):2481-2495.
[12] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[13] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017:1251-1258.
[14] WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 2018:1451-1460.
[15] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017:2881-2890.
[16] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: A system for large-scale machine learning[C]// Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 2016:265-283.
[17] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Advances in Neural Information Processing Systems. 2012:1097-1105.
[18] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:3213-3223.
[19] PASZKE A, CHAURASIA A, KIM S, et al. ENet: A deep neural network architecture for real-time semantic segmentation[J]. arXiv preprint arXiv:1606.02147, 2016.
[20] TREML M, ARJONA-MEDINA J, UNTERTHINER T, et al. Speeding up semantic segmentation for autonomous driving[C]// Neural Information Processing Systems Workshop. 2016:1-5.
[21] MEHTA S, RASTEGARI M, CASPI A, et al. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation[J]. arXiv preprint arXiv:1803.06815, 2018.
[22] WANG W, PAN Z. DSNet for real-time driving scene semantic segmentation[J]. arXiv preprint arXiv:1812.07049, 2018.
[23] ZHAO H, QI X, SHEN X, et al. ICNet for real-time semantic segmentation on high-resolution images[C]// Proceedings of the European Conference on Computer Vision. 2018:405-420.