Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation

doi:10.3969/j.issn.1006-2475.2023.11.015

Abstract

Scene text detection currently faces two main challenges: the trade-off between real-time performance and accuracy, and the detection of text of arbitrary shape. Both determine whether scene text detection is practical in real-world settings. To address these two problems, this paper proposes a segmentation-based method built on a lightweight backbone network with strong feature extraction ability, which can accurately detect natural scene text of arbitrary shape in real time. Specifically, a simple dual-resolution residual backbone network and a deep aggregation pyramid pooling module with low computational cost are used; the features they extract are fused and then segmented by a differentiable binarization module. Comparative experiments on the standard English dataset ICDAR2015 show that the proposed method is effective and achieves comparable results in both real-time performance and accuracy.
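The differentiable binarization step named in the abstract can be illustrated with a minimal sketch. It assumes the formulation published by Liao et al. (AAAI 2020), in which an approximate binary map B is computed from a probability map P and a threshold map T as B = 1 / (1 + exp(-k(P - T))). The abstract does not give the amplification factor k; the value 50 below is the default from that earlier work, and the function name and toy maps are hypothetical.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are 2D lists of floats in [0, 1] with the
    same shape. k is the amplification factor (assumed value, see above).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps (illustrative values only, not taken from the paper).
prob = [[0.9, 0.2], [0.5, 0.31]]
thresh = [[0.3, 0.3], [0.5, 0.3]]
approx_binary = differentiable_binarization(prob, thresh)
```

Pixels whose probability is well above the local threshold map to values near 1, those well below map to values near 0, and the steep but smooth transition keeps the step differentiable so it can be trained together with the segmentation network.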

Key words: real-time text detection, dual-resolution backbone, semantic segmentation, deep aggregation pyramid pooling module

CLC Number: TP391.1

XU Hong-kui, LI Zhen-ye, GUO Wen-tao, ZHAO Jing-zheng, GUO Xu-bin. Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation[J]. Computer and Modernization, 2023, 0(11): 95-100.

