Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation

doi:10.3969/j.issn.1006-2475.2023.11.015

Abstract

Scene text detection currently faces two main challenges: the trade-off between real-time performance and accuracy, and the detection of arbitrarily shaped text. Together they determine whether scene text detection is practical in real-world applications. To address both problems, this paper adopts a segmentation-based approach and proposes a lightweight backbone network with strong feature extraction ability, which can detect natural scene text of arbitrary shape accurately and in real time. Specifically, a structurally simple dual-resolution residual backbone network is combined with a low-computational-cost deep aggregation pyramid pooling module; the features extracted by the two are fused and then segmented with a differentiable binarization module. Comparative experiments on the standard English dataset ICDAR2015 show that the proposed improvements are effective and achieve comparable results in both real-time performance and accuracy.

Key words: real-time text detection, dual-resolution backbone, semantic segmentation, deep aggregation pyramid pooling module
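The abstract names a differentiable binarization module as the segmentation head but does not state its formula. As a minimal sketch, assuming the module takes the form commonly used in the differentiable binarization literature, each pixel of a probability map is compared against a learned threshold map through a steep sigmoid. The function names below are hypothetical, and the amplification factor `k = 50` is an assumed default, not a setting confirmed by this article.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary value for one pixel.

    Computes 1 / (1 + exp(-k * (prob - thresh))).
    prob and thresh are expected in [0, 1]; k is an assumed
    amplification factor (50 is an illustrative default).
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the approximate binarization pixel-wise to two 2-D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    # Toy 2x2 maps: hypothetical values, not data from the paper.
    prob_map = [[0.9, 0.1], [0.5, 0.7]]
    thresh_map = [[0.3, 0.3], [0.5, 0.3]]
    for row in binarize_map(prob_map, thresh_map):
        print([round(v, 3) for v in row])
```

Because the sigmoid is smooth, the threshold map can be trained jointly with the probability map, whereas a hard step function would pass no gradient. Pixels well above the threshold saturate toward 1 and pixels well below it toward 0.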

CLC number:

TP391.1

XU Hong-kui, LI Zhen-ye, GUO Wen-tao, ZHAO Jing-zheng, GUO Xu-bin. Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation[J]. Computer and Modernization, 2023, 0(11): 95-100.

