Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation
(1. School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China; 2. Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China)
XU Hong-kui, LI Zhen-ye, GUO Wen-tao, ZHAO Jing-zheng, GUO Xu-bin. Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation[J]. Computer and Modernization, 2023(11): 95-100.