Computer and Modernization (计算机与现代化) ›› 2023, Vol. 0 ›› Issue (11): 95-100. doi: 10.3969/j.issn.1006-2475.2023.11.015

• Image Processing •

Real-time Detection of Arbitrary Shape Scene Text Based on Segmentation

  (1. School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China;
    2. Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China)
  • Online: 2023-11-29 Published: 2023-11-29
  • About the authors: XU Hongkui (许鸿奎, born 1966), male, from Laiwu, Shandong, professor, Ph.D., research interests: pattern recognition and intelligent information processing, E-mail: xhkui2009@163.com; LI Zhenye (李振业, born 1999), male, from Qingdao, Shandong, master's student, research interest: computer vision, E-mail: 1505139054@qq.com; GUO Wentao (郭文涛, born 1999), male, from Binzhou, Shandong, master's student, research interest: computer vision, E-mail: 1411097326@qq.com.
  • Supported by:
    Shandong Province Major Science and Technology Innovation Project (2019JZZY010120); Shandong Province Key Research and Development Program (2019GSF111054)

Abstract: Scene text detection currently faces two main challenges: the trade-off between real-time performance and accuracy, and the detection of arbitrarily shaped text. Together they determine whether scene text detection is practical in real-world applications. To address both problems, this paper adopts a segmentation-based approach and proposes a lightweight backbone network with strong feature extraction ability that detects arbitrarily shaped natural scene text accurately and in real time. Specifically, a structurally simple dual-resolution residual backbone network is combined with a low-computational-cost deep aggregation pyramid pooling module; the features extracted by the two are fused and then segmented by a differentiable binarization module. Comparative experiments on the standard English dataset ICDAR2015 show that the proposed improvements are effective and that the method achieves comparable results in both real-time performance and accuracy.

Key words: real-time text detection, dual-resolution backbone, semantic segmentation, deep aggregation pyramid pooling module
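
The differentiable binarization step named in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation of differentiable binarization, B = 1 / (1 + exp(-k(P - T))), where P is the predicted probability map, T is the learned threshold map, and k is an amplifying factor. The abstract does not state the formula or the value of k used in this paper, so the function name, the default `k=50.0`, and the toy maps below are illustrative assumptions only.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map, element-wise: B = 1 / (1 + exp(-k * (P - T))).

    prob and thresh are 2D lists of equal shape with values in [0, 1].
    k is an assumed amplifying factor, not a value taken from the paper.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob, thresh)
    ]


# Toy 2x2 probability map P and threshold map T (made-up values).
prob_map = [[0.9, 0.2], [0.5, 0.7]]
thresh_map = [[0.3, 0.3], [0.5, 0.6]]
binary_map = differentiable_binarization(prob_map, thresh_map)
```

Pixels whose probability is well above the local threshold are pushed toward 1 and those below it toward 0. Unlike a hard threshold, the function stays differentiable, so the threshold map can be trained jointly with the segmentation network.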

CLC Number: