Computer and Modernization ›› 2024, Vol. 0 ›› Issue (12): 84-90. doi: 10.3969/j.issn.1006-2475.2024.12.013

• Image Processing •

Detection and Recognition Algorithms for Chinese and English Scene Text Images


  

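  1. (School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)
  • Published: 2024-12-31  Online: 2024-12-31
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (6170185); National Natural Science Foundation of China (61901206)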

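Abstract: The complex backgrounds of scene text images make it difficult for detection algorithms to locate text regions, which in turn makes recognition harder. This paper proposes TD-ABCNetv2, an improved model based on ABCNetv2, to detect and recognize both Chinese and English scene text and to improve detection and recognition accuracy. Text varies in shape, arrangement, and font, so the model uses SKNet as its backbone. Its Selective Kernel (SK) modules help the network learn features at multiple scales and adapt to text of different scales, shapes, and orientations. Chinese and English scene text also differ in character size and spacing. To address this, an ECA (Efficient Channel Attention) module is added to the FPN (feature pyramid network). The module integrates channel information more effectively and increases the network's sensitivity to different features, which makes feature fusion more targeted. In addition, the CIoU loss function is adopted to measure the overlap between bounding boxes more accurately. It adapts to variations in text shape and improves the model's generalization ability. Experiments on several public datasets demonstrate the effectiveness of the proposed model.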

Key words: scene text, Chinese text detection, SKNet, attention mechanism, intersection over union (IoU)
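The abstract adopts the CIoU loss to measure bounding-box overlap more accurately. For reference, the commonly used CIoU definition (Zheng et al., 2020) is L_CIoU = 1 - IoU + ρ²/c² + αv, where:

- ρ is the distance between the two box centres.
- c is the diagonal of the smallest box enclosing both boxes.
- v = (4/π²)(arctan(w_gt/h_gt) - arctan(w/h))² penalizes aspect-ratio mismatch.
- α = v / ((1 - IoU) + v).

The pure-Python sketch below evaluates this for one pair of axis-aligned boxes in an assumed (x1, y1, x2, y2) format. The function name `ciou_loss` and the `eps` guard are illustrative. The abstract does not state which of ABCNetv2's box predictions the loss is applied to.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Complete-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU: intersection area over union area
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centres
    rho2 = ((px1 + px2 - tx1 - tx2) / 2) ** 2 + ((py1 + py2 - ty1 - ty2) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency term; alpha: its non-negative trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: ~0
    print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes: ~1.4
```

Take the two disjoint unit squares in the usage lines. A plain IoU loss gives exactly 1 for them, however far apart they are. CIoU adds the centre-distance term 4/10 and returns about 1.4, so it still carries a signal about where the predicted box should move.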
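The abstract adds an ECA module to the FPN to integrate channel information, but gives no further detail. The sketch below therefore follows the standard ECA-Net formulation (Wang et al., 2020), not anything specific to TD-ABCNetv2:

1. Global average pooling per channel.
2. A 1-D convolution of adaptive odd size k across neighbouring channels, with no dimensionality reduction and no bias.
3. A sigmoid.
4. Channel-wise rescaling.

It uses plain Python on nested lists so that it runs without a deep-learning framework. The function names are illustrative. The kernel weights, which a real network learns, are passed in by hand.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net adaptive kernel size: |(log2 C + b) / gamma| truncated, bumped to the next odd number if even."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature, weights):
    """Efficient Channel Attention on a C x H x W feature map stored as nested lists.

    weights: shared 1-D kernel of odd length k (learned in a real network).
    Returns (rescaled feature map, per-channel gates).
    """
    c, k = len(feature), len(weights)
    pad = k // 2
    # 1) Global average pooling: one descriptor per channel
    desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    # 2) 1-D convolution across neighbouring channels, zero-padded, no bias
    padded = [0.0] * pad + desc + [0.0] * pad
    logits = [sum(weights[j] * padded[i + j] for j in range(k)) for i in range(c)]
    # 3) Sigmoid turns each logit into a gate in (0, 1)
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # 4) Rescale every element of channel i by gate i
    out = [[[x * g for x in row] for row in ch] for ch, g in zip(feature, gates)]
    return out, gates
```

For example, `eca_kernel_size(256)` returns 5 with the default gamma = 2 and b = 1. Each gate depends only on k neighbouring channels, so ECA avoids the fully connected bottleneck of SE-style channel attention. This keeps it cheap to add to an existing network.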
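The abstract credits the Selective Kernel (SK) modules of the SKNet backbone with learning features at different scales. In the original SKNet design (Li et al., 2019), an SK unit works in three steps:

1. Split: run parallel convolutions with different kernel sizes.
2. Fuse: sum the branch outputs and pool them into a compact descriptor.
3. Select: apply a per-channel softmax across branches to weight the branch outputs.

The sketch below covers only Fuse and Select in plain Python. It assumes the Split step has already produced the branch feature maps. Batch normalization is omitted, and the weight matrices are supplied by hand instead of being learned. The helper names (`gap`, `matvec`, `sk_fuse_select`) are illustrative, not from the paper.

```python
import math


def gap(feature):
    """Global average pooling: one scalar per channel of a C x H x W map."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sk_fuse_select(branches, w_fc, w_select):
    """Fuse and Select steps of a Selective Kernel unit.

    branches: M feature maps (each C x H x W), e.g. outputs of 3x3 and 5x5 convolutions.
    w_fc:     d x C matrix that compresses the pooled descriptor into z.
    w_select: M matrices (each C x d), one per branch, giving attention logits.
    Returns (output map, attention weights attn[c][m]).
    """
    # Fuse: GAP is linear, so pooling each branch and summing equals pooling the summed map
    s = [sum(vals) for vals in zip(*(gap(b) for b in branches))]
    # Compact descriptor z = ReLU(W s); batch normalization is omitted here
    z = [max(0.0, v) for v in matvec(w_fc, s)]
    # Select: per-branch logits, then a softmax across branches for each channel
    logits = [matvec(a, z) for a in w_select]
    attn = []
    for per_channel in zip(*logits):
        exps = [math.exp(x) for x in per_channel]
        total = sum(exps)
        attn.append([e / total for e in exps])
    # Output: channel-wise attention-weighted sum of the branch feature maps
    out = []
    for ci, weights in enumerate(attn):
        rows = zip(*(b[ci] for b in branches))  # aligned rows across branches
        out.append([[sum(w * x for w, x in zip(weights, pix)) for pix in zip(*row_set)]
                    for row_set in rows])
    return out, attn
```

With all selection weights set to zero, each channel gives the two branches equal weight (0.5 each), so the output is their average. Learned weights let each channel favour the receptive field that best matches the scale of the text.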
