Computer and Modernization ›› 2024, Vol. 0 ›› Issue (08): 114-119. doi: 10.3969/j.issn.1006-2475.2024.08.018

• Chinese Information Processing Technology •



Chinese Paper Invoice Text Recognition Method with Character Blurring

  1. (School of Communication and Information Engineering, Xi’an University of Science and Technology, Xi’an 710600, China)
  • Online: 2024-08-28 Published: 2024-08-29
  • Supported by:
    National Key Research and Development Program of China (2018YFC0808300); Shaanxi Province Science and Technology Plan, Key Industry Innovation Chain (Group) Project (2020ZDLGY15-07); Xi’an Science and Technology Plan, Science and Technology Innovation Guidance Project (201805036YD14CG20(4))

Abstract: Character blurring in paper invoices degrades OCR recognition performance. To address this, this paper proposes an adaptive iterative visual semantic model consisting of two modules. The recognition module uses ResNet as the encoder and a Transformer as the decoder to make an initial prediction on the blurred text. The correction module feeds that prediction into a bidirectional language model, which uses contextual semantic information to correct characters and produce an initial text correction. The corrected result and the label are then passed to a discriminator: if discrimination succeeds, the result is output directly; if it fails, the result is fed back into the language model for further correction, which improves the recognition rate. Experimental results show that the proposed model achieves a recognition accuracy 3.39 percentage points higher than the current Chinese recognition model ch_PP-OCRv3, and an average improvement of 6.81 percentage points over the other models compared. It also performs well on public datasets such as IC15, IIIT5K, and IC03-Word, which supports the model's generalization ability.

Key words: text recognition, blurry text, paper invoice, neural network, ResNet
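
The control flow of the correction stage described in the abstract (correct, check with a discriminator, iterate on failure) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the toy lexicon, and the iteration cap `max_iters` are all assumptions. In the paper the recognizer is a ResNet encoder with a Transformer decoder and the corrector is a bidirectional language model; here the corrector is replaced by a toy lexicon lookup that fixes one character per pass, and the discriminator, which the abstract says receives the result together with the label, is replaced by a simple lexicon membership check.

```python
from typing import Callable, Tuple

# Hypothetical toy vocabulary standing in for learned language knowledge.
LEXICON = ("invoice", "amount", "total")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_correct(text: str) -> str:
    """Stand-in for the language model: fix ONE wrong character per pass."""
    candidates = [w for w in LEXICON if len(w) == len(text)]
    if not candidates:
        return text
    target = min(candidates, key=lambda w: hamming(w, text))
    for i, (got, want) in enumerate(zip(text, target)):
        if got != want:
            return text[:i] + want + text[i + 1:]
    return text


def toy_accept(text: str) -> bool:
    """Stand-in for the discriminator: accept only known words."""
    return text in LEXICON


def iterative_correction(
    initial_prediction: str,
    correct: Callable[[str], str],
    accept: Callable[[str], bool],
    max_iters: int = 3,  # assumed budget; the abstract does not state one
) -> Tuple[str, int]:
    """Correct once, then repeat while the discriminator rejects the result.

    Returns the final text and the number of correction passes used.
    """
    text = correct(initial_prediction)  # initial text correction
    passes = 1
    while not accept(text) and passes < max_iters:
        text = correct(text)  # feed the result back for further correction
        passes += 1
    return text, passes


if __name__ == "__main__":
    # "inv0ise" plays the role of a blurred-character misrecognition.
    print(iterative_correction("inv0ise", toy_correct, toy_accept))
```

Passing the corrector and the acceptance check in as callables keeps the loop independent of any particular model, so the toy stand-ins could be swapped for real components without changing the iteration logic.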
