Computer and Modernization ›› 2024, Vol. 0 ›› Issue (01): 41-46. DOI: 10.3969/j.issn.1006-2475.2024.01.007

• Artificial Intelligence •

Scene Text Modification Network for Uyghur Based on Generative Adversarial Network

  

  (1. School of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; 2. Xinjiang Agricultural Informatization Engineering Technology Research Center, Urumqi 830052, China; 3. Multilingual Information Technology Laboratory, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China)
  • Online: 2024-01-23  Published: 2024-02-23
  • About the authors: Fu Honglin (1997—), male, from Dazhu, Sichuan, master's student, research interests: image generation, image super-resolution, E-mail: 1223484459@qq.com; corresponding author: Zhang Taihong (1965—), male, from Xi'an, Shaanxi, professor, Ph.D., research interest: artificial intelligence, E-mail: zth@xjau.edu.cn; Yang Yating (1985—), female, from Changji, Xinjiang, researcher, Ph.D., research interest: multilingual information processing, E-mail: yangyt@ms.xjb.ac.cn; 艾孜麦提·艾瓦尼尔 (1988—), male (Uyghur), from Shache, Xinjiang, assistant researcher, Ph.D., research interest: OCR, E-mail: azmat@ms.xjb.ac.cn; Ma Bo (1985—), male, from Shenyang, Liaoning, researcher, Ph.D., research interest: natural language processing, E-mail: mabo@ms.xjb.ac.cn.
  • Funding:
    National Natural Science Foundation of China (U2003303); Xinjiang Tianshan Innovation Team Program (2020D14045); Youth Innovation Promotion Association, Chinese Academy of Sciences (科发人函字[2019]26号); Key Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01D04); Major Science and Technology Program of Xinjiang Uygur Autonomous Region (2022A02011)

Abstract: Studies of scene text detection and recognition for Uyghur show that manually collecting and annotating natural scene text images is time-consuming and labor-intensive, so synthetic data serves as the main source of training data. To obtain more realistic data, this paper proposes a scene text modification network for Uyghur based on a generative adversarial network. The network is built from efficient Transformer modules that fully extract the global and local features of the image to modify Uyghur scene text, and a fine-tuning module is added to refine the final results. The model is trained with a WGAN-style strategy, which effectively copes with problems such as mode collapse and gradient explosion. The generalization ability and robustness of the model are verified by English-to-English and English-to-Uyghur text modification experiments; good results are achieved both on objective metrics (SSIM, PSNR) and visually, and the model is further validated on the real-scene datasets SVT and ICDAR 2013.
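As a rough illustration only: the abstract describes training with a WGAN-style objective to avoid mode collapse and exploding gradients. A minimal PyTorch sketch of that objective follows, assuming a generic generator that maps a source image plus a rendered target-text image to an edited image; the Critic architecture, the train_step helper, and all parameter choices here are hypothetical placeholders, not the paper's actual network.

    # Minimal WGAN-style training step (sketch). Assumes `generator` maps a
    # source scene-text image plus a rendered target-text image to an edited
    # image; this interface is an assumption, not the paper's actual API.
    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Hypothetical critic: outputs an unbounded realism score (no sigmoid)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

        def forward(self, x):
            return self.net(x)

    def train_step(generator, critic, opt_g, opt_c, src, target_text, real, clip=0.01):
        # Critic update: minimize E[D(fake)] - E[D(real)]
        fake = generator(src, target_text).detach()
        loss_c = critic(fake).mean() - critic(real).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        # Weight clipping enforces the Lipschitz constraint (original WGAN);
        # this is what stabilizes training against collapse/exploding gradients.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)
        # Generator update: minimize -E[D(fake)]
        fake = generator(src, target_text)
        loss_g = -critic(fake).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_c.item(), loss_g.item()

In the original WGAN recipe the critic takes several update steps per generator step, with RMSprop at a small learning rate; the abstract does not say whether this paper follows that exact schedule. The two reported metrics, PSNR and SSIM, can be computed with scikit-image as sketched below (array names are illustrative; images are assumed to be uint8 H×W×3):

    # Quality metrics reported in the abstract, via scikit-image.
    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate(pred: np.ndarray, gt: np.ndarray):
        psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
        ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
        return psnr, ssim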

Key words: generative adversarial networks, scene text editing, Uyghur scene text image, efficient Transformer, WGAN

CLC number: