Computer and Modernization ›› 2024, Vol. 0 ›› Issue (01): 41-46. DOI: 10.3969/j.issn.1006-2475.2024.01.007

• Artificial Intelligence •

Scene Text Modification Network for Uyghur Based on Generative Adversarial Network

  

  (1. School of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China; 2. Xinjiang Agricultural Informatization Engineering Technology Research Center, Urumqi 830052, China; 3. Multilingual Information Technology Laboratory, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China)
  • Online: 2024-01-23  Published: 2024-02-23
  • About the authors: Fu Honglin (1997—), male, from Dazhu, Sichuan, master's student, research interests: image generation, image super-resolution, E-mail: 1223484459@qq.com; corresponding author: Zhang Taihong (1965—), male, from Xi'an, Shaanxi, professor, Ph.D., research interest: artificial intelligence, E-mail: zth@xjau.edu.cn; Yang Yating (1985—), female, from Changji, Xinjiang, researcher, Ph.D., research interest: multilingual information processing, E-mail: yangyt@ms.xjb.ac.cn; 艾孜麦提·艾瓦尼尔 (1988—), male (Uyghur), from Shache, Xinjiang, assistant researcher, Ph.D., research interest: OCR, E-mail: azmat@ms.xjb.ac.cn; Ma Bo (1985—), male, from Shenyang, Liaoning, researcher, Ph.D., research interest: natural language processing, E-mail: mabo@ms.xjb.ac.cn.
  • Funding:
    National Natural Science Foundation of China (U2003303); Xinjiang Tianshan Innovation Team Program (2020D14045); Youth Innovation Promotion Association, Chinese Academy of Sciences (科发人函字[2019]26号); Key Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01D04); Major Science and Technology Program of Xinjiang Uygur Autonomous Region (2022A02011)

Abstract: Studies of scene text detection and recognition for Uyghur show that manually collecting and annotating natural scene text images is time-consuming and labor-intensive, so synthetic data serves as the main source of training data. To obtain more realistic data, this paper proposes a scene text modification network for Uyghur based on a generative adversarial network. The network is built from efficient Transformer modules that fully extract the global and local features of the image to modify Uyghur scene text, and a fine-tuning module is added to refine the final results. The model is trained with a WGAN-style strategy, which effectively copes with problems such as mode collapse and gradient explosion. The generalization ability and robustness of the model are verified by English-to-English and English-to-Uyghur text modification experiments; good results are achieved both on objective metrics (SSIM, PSNR) and visually, and the model is further validated on the real-scene datasets SVT and ICDAR 2013.
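As a rough illustration only: the abstract describes training with a WGAN-style objective to avoid mode collapse and exploding gradients. A minimal PyTorch sketch of that objective follows, assuming a generic generator that maps a source image plus a rendered target-text image to an edited image; the Critic architecture, the train_step helper, and all parameter choices here are hypothetical placeholders, not the paper's actual network.

    # Minimal WGAN-style training step (sketch). Assumes `generator` maps a
    # source scene-text image plus a rendered target-text image to an edited
    # image; this interface is an assumption, not the paper's actual API.
    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Hypothetical critic: outputs an unbounded realism score (no sigmoid)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

        def forward(self, x):
            return self.net(x)

    def train_step(generator, critic, opt_g, opt_c, src, target_text, real, clip=0.01):
        # Critic update: minimize E[D(fake)] - E[D(real)]
        fake = generator(src, target_text).detach()
        loss_c = critic(fake).mean() - critic(real).mean()
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        # Weight clipping enforces the Lipschitz constraint (original WGAN);
        # this is what stabilizes training against collapse/exploding gradients.
        for p in critic.parameters():
            p.data.clamp_(-clip, clip)
        # Generator update: minimize -E[D(fake)]
        fake = generator(src, target_text)
        loss_g = -critic(fake).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_c.item(), loss_g.item()

In the original WGAN recipe the critic takes several update steps per generator step, with RMSprop at a small learning rate; the abstract does not say whether this paper follows that exact schedule. The two reported metrics, PSNR and SSIM, can be computed with scikit-image as sketched below (array names are illustrative; images are assumed to be uint8 H×W×3):

    # Quality metrics reported in the abstract, via scikit-image.
    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate(pred: np.ndarray, gt: np.ndarray):
        psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
        ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
        return psnr, ssim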

Key words: generative adversarial networks, scene text editing, Uyghur scene text image, efficient Transformer, WGAN

CLC number: