Scenes Text Modification Network for Uyghur Based on Generative Adversarial Network

doi:10.3969/j.issn.1006-2475.2024.01.007

Abstract

Research on scene text detection and recognition for Uyghur shows that manually collecting labeled natural-scene text images is time-consuming and labor-intensive, so artificially synthesized data is used as the main source of training data. To obtain more realistic data, a scene text modification network for Uyghur based on a generative adversarial network is proposed. The network is built from efficient Transformer modules that extract both the global and local features of the image to modify the Uyghur text, and a fine-tuning module is added to refine the final results. The model is trained with a WGAN-style strategy, which effectively mitigates mode collapse and gradient explosion. The generalization ability and robustness of the model are verified by English-to-English and English-to-Uyghur text modification experiments. The method achieves good results in both objective metrics (SSIM, PSNR) and visual quality, and is validated on the real-scene datasets SVT and ICDAR 2013.
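As a rough illustration of two quantities named in the abstract, the sketch below computes PSNR between two images and the basic Wasserstein (WGAN) critic and generator losses from critic scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `psnr`, `critic_loss`, and `generator_loss` are hypothetical, images are assumed to be 8-bit grayscale nested lists, and the paper's actual loss weighting, any weight clipping or gradient penalty, and its SSIM computation are not given in this abstract and are not modeled here.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-sized grayscale images.

    Images are nested lists of pixel values; max_value=255 assumes 8-bit data.
    """
    ref = [p for row in reference for p in row]
    cand = [p for row in candidate for p in row]
    if not ref or len(ref) != len(cand):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def critic_loss(real_scores, fake_scores):
    """Basic WGAN critic loss: mean(D(fake)) - mean(D(real)).

    Minimizing this pushes critic scores up on real images and down on fakes.
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real


def generator_loss(fake_scores):
    """Basic WGAN generator loss: -mean(D(fake))."""
    return -sum(fake_scores) / len(fake_scores)
```

Higher PSNR means the edited image is closer to the reference. The WGAN losses use raw critic scores rather than probabilities, which is the property usually credited with reducing mode collapse.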

Key words: generative adversarial networks, scene text editing, Uyghur scene text image, efficient Transformer, WGAN

CLC Number: TP391.41

FU Hong-lin, ZHANG Tai-hong, YANG Ya-ting, Aizimaiti Aiwanier, MA Bo. Scenes Text Modification Network for Uyghur Based on Generative Adversarial Network[J]. Computer and Modernization, 2024, 0(01): 41-46.

