Computer and Modernization, 2025, Vol. 0, Issue (12): 97-106. doi: 10.3969/j.issn.1006-2475.2025.12.014

• Image Recognition •

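  • Author profiles: YAO Li (born 1992), female, native of Shangrao, Jiangxi; research intern; M.S.; research interests: image processing and machine learning; e-mail: yaolilyplus@163.com. ZHAN Bosi (born 1999), male, native of Shangrao, Jiangxi; master's student; research interest: image processing; e-mail: 572421985@qq.com. Corresponding author: WAN Weiguo (born 1991), male, native of Shangrao, Jiangxi; lecturer; Ph.D.; research interest: computer vision; e-mail: wanweiguo@jxufe.edu.cn. LUO Yitao (born 2005), male, native of Nanchang, Jiangxi; undergraduate student; research interest: image processing; e-mail: 3014815803@qq.com. YANG Yuxian (born 1973), female, native of Xingguo, Jiangxi; senior engineer; M.S.; research interest: computer applications; e-mail: 56638293@qq.com.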
  • Funding: National Natural Science Foundation of China (62261025); Youth Project of the Natural Science Foundation of Jiangxi Province (20232BAB212015)

Face Sketch-photo Synthesis Network Based on Attention Mechanism


  (1. Office of Educational Administration, Jiangxi University of Finance and Economics, Nanchang 330032, China;
    2. Deep Image Vision Lab, Jiangxi University of Finance and Economics, Nanchang 330032, China;
    3. School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang 330032, China;
    4. Jiangxi Province Science and Technology Infrastructure Center, Nanchang 330003, China)
  • Online: 2025-12-18; Published: 2025-12-18

Abstract: Face sketch-photo synthesis is an important branch of image generation, with broad applications in digital entertainment, public security, and criminal investigation. Because face sketches contain only grayscale information and lose most facial texture details, the photos synthesized by existing methods often suffer from missing facial structure, insufficient detail, and color distortion. To address these issues, this paper proposes a face sketch-photo synthesis network based on an attention mechanism. First, a generator that combines a Vision Transformer (ViT) with U-Net is designed to extract both global and local facial features, improving the structural integrity of the synthesized face photos. Second, an improved selective kernel attention module is constructed to strengthen the model's ability to capture fine details, so that the synthesized images retain more facial detail. Finally, a discriminator based on channel and pixel attention is designed to strengthen the adversarial learning of the generative adversarial network (GAN) and reduce color distortion in the synthesized face photos. In subjective and objective comparisons with other state-of-the-art methods, the proposed method achieves better visual quality and better objective metrics for face sketch-photo synthesis. On the widely used CUHK, AR, and XM2VTS face sketch datasets, it improves SSIM over the second-best results by 11.6%, 6.2%, and 4.5%, respectively, which demonstrates its effectiveness.

Key words: face sketch-photo synthesis, attention mechanism, Transformer, generative adversarial network
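The abstract names two attention components but does not describe their internals. The block below is an illustrative sketch only. It implements the standard selective kernel (SK) fuse-and-select step from SKNet and a common channel-attention/pixel-attention pair (the formulation used, for example, in the FFA-Net dehazing network). It uses dependency-free Python and stores feature maps as nested lists `fmap[c][y][x]`. It is not the authors' improved SK module or their discriminator. The function names, weight shapes, and reduced dimension `d` are hypothetical. Convolutions and batch normalization are omitted, and the branch inputs stand in for the outputs of convolutions with different kernel sizes.

```python
import math

# Hypothetical sketch: feature maps are nested lists indexed as fmap[c][y][x].


def global_avg_pool(fmap):
    """Average each channel over all spatial positions -> list of C floats."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def matvec(w, v):
    """Weight matrix (list of rows) times vector: a bias-free fully connected layer."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def selective_kernel(branches, w_reduce, w_select):
    """SKNet-style fuse-and-select over M branch outputs of identical shape.

    branches: M feature maps, e.g. outputs of 3x3 and 5x5 convolutions (not shown).
    w_reduce: d x C matrix producing the compact descriptor z (batch norm omitted).
    w_select: M matrices of shape C x d giving per-branch channel logits.
    """
    C, H, W = len(branches[0]), len(branches[0][0]), len(branches[0][0][0])
    # Fuse: element-wise sum of the branches, squeezed to a channel descriptor.
    fused = [[[sum(b[c][y][x] for b in branches) for x in range(W)]
              for y in range(H)] for c in range(C)]
    z = relu(matvec(w_reduce, global_avg_pool(fused)))
    logits = [matvec(w_m, z) for w_m in w_select]
    # Select: softmax across branches, computed separately for each channel.
    weights = []
    for c in range(C):
        peak = max(l[c] for l in logits)
        exps = [math.exp(l[c] - peak) for l in logits]  # shifted for stability
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Output: per-channel weighted sum of the branch feature maps.
    return [[[sum(weights[c][m] * b[c][y][x] for m, b in enumerate(branches))
              for x in range(W)] for y in range(H)] for c in range(C)]


def channel_attention(fmap, w1, w2):
    """Scale channel c by sigmoid(W2 . ReLU(W1 . GAP(fmap)))[c]; w1 is d x C, w2 is C x d."""
    scale = [sigmoid(v) for v in matvec(w2, relu(matvec(w1, global_avg_pool(fmap))))]
    return [[[v * scale[c] for v in row] for row in ch] for c, ch in enumerate(fmap)]


def pixel_attention(fmap, w1, w2):
    """Scale each position (y, x) by one sigmoid weight from two 1x1 convolutions.

    w1 is d x C and w2 is 1 x d, so all channels at a position share the same weight.
    """
    H, W = len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in fmap]
    for y in range(H):
        for x in range(W):
            pixel = [ch[y][x] for ch in fmap]  # channel vector at (y, x)
            weight = sigmoid(matvec(w2, relu(matvec(w1, pixel)))[0])
            for c, ch in enumerate(fmap):
                out[c][y][x] = ch[y][x] * weight
    return out


if __name__ == "__main__":
    ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]    # C=2, H=W=2
    threes = [[[3.0, 3.0], [3.0, 3.0]] for _ in range(2)]
    # All-zero selection weights give a uniform softmax, so SK returns the branch mean.
    zero_sel = [[[0.0], [0.0]], [[0.0], [0.0]]]
    print(selective_kernel([ones, threes], [[0.0, 0.0]], zero_sel)[0][0][0])  # 2.0
```

The three modules differ in what they gate. SK attention lets each channel choose between receptive fields through a softmax across branches. Channel attention rescales whole channels, and pixel attention rescales individual spatial positions. In a trained network these weight matrices would be learned parameters rather than the fixed values used above.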
