Computer and Modernization ›› 2024, Vol. 0 ›› Issue (12): 15-23. doi: 10.3969/j.issn.1006-2475.2024.12.003


Fashion Clothing Pattern Generation Based on Improved Stable Diffusion

  

  1. (School of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China)
  • Online: 2024-12-31    Published: 2024-12-31

Abstract: Clothing patterns are a window through which people express their personality and sense of fashion. In recent years, with the continuous development of multimodal technology, text-based clothing pattern generation has been studied extensively. However, existing methods have not been applied well in practice because of poor text-image semantic alignment and low resolution. Since the large-scale language-image pre-training model CLIP was proposed, pre-trained diffusion models combined with CLIP have become the mainstream approach to text-to-image generation. However, the original pre-trained models generalize poorly to this downstream task: relying on the pre-trained model alone does not allow flexible and accurate control of the color and structure of clothing patterns, and the large number of parameters makes retraining from scratch impractical. To address these problems, this study designs an improved Stable Diffusion network, FT-SDM-L (Fine Tuning-Stable Diffusion Model-Lion), which uses a clothing image-text dataset to update the weights of the cross-attention modules in the original model. Experimental results show that the fine-tuned model improves the ClipScore and HPS v2 scores by 0.08 and 1.22 on average, confirming the importance of this module in incorporating textual information. To further enhance the model's feature extraction and data-mapping capabilities in the clothing domain, a lightweight adapter, Stable-Adapter, is then added at the output of this module to better capture changes in the input prompts. With only 0.75% additional parameters introduced by the adapter, the ClipScore and HPS v2 scores improve by a further 0.05 and 0.38. The model achieves good results in terms of the fidelity and semantic consistency of generated clothing patterns.
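The abstract describes two mechanisms: updating only the cross-attention weights of a pre-trained Stable Diffusion U-Net, and inserting a small residual adapter at each cross-attention output. The sketch below illustrates one plausible realization in PyTorch; it is not the paper's actual implementation, and the class names, bottleneck size, and the "attn2" name filter are assumptions (the "attn2" suffix follows the diffusers naming convention for cross-attention blocks).

```python
import torch
import torch.nn as nn

class StableAdapter(nn.Module):
    """Residual bottleneck adapter placed after a cross-attention block.

    down-project -> GELU -> up-project, with the up-projection initialized
    to zero so the adapter starts as the identity and adds few parameters.
    """
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedCrossAttention(nn.Module):
    """Wraps an existing cross-attention module and passes its output
    through a StableAdapter."""
    def __init__(self, attn: nn.Module, dim: int):
        super().__init__()
        self.attn = attn
        self.adapter = StableAdapter(dim)

    def forward(self, hidden_states, encoder_hidden_states=None, **kwargs):
        out = self.attn(hidden_states,
                        encoder_hidden_states=encoder_hidden_states, **kwargs)
        return self.adapter(out)


def trainable_cross_attention_params(unet: nn.Module):
    """Freeze the whole U-Net, then unfreeze only modules whose name ends
    with 'attn2' (cross-attention under the assumed naming convention) and
    return their parameters for the optimizer."""
    for p in unet.parameters():
        p.requires_grad_(False)
    params = []
    for name, module in unet.named_modules():
        if name.endswith("attn2"):
            for p in module.parameters():
                p.requires_grad_(True)
                params.append(p)
    return params
```

In training, the returned cross-attention parameters plus the adapter parameters would be passed to the optimizer (the model name suggests the Lion optimizer; torch.optim.AdamW works as a stand-in), while the rest of the diffusion backbone stays frozen.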

Key words: text-to-image generation; diffusion model; cross-attention mechanism; image generation; computer vision
