Computer and Modernization ›› 2025, Vol. 0 ›› Issue (07): 69-76. doi: 10.3969/j.issn.1006-2475.2025.07.010


Speech Cloning Method Based on Self-attention Mechanism Speaker Encoder and SA-Decoder

  


  1. (Computer Science Academy, Xi’an Polytechnic University, Xi’an 710600, China)
  • Online: 2025-07-22 Published: 2025-07-22

Abstract: The FreeVC model performs well in speech cloning. However, speech sequences carry complex, varying features and information such as timbre and style, and the Speaker Encoder module in FreeVC uses only a single LSTM network, which struggles to extract and represent speaker information accurately. This degrades the model’s handling of speech sequences and lowers the quality and accuracy of voice conversion. Moreover, FreeVC uses a conventional decoder whose upsampling (deconvolution) operations can lose detail, blurring articulation in the reconstructed audio and producing audible artifacts. To address these issues, this paper proposes FreeVC-SA, a speech cloning method that applies a self-attention mechanism to the speaker encoder and replaces the decoder with an SA-Decoder. The method takes the speaker’s Mel spectrogram as input and adds a self-attention mechanism on top of the LSTM network, helping the model capture long-distance dependencies and extract features such as the speaker’s tone and style more accurately. The SA-Decoder addresses the limitation of local receptive fields, making the cloned speech more realistic and clearer. Experimental results show that, compared with all baseline models, FreeVC-SA significantly improves naturalness, similarity, and emotional similarity, and significantly reduces word error rate and character error rate.
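
As an illustration of why self-attention helps with long-distance dependencies, the following minimal sketch applies scaled dot-product self-attention to a short sequence of frame vectors and mean-pools the result into a single fixed-size vector. It is not the FreeVC-SA implementation: the frame vectors merely stand in for per-frame LSTM outputs, the query/key/value projections are assumed to be the identity (real models learn them), and the function names and the mean-pooling step are hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    frames: list of T vectors (each a list of d floats), standing in for
    per-frame LSTM outputs (an assumption made for this sketch).
    Returns (contexts, weights), where weights[i][j] is how strongly
    frame i attends to frame j, regardless of their distance in time.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        row = softmax(scores)
        context = [sum(row[t] * frames[t][i] for t in range(len(frames)))
                   for i in range(dim)]
        weights.append(row)
        contexts.append(context)
    return contexts, weights


def speaker_embedding(frames):
    """Mean-pool the attended frames into one vector (hypothetical pooling)."""
    contexts, _ = self_attention(frames)
    count = len(contexts)
    return [sum(c[i] for c in contexts) / count
            for i in range(len(contexts[0]))]


# Toy input: frames 0 and 2 are identical, frame 1 differs.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
contexts, weights = self_attention(frames)
embedding = speaker_embedding(frames)
```

In this toy input, frame 0 attends to frame 2 exactly as strongly as to itself, even though the two are not adjacent. A recurrent network alone has to carry that relationship through every intermediate step.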

Key words: speech cloning, speaker encoder, SA-Decoder, self-attention mechanism, FreeVC-SA
