Computer and Modernization ›› 2025, Vol. 0 ›› Issue (03): 22-28. doi: 10.3969/j.issn.1006-2475.2025.03.004


Speech Enhancement Algorithm Based on Parallel Cascaded Time-frequency Conformer Generative Adversarial Network

  

  1. (School of Information and Communication Engineering, North China University, Taiyuan 030051, China)
  • Online:2025-03-28 Published:2025-03-28

Abstract:  Generative adversarial networks (GANs) continuously improve their mapping capability through adversarial training, which gives them strong noise reduction ability, and they are widely used in speech enhancement. Existing GAN-based speech enhancement methods do not fully exploit the time-frequency correlation and global correlation in speech feature sequences, and their denoising performance suffers as a result. To address this problem, this paper proposes a parallel cascaded time-frequency Conformer generative adversarial network for single-channel speech enhancement. First, the parallel cascaded time-frequency Conformer models the sequential features along the time and frequency axes of the speech spectrogram, extracting local and global dependencies in the time domain and frequency domain for the generator to learn. Then, two Decoder paths learn the amplitude mask of the noisy speech and the spectrogram of the clean speech, respectively, and the outputs of the two paths are fused to obtain the generated speech. Finally, an indicator discriminator estimates the evaluation metric scores of the speech produced by the generator, and the quality of the generated speech is improved through adversarial training. The method is validated on the public dataset VoiceBank+Demand.
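The two-path fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two Decoder outputs are combined, so the version below makes an assumption: the mask path scales the noisy magnitude while keeping the noisy phase, and the spectrogram path contributes a complex term that is simply added. The function name `fuse_decoder_outputs` and all of its parameters are hypothetical and are not taken from the paper.

```python
import cmath


def fuse_decoder_outputs(noisy_spec, mask, residual):
    """Fuse a mask path and a spectrogram path into one enhanced spectrogram.

    noisy_spec: 2-D list [freq][time] of complex STFT bins of the noisy speech.
    mask:       2-D list of non-negative floats (hypothetical mask-path output).
    residual:   2-D list of complex values (hypothetical spectrogram-path output).

    Assumption: the two paths are fused by addition; the paper may differ.
    """
    fused = []
    for spec_row, mask_row, res_row in zip(noisy_spec, mask, residual):
        row = []
        for x, m, r in zip(spec_row, mask_row, res_row):
            # Mask path: scale the noisy magnitude, keep the noisy phase.
            masked = cmath.rect(m * abs(x), cmath.phase(x))
            # Spectrogram path: add the directly estimated complex term.
            row.append(masked + r)
        fused.append(row)
    return fused


# Toy input: one frequency bin, two time frames.
noisy = [[3 + 4j, 1 + 0j]]
mask = [[0.5, 1.0]]
residual = [[0j, 0.5j]]
enhanced = fuse_decoder_outputs(noisy, mask, residual)
```

In this toy input the first bin is halved in magnitude with its phase unchanged, and the second bin keeps its magnitude and receives only the additive complex correction. An inverse STFT would then turn the fused spectrogram back into a waveform.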

Key words:  speech enhancement, generative adversarial network, time-frequency Conformer, indicator discriminator, adversarial training
