Computer and Modernization ›› 2023, Vol. 0 ›› Issue (05): 52-57.


Improved End-to-end Synthetic Speech Detection Method Based on Auxiliary Learning

  

  1. (College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, China)
  • Online: 2023-06-06 Published: 2023-06-06

Abstract: As deep forgery (deepfake) technology advances, synthetic speech detection faces growing challenges. This paper proposes a synthetic speech detection method that integrates auxiliary learning into an end-to-end model. After data alignment, the raw audio is fed directly into the improved end-to-end model, with no hand-crafted features extracted. The main task is to classify speech as real or synthetic. In parallel, classification of the different synthetic speech types serves as an auxiliary task that provides prior assumptions for the main task's synthetic speech detection, and the weights used to combine the main and auxiliary tasks are optimized. Experimental results on the public datasets ASVspoof2019 and ASVspoof2015 show that the improved model achieves a lower equal error rate than models using hand-crafted features, outperforms the end-to-end model before the improvement, and generalizes better to unknown attack types.

Key words: deep forgery, synthetic speech detection, auxiliary learning, weight optimization, end-to-end system
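
Illustrative sketch: the abstract describes a main task (real vs. synthetic) trained jointly with an auxiliary task (synthetic speech type), with the two losses combined by weight. The snippet below is a minimal sketch of one common way to form such a combined loss, not the paper's implementation. The function names, the use of cross-entropy for both tasks, the fixed `aux_weight` value, and the three-class auxiliary label set are all assumptions made for illustration; the abstract does not state how the weights are optimized.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the correct class index.
    return -math.log(softmax(logits)[target])

def combined_loss(main_logits, main_target, aux_logits, aux_target, aux_weight=0.5):
    # Assumed form: main loss plus a weighted auxiliary loss.
    # aux_weight is a hypothetical hyperparameter, not a value from the paper.
    main = cross_entropy(main_logits, main_target)   # real vs. synthetic
    aux = cross_entropy(aux_logits, aux_target)      # synthetic speech type
    return main + aux_weight * aux

# One utterance: two main-task scores, three auxiliary-task scores.
loss = combined_loss([2.0, -1.0], 0, [0.1, 1.5, -0.3], 1, aux_weight=0.5)
print(round(loss, 4))
```

Setting `aux_weight` to 0 reduces this to plain single-task training, which gives a simple baseline for checking whether the auxiliary task helps.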