Computer and Modernization, 2021, Vol. 0, Issue (09): 68-74.

Speech Recognition in Complex Noise Environment

  

  (1. School of Computer Science, Qinghai Normal University, Xining 810008, China;
    2. Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China)
  • Online: 2021-09-14  Published: 2021-09-14

Abstract: Speech recognition is an important means of human-computer interaction. To address the poor performance of traditional speech recognition systems on noisy speech and their inappropriate feature selection, a deep autoencoder recurrent neural network model based on transfer learning is proposed. The model consists of an encoder, a decoder, and an acoustic model. The encoder and decoder are composed of fully connected layers and are used for feature extraction. The acoustic model is a stacked bidirectional recurrent neural network, which is used to improve recognition performance. The structure and parameters of the encoder are transferred to the acoustic model for joint training. Experimental results on the noisy Google commands dataset show that the proposed model effectively improves the recognition of noisy speech and has good robustness and generalization.
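
The architecture described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation. The layer sizes, the tanh activations, the simple (Elman) recurrent cells, the mean pooling over time, and all function names are assumptions made for illustration; the abstract does not specify the recurrent cell type or any hyperparameters. The sketch shows only the wiring: a fully connected encoder and decoder form the autoencoder, and the encoder's structure and parameters are copied into an acoustic model built from stacked bidirectional recurrent layers. Autoencoder pretraining and the joint training step are omitted.

```python
import copy
import math
import random


def dense_init(n_in, n_out, rng):
    """Parameters of one fully connected layer (small random weights, zero bias)."""
    return {
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def dense(layer, x, act=math.tanh):
    """Apply a fully connected layer to one feature vector (act=None means linear)."""
    z = [sum(w * v for w, v in zip(row, x)) + b
         for row, b in zip(layer["w"], layer["b"])]
    return [act(v) for v in z] if act else z


def rnn_init(n_in, n_hid, rng):
    """Parameters of one simple recurrent cell (assumed; the paper may use LSTM/GRU)."""
    return {"wx": dense_init(n_in, n_hid, rng), "wh": dense_init(n_hid, n_hid, rng)}


def rnn_pass(cell, frames, reverse=False):
    """Run the cell over the sequence in one direction; return one state per frame."""
    h = [0.0] * len(cell["wh"]["b"])
    states = []
    for x in (reversed(frames) if reverse else frames):
        zx = dense(cell["wx"], x, act=None)
        zh = dense(cell["wh"], h, act=None)
        h = [math.tanh(a + b) for a, b in zip(zx, zh)]
        states.append(h)
    return states[::-1] if reverse else states


def bi_rnn(layer, frames):
    """Concatenate forward and backward hidden states frame by frame."""
    fwd = rnn_pass(layer["fwd"], frames)
    bwd = rnn_pass(layer["bwd"], frames, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]


def build_autoencoder(n_feat, n_code, rng):
    """Fully connected encoder and decoder; training (not shown) would minimise reconstruction error."""
    return {"encoder": dense_init(n_feat, n_code, rng),
            "decoder": dense_init(n_code, n_feat, rng)}


def build_acoustic_model(autoencoder, n_hid, n_classes, rng, n_layers=2):
    """Transfer the encoder into a stacked bidirectional RNN classifier."""
    # Transfer step: copy the encoder's structure and parameters. The copy
    # stays trainable so it could be updated during joint training.
    encoder = copy.deepcopy(autoencoder["encoder"])
    n_in = len(encoder["b"])
    rnn = []
    for _ in range(n_layers):
        rnn.append({"fwd": rnn_init(n_in, n_hid, rng),
                    "bwd": rnn_init(n_in, n_hid, rng)})
        n_in = 2 * n_hid  # the next layer sees forward + backward states
    return {"encoder": encoder, "rnn": rnn, "out": dense_init(n_in, n_classes, rng)}


def predict(model, frames):
    """Encode each frame, run the stacked bidirectional layers, pool, and classify."""
    h = [dense(model["encoder"], x) for x in frames]
    for layer in model["rnn"]:
        h = bi_rnn(layer, h)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over time (assumed)
    logits = dense(model["out"], pooled, act=None)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
autoencoder = build_autoencoder(n_feat=13, n_code=8, rng=rng)
model = build_acoustic_model(autoencoder, n_hid=6, n_classes=4, rng=rng)
# A hypothetical utterance: 20 frames of 13 random features each.
utterance = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
probs = predict(model, utterance)
```

Here `probs` holds one probability per command class for the random utterance. In a real system the encoder would first be trained inside the autoencoder on speech features, and the whole acoustic model would then be trained with a gradient-based framework.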

Key words: speech recognition, transfer learning, auto-encoder, joint training