Computer and Modernization ›› 2022, Vol. 0 ›› Issue (08): 1-6.

Application of Hybrid CTC/Attention Model in Mandarin Recognition

  

  (1. School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China;
   2. Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China)
  • Online: 2022-08-22 Published: 2022-08-22

Abstract: End-to-end speech recognition models based on Connectionist Temporal Classification (CTC) have a simple structure and align input and output sequences automatically, but their recognition accuracy leaves room for improvement. This paper introduces an attention mechanism to form a hybrid CTC/Attention end-to-end model. The method uses multi-task learning to combine the alignment advantage of CTC with the context-modeling advantage of the attention mechanism. In Mandarin Chinese recognition experiments that use 80-dimensional FBank features together with 3-dimensional pitch features as the acoustic features, and a VGG-bidirectional long short-term memory network as the encoder, the hybrid model reduces the character error rate by about 6.1% compared with the CTC-based end-to-end model. Connecting an external language model reduces the character error rate by a further 0.3%. The character error rate is also significantly lower than that of the traditional baseline model.

Key words: speech recognition, connectionist temporal classification, attention mechanism, end-to-end
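
The multi-task objective and the evaluation metric named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `hybrid_loss` and `character_error_rate`, the default weight of 0.3, and the use of plain Levenshtein distance for scoring are all assumptions, since the abstract states neither the interpolation weight nor the scoring tool.

```python
def hybrid_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the two task losses: w * L_ctc + (1 - w) * L_att.

    ctc_weight is a hypothetical hyperparameter; 0.3 is only a placeholder.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the two strings divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1       # reference character missing from hypothesis
            insertion = curr[j - 1] + 1  # extra character in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)
```

Setting `ctc_weight` to 1.0 recovers a pure CTC objective and 0.0 a pure attention objective, so the weight controls how strongly the CTC alignment constraint regularizes the attention decoder during training.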