Computer and Modernization ›› 2021, Vol. 0 ›› Issue (09): 68-74.

• Artificial Intelligence •

Speech Recognition in Complex Noise Environment

  (1. School of Computer Science, Qinghai Normal University, Xining 810008, China;
    2. Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China)
  • Online: 2021-09-14 Published: 2021-09-14
  • About the authors: Zhang Yunyao (张允耀, born 1998), male, from Yuanping, Shanxi; master's student; research interests: pattern recognition and intelligent systems; E-mail: 1016751809@qq.com. Huang Heming (黄鹤鸣, born 1969), male (Tibetan), from Ledu, Qinghai; professor, Ph.D.; research interests: pattern recognition and intelligent systems; E-mail: 1021489068@qq.com. Zhang Huiyun (张会云, born 1993), female, from Qingyang, Gansu; Ph.D. student; research interests: pattern recognition and intelligent systems; E-mail: 1406043513@qq.com.
  • Supported by:
    National Natural Science Foundation of China (62066039)

Abstract: Speech recognition is an important mode of human-computer interaction. To address the poor performance of traditional speech recognition systems on noisy speech and their inappropriate feature selection, a deep autoencoder recurrent neural network model based on transfer learning is proposed. The model consists of an encoder, a decoder, and an acoustic model. The acoustic model is a stacked bidirectional recurrent neural network, used to improve recognition performance; the encoder and decoder are both built from fully connected layers and are used for feature extraction. The structure and parameters of the encoder are transferred to the acoustic model for joint training. Experiments on the noisy Google Commands dataset show that the proposed model effectively improves the recognition of noisy speech and has good robustness and generalization.

Key words: speech recognition, transfer learning, autoencoder, joint training
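
As a rough illustration of the architecture described in the abstract, the following pure-Python sketch wires together a fully connected encoder and decoder, copies the encoder into an acoustic model, and runs a stacked bidirectional recurrent network on top of it. It is not the authors' implementation. The layer sizes, the simple tanh (Elman) recurrent cell, the mean pooling over time, and all names (`dense`, `birnn_layer`, `recognize`, and so on) are assumptions chosen for brevity. The weights are random rather than trained, and only the forward pass and the parameter-transfer step are shown.

```python
import copy
import math
import random

random.seed(0)

def dense(n_in, n_out):
    """Fully connected layer stored as a dict of weights and biases (random init)."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"w": w, "b": [0.0] * n_out}

def apply_dense(layer, x, act=math.tanh):
    """Compute act(W x + b) for a single feature vector x."""
    return [act(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(layer["w"], layer["b"])]

def run_stack(layers, x):
    """Apply a list of dense layers in order."""
    for layer in layers:
        x = apply_dense(layer, x)
    return x

def rnn_pass(cell, frames, reverse=False):
    """Simple recurrence over frames; returns one hidden state per frame, in time order."""
    h = [0.0] * len(cell["b"])
    states = []
    for x in (reversed(frames) if reverse else frames):
        h = apply_dense(cell, x + h)  # cell input is [frame ; previous hidden state]
        states.append(h)
    return states[::-1] if reverse else states

def birnn_layer(fwd, bwd, frames):
    """Bidirectional layer: concatenate forward and backward states per frame."""
    f = rnn_pass(fwd, frames)
    b = rnn_pass(bwd, frames, reverse=True)
    return [hf + hb for hf, hb in zip(f, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Illustrative sizes only (assumptions, not taken from the paper).
N_FEAT, N_CODE, N_HIDDEN, N_CLASSES = 8, 4, 5, 3

# Stage 1: autoencoder built from fully connected layers.
encoder = [dense(N_FEAT, 6), dense(6, N_CODE)]
decoder = [dense(N_CODE, 6), dense(6, N_FEAT)]

# Stage 2: transfer the encoder's structure and parameters into the acoustic model.
acoustic_model = {
    "encoder": copy.deepcopy(encoder),  # a copy that joint training would keep updating
    "birnn": [  # two stacked bidirectional layers
        (dense(N_CODE + N_HIDDEN, N_HIDDEN), dense(N_CODE + N_HIDDEN, N_HIDDEN)),
        (dense(3 * N_HIDDEN, N_HIDDEN), dense(3 * N_HIDDEN, N_HIDDEN)),
    ],
    "out": dense(2 * N_HIDDEN, N_CLASSES),
}

def recognize(model, frames):
    """Encode each frame, run the stacked BiRNN, classify from the time-averaged state."""
    seq = [run_stack(model["encoder"], f) for f in frames]
    for fwd, bwd in model["birnn"]:
        seq = birnn_layer(fwd, bwd, seq)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    logits = apply_dense(model["out"], pooled, act=lambda v: v)
    return softmax(logits)

# A fake 10-frame utterance standing in for real speech features.
utterance = [[random.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(10)]
reconstruction = run_stack(decoder, run_stack(encoder, utterance[0]))
probs = recognize(acoustic_model, utterance)
```

In a real system the autoencoder would first be trained to reconstruct speech features. After the transfer, the copied encoder weights would be updated together with the recurrent layers (joint training), typically in a framework with automatic differentiation.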