Computer and Modernization (计算机与现代化)

• Artificial Intelligence •

Application of Bidirectional Recurrent Neural Network in Speech Recognition

  1. School of Computer Science, Qinghai Normal University, Xining 810008, China
  2. Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China
  • Received: 2019-01-28 Online: 2019-10-28 Published: 2019-10-29
  • About the authors: Gengzang Cuomao (更藏措毛) (b. 1993), female, Tibetan, from Gonghe, Qinghai; master's student; research interests: pattern recognition, intelligent systems; E-mail: 1048456641@qq.com. Corresponding author: Huang Heming (黄鹤鸣) (b. 1969), male, Tibetan, from Ledu, Qinghai; professor, Ph.D.; research interests: pattern recognition, intelligent systems; E-mail: 1021489068@qq.com.
  • Supported by:
    Natural Science Foundation of Qinghai Province (2016-ZJ-904); National Natural Science Foundation of China (61662062, 61462072)

Abstract: Feed-forward neural networks have difficulty processing time-series data. To address this problem, a bidirectional recurrent neural network (BiRNN) is applied to acoustic modeling for automatic speech recognition. First, Mel-frequency cepstral coefficients (MFCCs) are extracted as features. Second, a BiRNN is used as the acoustic model. Finally, the effects of different parameters on system performance are tested. Experimental results on the TIMIT dataset show that, compared with acoustic models based on convolutional neural networks and deep neural networks, the recognition rate improves by 1.3% and 4.0%, respectively, which indicates that the BiRNN-based acoustic model performs better.

Key words: bidirectional recurrent neural network, speech recognition, Mel-frequency cepstral coefficients, deep neural network

CLC Number:
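
The pipeline summarized in the abstract (frame-level features fed to a bidirectional recurrent acoustic model) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tanh recurrent cell, the 13-dimensional feature frames, the hidden size of 32, the 39 output classes, the random weights, and the helper names `rnn_pass`, `birnn_forward`, and `init_params` are all assumptions made for illustration, and random numbers stand in for real MFCC features.

```python
import numpy as np

def rnn_pass(frames, W_in, W_rec, b):
    """Run a simple tanh RNN over frames of shape (T, D); return states (T, H)."""
    n_frames = frames.shape[0]
    h = np.zeros(b.shape[0])
    states = np.zeros((n_frames, b.shape[0]))
    for t in range(n_frames):
        h = np.tanh(frames[t] @ W_in + h @ W_rec + b)
        states[t] = h
    return states

def birnn_forward(frames, fwd, bwd, W_out, b_out):
    """Return per-frame class posteriors of shape (T, C) from a one-layer BiRNN."""
    h_fwd = rnn_pass(frames, *fwd)              # reads past -> future
    h_bwd = rnn_pass(frames[::-1], *bwd)[::-1]  # reads future -> past, re-aligned
    h = np.concatenate([h_fwd, h_bwd], axis=1)  # each frame sees both contexts
    logits = h @ W_out + b_out
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def init_params(n_features, n_hidden, n_classes, seed=0):
    """Hypothetical random initialization; a real model would be trained."""
    rng = np.random.default_rng(seed)

    def direction():
        return (rng.normal(0.0, 0.1, (n_features, n_hidden)),
                rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
                np.zeros(n_hidden))

    fwd, bwd = direction(), direction()
    W_out = rng.normal(0.0, 0.1, (2 * n_hidden, n_classes))
    return fwd, bwd, W_out, np.zeros(n_classes)

# Random stand-in for 50 frames of 13-dimensional MFCC features (assumed sizes).
frames = np.random.default_rng(1).normal(size=(50, 13))
fwd, bwd, W_out, b_out = init_params(n_features=13, n_hidden=32, n_classes=39)
posteriors = birnn_forward(frames, fwd, bwd, W_out, b_out)
print(posteriors.shape)  # (50, 39)
```

The point of the backward pass is that the output for a given frame depends on later frames as well as earlier ones, which a feed-forward model applied frame by frame, or a unidirectional RNN, cannot provide.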