[1] GEHRING J, MIAO Y, METZE F, et al. Extracting deep bottleneck features using stacked auto-encoders[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2013:3377-3381.
[2] CAO M, WANG J Z, CAO J W, et al. Acoustics recognition of excavation equipment based on MF-PLPCC features and RELM[C]// 2017 36th Chinese Control Conference. 2017.
[3] CAI S, JIN X, GAO S X, et al. Subband energy normalized perceptual linear prediction coefficients for noise-robust speech recognition[J]. Acta Acustica, 2012,37(6):667-672.
[4] HASAN R, HUSSEIN H, LAZARIDIS P, et al. Improvement of speech recognition results by a combination of systems[C]// IEEE 23rd International Conference on Automation and Computing (ICAC). 2017:1-4.
[5] WANG Y, YANG J A, LIU H, et al. Bottleneck feature extraction method based on hierarchical sparse DBN[J]. Pattern Recognition and Artificial Intelligence, 2015,28(2):173-180.
[6] SWIETOJANSKI P, GHOSHAL A, RENALS S. Revisiting hybrid and GMM-HMM system combination techniques[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2013:6744-6748.
[7] ZHOU P, JIANG H, DAI L R, et al. State-clustering based multiple deep neural networks modeling approach for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015,23(4):631-642.
[8] YU D, SEIDE F, LI G. Conversational speech transcription using context-dependent deep neural networks[C]// Proceedings of the 29th International Conference on Machine Learning. 2012:1-2.
[9] MARTENS J. Deep learning via Hessian-free optimization[C]// Proceedings of the 27th International Conference on Machine Learning. 2010:735-742.
[10] KINGSBURY B, SAINATH T N, SOLTAU H. Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization[C]// Proceedings of the 13th Annual Conference of the International Speech Communication Association. 2012:10-13.
[11] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011,20(1):30-42.
[12] KINGSBURY B. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling[C]// Proceedings of 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. 2009:3761-3764.
[13] MIAO Y J, METZE F. Improving language-universal feature extraction with deep maxout and convolutional neural networks[C]// Proceedings of the 15th Annual Conference of the International Speech Communication Association. 2014:800-804.
[14] SAINATH T N, VINYALS O, SENIOR A. Convolutional, long short-term memory, fully connected deep neural networks[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2015:4580-4584.
[15] ZHANG Q Q, LIU Y, WANG Z C, et al. Application of convolutional neural networks in speech recognition[J]. Network New Media Technology, 2014,3(6):39-42.
[16] RAO K, SENIOR A, SAK H. Flat start training of CD-CTC-SMBR LSTM RNN acoustic models[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. 2016:5405-5409.
[17] YI J Y, WEN Z Q, TAO J H, et al. CTC regularized model adaptation for improving LSTM RNN based multi-accent Mandarin speech recognition[J]. Journal of Signal Processing Systems, 2017,90(7):985-997.
[18] RAVANELLI M, BRAKEL P, OMOLOGO M, et al. Light gated recurrent units for speech recognition[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2018,2(2):92-102.
[19] ARISOY E, SETHY A, RAMABHADRAN B, et al. Bidirectional recurrent neural network language models for automatic speech recognition[C]// 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. 2015:5421-5425.
[20] ZEYER A, DOETSCH P, VOIGTLAENDER P, et al. Comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition[C]// 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing. 2017.
[21] LUO X, ZHOU W W, WANG W P, et al. Attention-based relation extraction with bidirectional gated recurrent unit and highway network in the analysis of geological data[J]. IEEE Access, 2018,6:5705-5715.
[22] MIAO Y J, GOWAYYED M, METZE F. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding[C]// 2015 IEEE Workshop on Automatic Speech Recognition and Understanding. 2015:167-174.
[23] WATANABE S, HORI T, KIM S, et al. Hybrid CTC/attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing, 2017,11(8):1240-1253.
[24] CHOROWSKI J, BAHDANAU D, SERDYUK D, et al. Attention-based models for speech recognition[C]// Advances in Neural Information Processing Systems. 2015:577-585.
[25] DEL-AGUA M A, SANCHIS A, ALBERTO S, et al. Speaker-adapted confidence measures for ASR using deep bidirectional recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018,26(7):1198-1206.
[26] ZERARI N, ABDELHAMID S, BOUZGOU H, et al. Bi-directional recurrent end-to-end neural network classifier for spoken Arab digit recognition[C]// 2018 2nd International Conference on Natural Language and Speech Processing. 2018:1-6.
[27] THANH T M, VAN T P, THANH H N. Improving phonetic recognition with sequence-length standardized MFCC features and deep bi-directional LSTM[C]// The 5th NAFOSTED Conference on Information and Computer Science. 2018:322-325.
[28] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Beijing: Posts & Telecom Press, 2017:239-240.