Application of Bidirectional Recurrent Neural Network in Speech Recognition

doi:10.3969/j.issn.1006-2475.2019.10.001

Abstract

Feed-forward neural networks have difficulty processing time-series data. To address this problem, a bidirectional recurrent neural network (BiRNN) is applied to acoustic modeling for automatic speech recognition. First, Mel-frequency cepstral coefficients (MFCCs) are extracted as features. Second, a BiRNN is used as the acoustic model. Finally, the effects of different parameters on system performance are tested. Experimental results on the TIMIT dataset show that, compared with a convolutional neural network and a deep neural network, the recognition rate of the proposed system improves by 1.3% and 4.0%, respectively, which indicates that a BiRNN is better suited to automatic speech recognition.
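The bidirectional idea summarized in the abstract can be illustrated with a minimal sketch: one recurrent pass reads the feature frames left to right, a second pass reads them right to left, and the two hidden states are concatenated per frame so that each frame sees both past and future context. The sketch makes several assumptions that are not taken from the paper: 13 coefficients per frame, 8 hidden units per direction, plain tanh units, and random untrained weights and inputs standing in for real MFCCs. It omits the output layer and training entirely.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rnn_pass(frames, W_in, W_rec, bias):
    """Run a simple tanh RNN over the frames; return one hidden vector per frame."""
    h = [0.0] * len(bias)
    states = []
    for x in frames:
        a = matvec(W_in, x)
        r = matvec(W_rec, h)
        h = [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, bias)]
        states.append(h)
    return states


def birnn(frames, fwd_params, bwd_params):
    """Concatenate forward-in-time and backward-in-time hidden states per frame."""
    fwd = rnn_pass(frames, *fwd_params)
    # Process the reversed sequence, then flip back so indices align with frames.
    bwd = rnn_pass(frames[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def random_params(n_in, n_hidden, rng):
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    W_in = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    W_rec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_hidden)]
    bias = [0.0] * n_hidden
    return W_in, W_rec, bias


rng = random.Random(0)
N_MFCC, N_HIDDEN, N_FRAMES = 13, 8, 5  # assumed sizes, not from the paper

# Random stand-ins for MFCC frames; a real system would extract these from audio.
frames = [[rng.gauss(0, 1) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]
fwd_params = random_params(N_MFCC, N_HIDDEN, rng)
bwd_params = random_params(N_MFCC, N_HIDDEN, rng)

outputs = birnn(frames, fwd_params, bwd_params)
print(len(outputs), len(outputs[0]))  # 5 16
```

Each output vector has twice the hidden size because it joins the two directions. In an acoustic model, a classification layer on top of these vectors would typically predict a phone or state label for each frame.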

Key words: bidirectional recurrent neural network, speech recognition, Mel-frequency cepstral coefficient, deep neural network

CLC Number: TP39

Gengzang-Cuomao, HUANG He-ming. Application of Bidirectional Recurrent Neural Network in Speech Recognition[J]. Computer and Modernization, 2019, 0(10): 1-.

