[1] GEHRING J, MIAO Y, METZE F, et al. Extracting deep bottleneck features using stacked auto-encoders[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2013:3377-3381.
[2] CAO M, WANG J Z, CAO J W, et al. Acoustics recognition of excavation equipment based on MF-PLPCC features and RELM[C]// 2017 36th Chinese Control Conference. 2017.
[3] CAI S, JIN X, GAO S X, et al. Subband energy normalized perceptual linear prediction coefficients for noise-robust speech recognition[J]. Acta Acustica, 2012,37(6):667-672.
[4] HASAN R, HUSSEIN H, LAZARIDIS P, et al. Improvement of speech recognition results by a combination of systems[C]// IEEE 23rd International Conference on Automation and Computing (ICAC). 2017:1-4.
[5] WANG Y, YANG J A, LIU H, et al. Bottleneck feature extraction method based on hierarchical sparse DBN[J]. Pattern Recognition and Artificial Intelligence, 2015,28(2):173-180.
[6] SWIETOJANSKI P, GHOSHAL A, RENALS S. Revisiting hybrid and GMM-HMM system combination techniques[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2013:6744-6748.
[7] ZHOU P, JIANG H, DAI L R, et al. State-clustering based multiple deep neural networks modeling approach for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015,23(4):631-642.
[8] YU D, SEIDE F, LI G. Conversational speech transcription using context-dependent deep neural networks[C]// Proceedings of the 29th International Conference on Machine Learning. 2012:1-2.
[9] MARTENS J. Deep learning via Hessian-free optimization[C]// Proceedings of the 27th International Conference on Machine Learning. 2010:735-742.
[10] KINGSBURY B, SAINATH T N, SOLTAU H. Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization[C]// Proceedings of the 13th Annual Conference of the International Speech Communication Association. 2012:10-13.
[11] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011,20(1):30-42.
[12] KINGSBURY B. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling[C]// Proceedings of 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. 2009:3761-3764.
[13] MIAO Y J, METZE F. Improving language-universal feature extraction with deep maxout and convolutional neural networks[C]// Proceedings of the 15th Annual Conference of the International Speech Communication Association. 2014:800-804.
[14] SAINATH T N, VINYALS O, SENIOR A. Convolutional, long short-term memory, fully connected deep neural networks[C]// IEEE International Conference on Acoustics, Speech, and Signal Processing. 2015:4580-4584.
[15] ZHANG Q Q, LIU Y, WANG Z C, et al. Application of convolutional neural networks in speech recognition[J]. Network New Media Technology, 2014,3(6):39-42.
[16] RAO K, SENIOR A, SAK H. Flat start training of CD-CTC-SMBR LSTM RNN acoustic models[C]// 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. 2016:5405-5409.
[17] YI J Y, WEN Z Q, TAO J H, et al. CTC regularized model adaptation for improving LSTM RNN based multi-accent Mandarin speech recognition[J]. Journal of Signal Processing Systems, 2017,90(7):985-997.
[18] RAVANELLI M, BRAKEL P, OMOLOGO M, et al. Light gated recurrent units for speech recognition[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2018,2(2):92-102.
[19] ARISOY E, SETHY A, RAMABHADRAN B, et al. Bidirectional recurrent neural network language models for automatic speech recognition[C]// 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. 2015:5421-5425.
[20] ZEYER A, DOETSCH P, VOIGTLAENDER P, et al. Comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition[C]// 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing. 2017.
[21] LUO X, ZHOU W W, WANG W P, et al. Attention-based relation extraction with bidirectional gated recurrent unit and highway network in the analysis of geological data[J]. IEEE Access, 2018,6:5705-5715.
[22] MIAO Y J, GOWAYYED M, METZE F. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding[C]// 2015 IEEE Workshop on Automatic Speech Recognition and Understanding. 2015:167-174.
[23] WATANABE S, HORI T, KIM S, et al. Hybrid CTC/attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing, 2017,11(8):1240-1253.
[24] CHOROWSKI J, BAHDANAU D, SERDYUK D, et al. Attention-based models for speech recognition[C]// Advances in Neural Information Processing Systems. 2015:577-585.
[25] DEL-AGUA M A, SANCHIS A, ALBERTO S, et al. Speaker-adapted confidence measures for ASR using deep bidirectional recurrent neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018,26(7):1198-1206.
[26] ZERARI N, ABDELHAMID S, BOUZGOU H, et al. Bi-directional recurrent end-to-end neural network classifier for spoken Arab digit recognition[C]// 2018 2nd International Conference on Natural Language and Speech Processing. 2018:1-6.
[27] THANH T M, VAN T P, THANH H N. Improving phonetic recognition with sequence-length standardized MFCC features and deep bi-directional LSTM[C]// The 5th NAFOSTED Conference on Information and Computer Science. 2018:322-325.
[28] GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Beijing: Posts & Telecom Press, 2017:239-240.