[1] RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989,77(2):257-286.
[2] RABINER L R, JUANG B H. Hidden Markov models for speech recognition-strengths and limitations[M]// Speech Recognition and Understanding. 1992:3-29.
[3] RODRÍGUEZ E, RUIZ B, GARCÍA-CRESPO Á, et al. Speech/speaker recognition using a HMM/GMM hybrid model[C]// International Conference on Audio- and Video-Based Biometric Person Authentication. 1997: 227-234.
[4] AMODEI D, ANANTHANARAYANAN S, ANUBHAI R, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin[C]// International Conference on Machine Learning. 2016:173-182.
[5] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks[C]// Proceedings of the 23rd International Conference on Machine Learning. 2006: 369-376.
[6] WATANABE S, HORI T, KIM S, et al. Hybrid CTC/Attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing, 2017,11(8):1240-1253.
[7] CHOROWSKI J, BAHDANAU D, SERDYUK D, et al. Attention-based models for speech recognition[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. 2015:577-585.
[8] 鱼昆,张绍阳,侯佳正,等. 语音识别及端到端技术现状及展望[J]. 计算机系统应用, 2021,30(3):14-23.
[9] 戴礼荣,张仕良,黄智颖. 基于深度学习的语音识别技术现状与展望[J]. 数据采集与处理, 2017, 32(2): 221-231.
[10] KIM C, KIM S, KIM K, et al. End-to-end training of a large vocabulary end-to-end speech recognition system[C]// 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019:562-569.
[11] 杨威,胡燕. 混合CTC/attention架构端到端带口音普通话识别[J]. 计算机应用研究, 2021,38(3):755-759.
[12] 刘加. 汉语大词汇量连续语音识别系统研究进展[J]. 电子学报, 2000, 28(1): 85-91.
[13] YU D, DENG L. Deep learning and its applications to signal and information processing [exploratory DSP][J]. IEEE Signal Processing Magazine, 2010, 28(1):145-154.
[14] GEIGER J T, ZHANG Z, WENINGER F, et al. Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling[C]// The 15th Annual Conference of the International Speech Communication Association. 2014:631-635.
[15] 张瑞珍,韩跃平,张晓通. 基于深度LSTM的端到端的语音识别[J]. 中北大学学报(自然科学版), 2020,41(3):244-248.
[16] 姚煜,RYAD C. 基于双向长短时记忆-联结时序分类和加权有限状态转换器的端到端中文语音识别系统[J]. 计算机应用, 2018,38(9):2495-2499.
[17] 杨德举,马良荔,谭琳珊,等. 基于门控卷积网络与CTC的端到端语音识别[J]. 计算机工程与设计, 2020,41(9):2650-2654.
[18] 张威,翟明浩,黄子龙,等. SE-MCNN-CTC的中文语音识别声学模型[J]. 应用声学, 2020,39(2):223-230.
[19] 张宇,张鹏远,颜永红. 基于注意力 LSTM 和多任务学习的远场语音识别[J]. 清华大学学报 (自然科学版), 2018, 58(3): 249-253.
[20] 刘晓峰,宋文爱,余本国,等. 基于注意力机制的大同方言语音翻译模型研究[J]. 中北大学学报(自然科学版), 2020, 41(3): 238-243.
[21] 徐冬冬,蒋志翔. 基于HOPE-CTC的端到端语音识别[J]. 计算机工程与设计, 2021,42(2):462-467.
[22] 洪青阳,李琳. 语音识别:原理与应用[M]. 北京:电子工业出版社, 2020:252-254.
[23] KIM S, HORI T, WATANABE S. Joint CTC-attention based end-to-end speech recognition using multi-task learning[C]// 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017:4835-4839.
[24] WATANABE S, HORI T, KARITA S, et al. ESPnet: End-to-end speech processing toolkit[J]. arXiv preprint arXiv:1804.00015, 2018.