[1] |
RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989,77(2):257-286.
|
[2] |
RABINER L R, JUANG B H. Hidden Markov models for speech recognition-strengths and limitations[M]// Speech Recognition and Understanding. 1992:3-29.
|
[3] |
RODRGUEZ E, RUZ B, GARCA-CRESPO , et al. Speech/speaker recognition using a HMM/GMM hybrid model[C]// International Conference on Audio- and Video-Based Biometric Person Authentication. 1997: 227-234.
|
[4] |
AMODEI D, ANANTHANARAYANAN S, ANUBHAI R, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin[C]// International Conference on Machine Learning. 2016:173-182.
|
[5] |
GRAVES A, FERNNDEZ S, GOMEZ F, et al. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks[C]// Proceedings of the 23rd International Conference on Machine Learning. 2006: 369-376.
|
[6] |
WATANABE S, HORI T, KIM S, et al. Hybrid CTC/Attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing, 2017,11(8):1240-1253.
|
[7] |
CHOROWSKI J, BA HDANAU D, SERDYUK D, et al. Attention-based models for speech recognition[C]// Proceedings of the 28th International Conference on Nearal Information Processing System. 2015:577-585.
|
[8] |
鱼昆,张绍阳,侯佳正,等. 语音识别及端到端技术现状及展望[J]. 计算机系统应用, 2021,30(3):14-23.
|
[9] |
戴礼荣,张仕良,黄智颖. 基于深度学习的语音识别技术现状与展望[J]. 数据采集与处理, 2017, 32(2): 221-231.
|
[10] |
KIM C, KIM S, KIM K, et al. End-to-end training of a large vocabulary end-to-end speech recognition system[C]// 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2019:562-569.
|
[11] |
杨威,胡燕. 混合CTC/attention架构端到端带口音普通话识别[J]. 计算机应用研究, 2021,38(3):755-759.
|
[12] |
刘加. 汉语大词汇量连续语音识别系统研究进展[J]. 电子学报, 2000, 28(1): 85-91.
|
[13] |
YU D, DENG L. Deep learning and its applications to signal and information processing [exploratory DSP][J]. IEEE Signal Processing Magazine, 2010, 28(1):145-154.
|
[14] |
GEIGER J T, ZHANG Z, WENINGER F, et al. Robust speech recognition using long short-term memory recurrent neural networks for Hybrid acoustic modelling [C]// The 15th Annual Conference of the International Speech Communication Association. 2014:631-635.
|
[15] |
张瑞珍,韩跃平,张晓通. 基于深度LSTM的端到端的语音识别[J]. 中北大学学报(自然科学版), 2020,41(3):244-248.
|
[16] |
姚煜,RYAD C. 基于双向长短时记忆-联结时序分类和加权有限状态转换器的端到端中文语音识别系统[J]. 计算机应用, 2018,38(9):2495-2499.
|
[17] |
杨德举,马良荔,谭琳珊,等. 基于门控卷积网络与CTC的端到端语音识别[J]. 计算机工程与设计, 2020,41(9):2650-2654.
|
[18] |
张威,翟明浩,黄子龙,等. SE-MCNN-CTC的中文语音识别声学模型[J]. 应用声学, 2020,39(2):223-230.
|
[19] |
张宇,张鹏远,颜永红. 基于注意力 LSTM 和多任务学习的远场语音识别[J]. 清华大学学报 (自然科学版), 2018, 58(3): 249-253.
|
[20] |
刘晓峰,宋文爱,余本国,等. 基于注意力机制的大同方言语音翻译模型研究[J]. 中北大学学报(自然科学版), 2020, 41(3): 238-243.
|
[21] |
徐冬冬,蒋志翔. 基于HOPE-CTC的端到端语音识别[J]. 计算机工程与设计, 2021,42(2):462-467.
|
[22] |
洪青阳,李琳. 语音识别:原理与应用[M]. 北京:电子工业出版社, 2020:252-254.
|
[23] |
KIM S, HORI T, WATANABE S. Joint CTC-attention based end-to-end speech recognition using multi-task learning [C]// 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017:4835-4839.
|
[24] |
WATANABE S, HORI T, KARITA S, et al. ESPnet: End-to-end speech processing toolkit[J]// arXiv preprint arXiv:1804.00015, 2018.
|