Speech Recognition in Complex Noise Environment

Abstract

Speech recognition is an important mode of human-computer interaction. To address the poor performance of traditional speech recognition systems on noisy speech and their inappropriate feature selection, a deep autoencoder recurrent neural network model based on transfer learning is proposed. The model consists of an encoder, a decoder, and an acoustic model. The acoustic model is composed of stacked bidirectional recurrent neural networks, which are used to improve recognition performance. The encoder and decoder are composed of fully connected layers for feature extraction. The structure and parameters of the encoder are transferred to the acoustic model for joint training. Experimental results on the noisy Google Commands dataset show that the proposed model effectively improves recognition performance on noisy speech and has good robustness and generalization.
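The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer widths, the feature size `N_FEATS`, the class count `N_CLASSES`, the use of GRU cells, the noisy-to-clean reconstruction target, and the random stand-in data are all assumptions made for the example. It shows a fully connected encoder and decoder trained as an autoencoder, the encoder's structure and parameters copied into an acoustic model built on a stacked bidirectional recurrent network, and then joint training of the whole acoustic model.

```python
import torch
import torch.nn as nn

N_FEATS = 40    # assumed per-frame feature size (e.g. MFCCs)
N_CLASSES = 12  # assumed number of command classes
HIDDEN = 64     # assumed encoder output width


class Encoder(nn.Module):
    """Fully connected feature extractor applied to each frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATS, 128), nn.ReLU(),
            nn.Linear(128, HIDDEN), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    """Fully connected layers mapping encoded frames back to features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN, 128), nn.ReLU(),
            nn.Linear(128, N_FEATS),
        )

    def forward(self, z):
        return self.net(z)


class AcousticModel(nn.Module):
    """Transferred encoder followed by a stacked bidirectional RNN."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder  # stays trainable for joint training
        self.rnn = nn.GRU(HIDDEN, 64, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 64, N_CLASSES)

    def forward(self, x):  # x: (batch, frames, N_FEATS)
        h, _ = self.rnn(self.encoder(x))
        return self.out(h.mean(dim=1))  # average over frames -> class logits


torch.manual_seed(0)
# Random stand-ins for noisy features, clean features, and command labels.
noisy = torch.randn(8, 50, N_FEATS)
clean = torch.randn(8, 50, N_FEATS)
labels = torch.randint(0, N_CLASSES, (8,))

# Step 1: train the autoencoder (assumed here: reconstruct clean from noisy).
enc, dec = Encoder(), Decoder()
ae_opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for _ in range(5):
    ae_opt.zero_grad()
    nn.functional.mse_loss(dec(enc(noisy)), clean).backward()
    ae_opt.step()

# Step 2: transfer the encoder's structure and parameters.
transferred = Encoder()
transferred.load_state_dict(enc.state_dict())
model = AcousticModel(transferred)

# Step 3: joint training; the optimizer updates encoder and RNN together.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):
    opt.zero_grad()
    nn.functional.cross_entropy(model(noisy), labels).backward()
    opt.step()

logits = model(noisy)  # shape: (batch, N_CLASSES)
```

Passing `model.parameters()` to the optimizer in step 3 is what makes the training joint in this sketch: the transferred encoder is fine-tuned together with the recurrent layers rather than frozen.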

Keywords: speech recognition, transfer learning, autoencoder, joint training

ZHANG Yun-yao, HUANG He-ming, ZHANG Hui-yun. Speech Recognition in Complex Noise Environment[J]. Computer and Modernization, 2021, 0(09): 68-74.

