Lightweight Speech Emotion Recognition for Data Enhancement
(1. School of Electronics and Information, Xi’an Polytechnic University, Xi’an 710048, China;
2. School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China)