Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM

Abstract

To address the weak feature extraction and low recognition accuracy of deep learning algorithms in speech emotion recognition, this paper extracts effective emotion features from speech data, then splices and merges them at multiple scales to construct speech emotion features and improve the performance of the deep learning model. Because traditional recurrent neural networks cannot solve the long-term dependence problem in speech emotion recognition, a dual-layer LSTM is used to improve recognition, and a model combining hybrid multi-scale convolution with a dual-layer LSTM is proposed. Experimental results on the Chinese emotion database of the Institute of Automation, Chinese Academy of Sciences (CASIA) and the Berlin emotion database (Emo-DB) show that the proposed model achieves considerably higher accuracy than the other emotion recognition models it was compared with.

Key words: speech emotion recognition, deep learning, neural network, multi-scale convolution, long short-term memory network
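
The multi-scale splicing step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel widths (3, 5, 7), the averaging weights, and the function names `conv1d_same` and `multi_scale_features` are assumptions chosen only for illustration, and the abstract does not state the actual filter sizes or input features. The sketch filters one sequence of per-frame values at several scales and concatenates the results frame by frame.

```python
# Illustrative sketch only: kernel widths and averaging weights are assumptions,
# not values taken from the paper.

def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over the signal with zero padding,
    so the output has the same length as the input."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """Apply one averaging filter per scale and splice the outputs per frame."""
    branches = [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]
    # zip(*branches) groups the outputs of all scales for each time step.
    return [list(frame) for frame in zip(*branches)]


frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy per-frame feature values
features = multi_scale_features(frames)
print(len(features), len(features[0]))  # prints "7 3": 7 frames, 3 scales each
```

In a model of the kind the abstract describes, a spliced sequence like `features` would then be passed to a two-layer LSTM; that stage is omitted here, and in practice the convolution weights would be learned rather than fixed.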

LIANG Ke-jin, ZHANG Hai-jun, LIU Ya-qing, ZHANG Yu, WANG Yue-yang. Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM[J]. Computer and Modernization, 2023, 0(01): 63-68.
