Computer and Modernization, 2023, Vol. 0, Issue (01): 63-68.

• Artificial Intelligence •

Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM

  

  1. (College of Computer Science and Technology, Xinjiang Normal University, Urumqi, Xinjiang 830054, China)
  • Published: 2023-03-02 Online: 2023-03-02
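  • Author biographies: LIANG Kejin (梁科晋, born 1995), male, from Jincheng, Shanxi, master's student; research interests: natural language processing, sentiment orientation analysis; E-mail: 1762429844@qq.com. Corresponding author: ZHANG Haijun (张海军, born 1973), male, from Siping, Jilin, Ph.D., professor, master's supervisor; research interests: natural language processing, affective computing, artificial intelligence; E-mail: ustczhj@qq.com. LIU Yaqing (刘雅情, born 1996), female, from Dalian, Liaoning, master's student; research interest: natural language processing; E-mail: 1109701435@qq.com. ZHANG Yu (张昱, born 1995), female, from Shangluo, Shaanxi, master's student; research interest: natural language processing; E-mail: 605178537@qq.com. WANG Yueyang (王月阳, born 1996), male, from Cangzhou, Hebei, master's student; research interest: natural language processing; E-mail: 1609166606@qq.com.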
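  • Funding:
    Special Project for Innovation Environment Construction of the Xinjiang Uygur Autonomous Region (Talent Program: Tianshan Cedar Program) (2019XS08); Key Project of the National Natural Science Foundation of China (NSFC)-Xinjiang Joint Fund (U1703261)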

Abstract: Deep learning algorithms are limited in their ability to extract speech emotion features, and their recognition accuracy is low. To address this, effective emotion features are extracted from the speech data and then concatenated and fused at multiple scales to construct speech emotion features, improving the deep learning model's ability to represent them. Conventional recurrent neural networks struggle to capture the long-term dependencies involved in speech emotion recognition, so a dual-layer LSTM is used to improve recognition, and a model combining hybrid multi-scale convolution with a dual-layer LSTM is proposed. Experimental results on the Chinese emotional speech database of the Institute of Automation, Chinese Academy of Sciences (CASIA) and the Berlin Database of Emotional Speech (Emo-DB) show that the proposed model achieves considerably higher accuracy than the other emotion recognition models it is compared with.

Key words: speech emotion recognition, deep learning, neural network, multi-scale convolution, long short-term memory (LSTM) network
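The abstract names the architecture but not its configuration. The following is a minimal NumPy sketch of the general idea, assuming frame-level acoustic features (for example 40 MFCCs per frame) as input: parallel 1-D convolutions with different kernel widths run over time, their outputs are concatenated along the channel axis, two stacked LSTM layers read the fused sequence, and a softmax over emotion classes is applied to the last hidden state. The kernel widths, filter count, hidden size, feature dimension and all function names (`conv1d_same`, `multi_scale_block`, `lstm_layer`, `forward`) are hypothetical and not taken from the paper. Weights are random, so this is an untrained forward pass, not the authors' implementation.

```python
import numpy as np

# Hypothetical sizes: none of these values come from the paper.
N_FEATURES = 40           # acoustic features per frame, e.g. MFCCs (assumption)
KERNEL_SIZES = (3, 5, 7)  # assumed multi-scale kernel widths (odd, for "same" padding)
N_FILTERS = 16            # filters per scale (assumption)
HIDDEN = 32               # LSTM hidden size for both layers (assumption)
N_CLASSES = 6             # CASIA has six emotion classes; Emo-DB has seven

rng = np.random.default_rng(0)

def conv1d_same(x, w, b):
    """'Same'-padded 1-D convolution over time (cross-correlation, as in DL frameworks).
    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,) -> (T, C_out)."""
    k, T = w.shape[0], x.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows[t, j] = xp[t + j]: the k frames centred on frame t
    windows = np.stack([xp[j:j + T] for j in range(k)], axis=1)  # (T, k, C_in)
    return np.einsum("tkc,kco->to", windows, w) + b

def multi_scale_block(x, conv_params):
    """Parallel convolutions with different kernel widths, ReLU, then channel concatenation."""
    outs = [np.maximum(conv1d_same(x, w, b), 0.0) for w, b in conv_params]
    return np.concatenate(outs, axis=1)  # (T, len(KERNEL_SIZES) * N_FILTERS)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x, W, U, b):
    """Standard LSTM over a sequence. x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,).
    Returns the hidden state at every step, shape (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate
        g = np.tanh(z[2 * H:3 * H])   # candidate cell state
        o = sigmoid(z[3 * H:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

def init_params():
    """Random weights for every layer (untrained)."""
    conv = [(rng.normal(0, 0.1, (k, N_FEATURES, N_FILTERS)), np.zeros(N_FILTERS))
            for k in KERNEL_SIZES]
    d_in = len(KERNEL_SIZES) * N_FILTERS
    lstm1 = (rng.normal(0, 0.1, (d_in, 4 * HIDDEN)),
             rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN)), np.zeros(4 * HIDDEN))
    lstm2 = (rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN)),
             rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN)), np.zeros(4 * HIDDEN))
    dense = (rng.normal(0, 0.1, (HIDDEN, N_CLASSES)), np.zeros(N_CLASSES))
    return conv, lstm1, lstm2, dense

def forward(x, params):
    """Multi-scale conv -> LSTM layer 1 -> LSTM layer 2 -> softmax on the last step."""
    conv, lstm1, lstm2, (Wd, bd) = params
    feats = multi_scale_block(x, conv)
    h1 = lstm_layer(feats, *lstm1)
    h2 = lstm_layer(h1, *lstm2)
    logits = h2[-1] @ Wd + bd
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

params = init_params()
frames = rng.normal(size=(100, N_FEATURES))  # random stand-in for 100 feature frames
probs = forward(frames, params)
print(probs.shape)  # (6,)
```

Concatenating the scales along the channel axis keeps the time axis intact, so the LSTMs still see one fused feature vector per frame. Classifying from the last hidden state of the second layer is one common choice; pooling over all time steps would be an equally plausible alternative, and the abstract does not say which the authors use.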