Computer and Modernization ›› 2023, Vol. 0 ›› Issue (01): 63-68.

Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM

  

  1. (College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China)
  • Online: 2023-03-02 Published: 2023-03-02

Abstract: To address the shortcomings of deep learning algorithms in extracting speech emotion features, and the low recognition accuracy that results, effective emotion features are extracted from the speech data and then spliced and merged at multiple scales to construct the speech emotion features and improve the performance of the deep learning model. Because traditional recurrent neural networks cannot handle the long-term dependency problem in speech emotion recognition, a dual-layer LSTM is used to improve recognition, and a model combining hybrid multi-scale convolution with a dual-layer LSTM is proposed. Experimental results on the Chinese emotion database of the Institute of Automation, Chinese Academy of Sciences (CASIA) and the Berlin emotional speech database (Emo-DB) show that the proposed model achieves substantially higher accuracy than other emotion recognition models.
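
The pipeline described in the abstract (multi-scale convolution, feature splicing, then two stacked LSTM layers) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The kernel sizes (3, 5, 7), the fixed averaging kernels standing in for learned filters, the hidden size, the number of classes, the random untrained weights, and all function and class names are assumptions made only for illustration.

```python
import math
import random


def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[t + i] * kernel[i] for i in range(k)) for t in range(len(seq))]


def multi_scale_features(seq, kernel_sizes=(3, 5, 7)):
    """Apply one filter per scale and splice the outputs frame by frame.

    Averaging kernels are placeholders for the learned filters of a real model.
    """
    branches = [conv1d_same(seq, [1.0 / k] * k) for k in kernel_sizes]
    return [list(frame) for frame in zip(*branches)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """A single LSTM layer with random (untrained) weights, forward pass only."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size  # gates see the concatenation [x, h]
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def forward(self, frames):
        n = self.hidden_size
        h = [0.0] * n
        c = [0.0] * n
        outputs = []
        for x in frames:
            xh = list(x) + h
            z = [sum(w * v for w, v in zip(row, xh)) + b
                 for row, b in zip(self.w, self.b)]
            i_gate = [sigmoid(v) for v in z[0:n]]            # input gate
            f_gate = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
            g_cand = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell state
            o_gate = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
            c = [f_gate[j] * c[j] + i_gate[j] * g_cand[j] for j in range(n)]
            h = [o_gate[j] * math.tanh(c[j]) for j in range(n)]
            outputs.append(h)
        return outputs


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(seq, n_classes=6, hidden_size=4, seed=0):
    """Multi-scale features -> LSTM layer 1 -> LSTM layer 2 -> softmax scores."""
    rng = random.Random(seed)
    frames = multi_scale_features(seq)
    layer1 = LSTMLayer(len(frames[0]), hidden_size, rng)
    layer2 = LSTMLayer(hidden_size, hidden_size, rng)
    # The second layer consumes the full hidden-state sequence of the first.
    last_hidden = layer2.forward(layer1.forward(frames))[-1]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
             for _ in range(n_classes)]
    logits = [sum(w * h for w, h in zip(row, last_hidden)) for row in w_out]
    return softmax(logits)
```

Calling `classify` on a short list of floats returns one probability per hypothetical emotion class. Because the weights are untrained, the scores carry no meaning; the sketch only shows how the multi-scale branches are spliced per frame and how the second LSTM layer stacks on the first.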

Key words: speech emotion recognition, deep learning, neural network, multi-scale convolution, long short-term memory network (LSTM)