Computer and Modernization, 2023, Vol. 0, Issue (04): 83-89.


Lightweight Speech Emotion Recognition for Data Enhancement

  

  (1. School of Electronics and Information, Xi’an Polytechnic University, Xi’an 710048, China;
   2. School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China)
  Online: 2023-05-09   Published: 2023-05-09

Abstract: Speech emotion recognition with deep learning requires a large amount of training data, yet existing speech emotion databases are small, and training on so little data leads to overfitting. To address this, the original speech is augmented in the preprocessing stage by adding Gaussian white noise and by shifting the waveform, generating new speech signals; this not only improves recognition accuracy but also makes the model more robust. In addition, because common convolutional neural networks have an excessive number of parameters, a lightweight model built from separable convolutions and gated recurrent units (GRUs) is proposed. First, Mel-frequency cepstral coefficient (MFCC) features are extracted from the original speech as the model input. Next, separable convolutions extract the spatial information of the speech and GRUs extract its temporal information, so that spatial and temporal information jointly characterize the speech emotion, which makes the predictions more accurate. Finally, the resulting features are fed to a fully connected layer with softmax to perform emotion classification. Experimental results show that, compared with the benchmark model, the proposed model achieves higher accuracy while compressing the model by about 50%.
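The two augmentation operations named in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the noise level is set from a target signal-to-noise ratio (`snr_db`), the shift is a random offset of up to `max_shift_s` seconds with zero padding, and the function names and default values are hypothetical choices rather than settings reported in the paper.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise at a target SNR in dB.

    Assumption: the noise power is derived from a target SNR; the paper
    may instead scale the noise by a fixed factor.
    """
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def shift_waveform(signal, shift_samples):
    """Shift the waveform in time, zero-padding the vacated samples.

    Positive shifts delay the signal; negative shifts advance it.
    """
    shifted = np.zeros_like(signal)
    if shift_samples > 0:
        shifted[shift_samples:] = signal[:-shift_samples]
    elif shift_samples < 0:
        shifted[:shift_samples] = signal[-shift_samples:]
    else:
        shifted[:] = signal
    return shifted

def augment(signal, sr, rng, snr_db=20.0, max_shift_s=0.1):
    """Return the original clip plus one noisy and one shifted copy.

    Hypothetical defaults: 20 dB SNR and a random shift of up to 0.1 s.
    """
    max_shift = int(max_shift_s * sr)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return [signal, add_white_noise(signal, snr_db, rng), shift_waveform(signal, shift)]
```

Applied to every training clip, `augment` triples the amount of data. A circular shift (`np.roll`) would be an equally plausible reading of "shifting the waveform"; zero padding is used here so that no audio wraps around from the end of the clip to its start.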
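Counting parameters shows why separable convolutions make a model lighter. The sketch below uses the standard counting formulas; the layer sizes are hypothetical, the single-bias convention for the separable layer mirrors Keras `SeparableConv2D`, and the GRU count is included because the proposed model also contains recurrent layers. It illustrates the scaling only and does not reproduce the roughly 50% whole-model reduction reported in the paper, which depends on the full architecture.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameters of a standard k x k 2-D convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv2d_params(k, c_in, c_out, depth_multiplier=1, bias=True):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution.

    Assumption: a single bias vector on the pointwise output, as in Keras
    SeparableConv2D; other frameworks may also bias the depthwise stage.
    """
    depthwise = k * k * c_in * depth_multiplier      # one k x k filter per input channel
    pointwise = c_in * depth_multiplier * c_out      # 1 x 1 mixing across channels
    return depthwise + pointwise + (c_out if bias else 0)

def gru_params(input_dim, units, reset_after=False):
    """Parameters of one GRU layer: three gates (update, reset, candidate),
    each with input weights, recurrent weights and a bias vector.

    reset_after=True adds a second bias vector per gate (the Keras default).
    """
    n_bias = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + n_bias * units)
```

For a 3 × 3 layer mapping 64 to 128 channels, `conv2d_params(3, 64, 128)` gives 73,856 parameters while `separable_conv2d_params(3, 64, 128)` gives 8,896, roughly an 8× reduction for that single layer.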
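How a GRU summarizes a sequence of feature frames, and how a fully connected softmax layer turns that summary into class probabilities, can be sketched in NumPy. This assumes the standard GRU gate equations (update gate z, reset gate r, candidate state), untrained weights, and frames fed directly to the GRU; in the proposed model the separable-convolution stage would sit in front of it. The names `gru_forward` and `classify`, and the weight layout, are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(frames, W, U, b):
    """Run one GRU layer over a (T, input_dim) feature sequence; return the last hidden state.

    Hypothetical layout: W is (3, input_dim, units), U is (3, units, units),
    b is (3, units); index 0 = update gate z, 1 = reset gate r, 2 = candidate.
    """
    h = np.zeros(U.shape[-1])
    for x in frames:
        z = sigmoid(x @ W[0] + h @ U[0] + b[0])              # how much of the old state to keep
        r = sigmoid(x @ W[1] + h @ U[1] + b[1])              # how much old state feeds the candidate
        h_tilde = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
        h = z * h + (1.0 - z) * h_tilde                      # interpolate old and candidate states
    return h

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

def classify(frames, W, U, b, W_out, b_out):
    """GRU summary of the sequence -> fully connected layer -> softmax over emotion classes."""
    return softmax(gru_forward(frames, W, U, b) @ W_out + b_out)
```

In a trained model, `W`, `U`, `b`, `W_out`, and `b_out` would be learned from data, and the index of the largest probability returned by `classify` would be the predicted emotion class.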

Key words: speech emotion recognition, data augmentation, Gaussian white noise, waveform shifting, number of parameters