Computer and Modernization ›› 2020, Vol. 0 ›› Issue (12): 83-89.


Multi-modal Music Emotion Classification Based on Optimized Residual Network

  

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Online: 2021-01-07  Published: 2021-01-07

Abstract: To address the problems of traditional music emotion classification, namely difficult feature extraction, low classification accuracy, and heavy manual workload, this paper proposes a multi-modal music emotion classification method based on an optimized deep residual network. The method first uses multi-modal translation to convert the music audio modality, whose features are hard to extract, into an easy-to-process image modality. On the basis of the deep residual network, the convolution kernel size of the input layer and the shortcut connections of the residual blocks are then optimized, which reduces information loss and shortens computation time. In addition, to alleviate the shortcomings of the Softmax classifier, such as intra-class dispersion and inter-class aggregation, a variant of the Center loss function is introduced to improve the performance of the Softmax classification function. Experimental results demonstrate the effectiveness and robustness of the optimized residual network model; compared with the original residual network, the accuracy of music emotion classification is improved by 4.27 percentage points.
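The abstract does not give the exact form of the Center loss variant, but the standard Center loss it builds on penalizes the distance between each sample's feature vector and its class center, and is added to the Softmax cross-entropy loss with a weighting factor. The sketch below is a minimal NumPy illustration of that combined objective; the function names, the weighting parameter `lam`, and the specific formulation are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Standard Softmax cross-entropy, numerically stabilized
    # by subtracting the row-wise max before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def center_loss(features, labels, centers):
    # L_c = (1/2N) * sum_i ||x_i - c_{y_i}||^2 :
    # pulls each feature vector toward its class center,
    # countering the intra-class dispersion of plain Softmax.
    diff = features - centers[labels]
    return 0.5 * (diff ** 2).sum(axis=1).mean()

def total_loss(logits, features, labels, centers, lam=0.5):
    # Combined objective: Softmax cross-entropy plus a
    # weighted center term (lam is a hypothetical weight).
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)
```

In training, the class centers themselves are updated alongside the network weights, so that features of the same emotion class cluster together while the Softmax term keeps different classes separable.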

Key words: emotion recognition, modal translation, image classification, deep residual network, classification loss function