计算机与现代化 (Computer and Modernization), 2020, Vol. 0, Issue (12): 83-89.

• Database and Data Mining •

Multi-modal Music Emotion Classification Based on Optimized Residual Network

  LI Xiaoshuang, HAN Lixin, LI Jingxian, ZHOU Jingwei

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Online: 2021-01-07 Published: 2021-01-07
  • About the authors: LI Xiaoshuang (1995—), male, born in Zaozhuang, Shandong, master's student; research interests: music emotion analysis, recommender systems; E-mail: 290598477@qq.com. Corresponding author: HAN Lixin, male, professor, Ph.D.; research interests: information retrieval, pattern recognition, data mining; E-mail: lixinhan2002@aliyun.com. LI Jingxian, female, Ph.D. candidate; research interests: sentiment analysis, recommender systems; E-mail: 415245727@qq.com. ZHOU Jingwei, male, master's student; research interests: computer vision, object tracking; E-mail: 513189227@qq.com.


Abstract: To address the low classification accuracy and heavy manual workload that traditional music emotion classification suffers from because feature extraction is difficult, this paper proposes a multi-modal music emotion classification method based on an optimized deep residual network. The method first uses multi-modal translation to convert the music audio modality, whose features are hard to extract, into an image modality that is easier to operate on. On top of the deep residual network, the convolution kernel size of the input layer and the shortcut connections of the residual blocks are optimized, reducing information loss and shortening computation time. In addition, to mitigate the Softmax classifier's weakness of intra-class dispersion and inter-class aggregation, a variant of the Center loss function is introduced to improve the performance of the Softmax classification function. Experimental results demonstrate the effectiveness and robustness of the optimized residual network model: compared with the original residual network, its music emotion classification accuracy improves by 4.27 percentage points.
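As a rough illustration of the modal-translation step described above, the Python sketch below renders an audio clip as a mel-spectrogram image that an image classifier can consume. The abstract does not specify the transform, so the use of librosa, the sample rate, the mel-band count, and the output size here are all assumptions.

# Minimal sketch of the modal-translation step: rendering an audio clip as a
# log-scaled mel-spectrogram image. Library choice and all parameter values
# are illustrative assumptions, not the paper's published pipeline.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def audio_to_spectrogram_image(wav_path, out_path, sr=22050, n_mels=128):
    """Save a music clip as a log-scaled mel-spectrogram image."""
    y, sr = librosa.load(wav_path, sr=sr)                  # decode and resample
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)          # log scale for contrast
    plt.figure(figsize=(2.24, 2.24), dpi=100)              # ~224x224 px, a common ResNet input size
    librosa.display.specshow(mel_db, sr=sr)
    plt.axis("off")                                        # image only, no axes or labels
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()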

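The abstract's two architectural changes, a smaller input-layer kernel and improved shortcut connections, are likewise not spelled out. The PyTorch sketch below shows one plausible reading: a 3x3 stem in place of ResNet's default 7x7 convolution, and a downsampling shortcut that average-pools before the 1x1 projection (the ResNet-D design) so that strided convolution no longer skips feature-map positions. Both choices are assumptions, not the paper's published design.

# PyTorch sketch of the two architectural tweaks named in the abstract.
# The 3x3 input stem and the average-pool downsampling shortcut are
# illustrative assumptions.
import torch.nn as nn

class SmallKernelStem(nn.Module):
    """Input layer with a smaller kernel than the default 7x7."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                              padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

def downsample_shortcut(in_ch, out_ch):
    """Shortcut for downsampling residual blocks: average-pool first, then
    project with a 1x1 convolution, so the strided 1x1 convolution of the
    original design no longer discards three quarters of the positions."""
    return nn.Sequential(
        nn.AvgPool2d(kernel_size=2, stride=2),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )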

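Finally, the abstract introduces a variant of the Center loss to counter the Softmax classifier's intra-class dispersion and inter-class aggregation, without describing the variant itself. The sketch below therefore shows the standard joint objective, L = L_softmax + λ·L_center, that such variants build on; the class count, feature dimension, and balance weight are illustrative.

# Sketch of the joint Softmax + Center-loss objective underlying the
# abstract's "variant of the Center loss". Treating the centers as ordinary
# trainable parameters and the weight value below are simplifying assumptions.
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Penalizes the distance between each feature and its class center,
    pulling samples of the same class together in feature space."""
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # Mean squared distance to the center of each sample's class
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

ce = nn.CrossEntropyLoss()                         # the Softmax classification loss
center = CenterLoss(num_classes=4, feat_dim=512)   # e.g. 4 emotion classes (assumed)
lam = 0.01                                         # assumed balance weight

def total_loss(logits, features, labels):
    # L = L_softmax + lambda * L_center
    return ce(logits, labels) + lam * center(features, labels)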
Key words: emotion recognition, modal translation, image classification, deep residual network, classification loss function
