Computer and Modernization ›› 2020, Vol. 0 ›› Issue (12): 83-89.


Multi-modal Music Emotion Classification Based on Optimized Residual Network

  

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Online: 2021-01-07  Published: 2021-01-07

Abstract: To address the problems of traditional music emotion classification, namely difficult feature extraction, low classification accuracy, and heavy manual workload, this paper proposes a multi-modal music emotion classification method based on an optimized deep residual network. The method first uses multi-modal translation to convert the music audio modality, whose features are hard to extract, into an easy-to-process image modality. On the basis of the deep residual network, the convolution kernel size of the input layer and the shortcut connections of the residual blocks are then optimized, which reduces information loss and shortens computation time. In addition, to alleviate the shortcomings of the Softmax classifier, such as intra-class dispersion and inter-class aggregation, a variant of the Center loss function is introduced to improve the performance of the Softmax classification function. Experimental results demonstrate the effectiveness and robustness of the optimized residual network model; compared with the original residual network, the accuracy of music emotion classification is improved by 4.27 percentage points.
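The abstract does not give the exact form of the Center loss variant, but the standard Center loss it builds on penalizes the distance between each sample's feature vector and its class center, and is added to the Softmax cross-entropy loss with a weighting factor. The sketch below is a minimal NumPy illustration of that combined objective; the function names, the weighting parameter `lam`, and the specific formulation are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Standard Softmax cross-entropy, numerically stabilized
    # by subtracting the row-wise max before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def center_loss(features, labels, centers):
    # L_c = (1/2N) * sum_i ||x_i - c_{y_i}||^2 :
    # pulls each feature vector toward its class center,
    # countering the intra-class dispersion of plain Softmax.
    diff = features - centers[labels]
    return 0.5 * (diff ** 2).sum(axis=1).mean()

def total_loss(logits, features, labels, centers, lam=0.5):
    # Combined objective: Softmax cross-entropy plus a
    # weighted center term (lam is a hypothetical weight).
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)
```

In training, the class centers themselves are updated alongside the network weights, so that features of the same emotion class cluster together while the Softmax term keeps different classes separable.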

Key words: emotion recognition, modal translation, image classification, deep residual network, classification loss function