Environmental Sound Recognition Based on Feature Fusion and Improved Convolution Neural Network

Abstract: Environmental sound recognition is challenging because environmental sounds have a complex structure. This paper proposes a recognition method that combines feature fusion with an improved convolutional neural network. First, two kinds of features are obtained from the original audio file: features learned from the raw waveform, and traditional audio features, namely MFCC (Mel-Frequency Cepstral Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), spectral contrast, and CQT (Constant-Q Transform). These are then fed, respectively, into the end-to-end neural network SF-CNN and the multi-scale convolutional neural network MS-CNN for recognition. Finally, the outputs of the two networks are fused at the decision level using the D-S (Dempster-Shafer) evidence theory decision rule to produce the final recognition result. Experiments on the public ESC-50 dataset show that the proposed model achieves higher recognition accuracy than methods based on a single feature and is better suited to complex acoustic scenes.

Key words: environmental sound recognition, feature fusion, multi-scale convolution operation, D-S evidence theory
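
The decision-level fusion step described in the abstract can be illustrated with Dempster's rule of combination. The sketch below is a minimal illustration, not the authors' implementation: it assumes each network's class-probability output is used directly as a basic probability assignment over singleton classes, which the abstract does not specify. The function name `dempster_combine` and the example vectors `sf_cnn_probs` and `ms_cnn_probs` are hypothetical.

```python
from typing import List, Sequence


def dempster_combine(m1: Sequence[float], m2: Sequence[float]) -> List[float]:
    """Combine two mass functions with Dempster's rule.

    Assumption (not from the paper): all mass sits on singleton classes,
    and each input sums to 1, as a softmax output would.
    """
    if len(m1) != len(m2):
        raise ValueError("mass vectors must have the same length")

    # Joint mass where both sources support the same class.
    agreement = [a * b for a, b in zip(m1, m2)]

    # Conflict K: joint mass assigned to pairs of different classes.
    conflict = 1.0 - sum(agreement)
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")

    # Normalize the agreeing mass by 1 - K.
    return [a / (1.0 - conflict) for a in agreement]


# Hypothetical outputs of the two networks for a three-class toy problem.
sf_cnn_probs = [0.6, 0.3, 0.1]
ms_cnn_probs = [0.5, 0.1, 0.4]

fused = dempster_combine(sf_cnn_probs, ms_cnn_probs)
predicted_class = fused.index(max(fused))
print(fused, predicted_class)
```

With singleton-only masses the rule reduces to a normalized element-wise product, so a class that both networks support gains weight while a class supported by only one network is suppressed.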

XU Rui, LI Zhi-hua, HAN Can-can. Environmental Sound Recognition Based on Feature Fusion and Improved Convolution Neural Network [J]. Computer and Modernization, 2021, 0(02): 62-67.
