Computer and Modernization ›› 2021, Vol. 0 ›› Issue (02): 62-67.

Environmental Sound Recognition Based on Feature Fusion and Improved Convolution Neural Network 

  

  1. (College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, China)
  • Online: 2021-03-01 Published: 2021-03-01

Abstract: Environmental sound recognition is challenging because environmental sounds have a complex structure. This paper proposes an environmental sound recognition method that combines feature fusion with an improved convolutional neural network. First, two kinds of features are obtained from the original audio file: features learned from the waveform, and traditional audio features, namely MFCC (Mel-Frequency Cepstral Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), spectral contrast, and CQT (Constant-Q Transform). The extracted features are then fed into the end-to-end neural network SF-CNN and the multi-scale convolutional neural network MS-CNN, respectively, for recognition. Finally, the two networks' outputs are fused at the decision level according to the D-S (Dempster-Shafer) evidence theory decision rule, which yields the final recognition result. Experimental results on the public ESC-50 dataset show that the proposed model achieves higher recognition accuracy than methods based on a single feature and is better suited to complex acoustic scenes.
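
The decision-level fusion step can be illustrated with Dempster's rule of combination, the standard combination rule of D-S evidence theory. The sketch below is not the paper's implementation: it assumes each network's softmax output is treated as a mass function over single classes, and the class labels, the score vectors, and the helper names `dempster_combine` and `probs_to_masses` are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence: accumulate the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict  # renormalise by 1 - K
    return {k: v / norm for k, v in combined.items()}


def probs_to_masses(probs, labels):
    """Assumption: each class probability becomes the mass of a singleton set."""
    return {frozenset([lab]): p for lab, p in zip(labels, probs) if p > 0}


# Hypothetical labels and scores, for illustration only.
labels = ["dog", "rain", "siren"]
sf_cnn_scores = [0.6, 0.3, 0.1]  # stand-in for the SF-CNN softmax output
ms_cnn_scores = [0.5, 0.1, 0.4]  # stand-in for the MS-CNN softmax output

fused = dempster_combine(
    probs_to_masses(sf_cnn_scores, labels),
    probs_to_masses(ms_cnn_scores, labels),
)
prediction = max(fused, key=fused.get)  # class set with the largest fused mass
```

With singleton-only masses, the rule reduces to a renormalised element-wise product of the two score vectors, so a class is favoured only when both networks give it support. Whether the paper assigns mass to compound hypotheses or weights the two networks differently is not stated in the abstract.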

Key words: environmental sound recognition, feature fusion, multi-scale convolution operation, D-S evidence theory