Computer and Modernization ›› 2021, Vol. 0 ›› Issue (02): 62-67.

• Artificial Intelligence •

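Environmental Sound Recognition Based on Feature Fusion and Improved Convolution Neural Network

  1. (College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, China)
  • Online: 2021-03-01 Published: 2021-03-01
  • About the authors: XU Rui (born 1995), female, from Yancheng, Jiangsu, master's student, research interests: machine learning and audio processing, E-mail: xurui95@163.com; LI Zhihua (born 1964), male, from Taizhou, Jiangsu, professor and master's supervisor, research interests: artificial intelligence and fault diagnosis of complex systems; HAN Cancan (born 1995), female, from Anqing, Anhui, master's student, research interests: machine learning and audio processing.
  • Funding:
    Natural Science Foundation of Jiangsu Province (BK20151500)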

Abstract: Environmental sound recognition is challenging because environmental sounds have a complex structure. This paper proposes an environmental sound recognition method that combines feature fusion with an improved convolutional neural network. First, two kinds of features are extracted from the raw audio files: features learned from the waveform, and traditional audio features, namely MFCC (Mel-frequency cepstral coefficients), GFCC (Gammatone frequency cepstral coefficients), spectral contrast, and CQT (constant-Q transform). The extracted features are then fed into the end-to-end neural network SF-CNN and the multi-scale convolutional neural network MS-CNN, respectively, for recognition. Finally, decision-level fusion is performed according to the decision rule of D-S (Dempster-Shafer) evidence theory, which yields the final recognition result. Experiments on the public ESC-50 dataset show that the proposed model improves recognition accuracy, outperforms single-feature models, and is better suited to complex acoustic scenes.

Key words: environmental sound recognition, feature fusion, multi-scale convolution operation, D-S evidence theory
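
The decision-level fusion step can be illustrated with Dempster's rule of combination from D-S evidence theory. The sketch below is a minimal illustration, not the authors' implementation. It assumes each network's softmax output is treated as a mass function over single classes. The helper names (`dempster_combine`, `softmax_to_mass`), the three class labels, and the probability values are hypothetical and chosen only for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of class labels
    (a focal element) to its mass; the masses in each dict sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}


def softmax_to_mass(probs, labels):
    """Assumption: each class probability becomes the mass of a singleton set."""
    return {frozenset([label]): p for label, p in zip(labels, probs) if p > 0}


# Hypothetical outputs of two classifiers for one audio clip.
labels = ["dog", "rain", "siren"]
m_a = softmax_to_mass([0.6, 0.3, 0.1], labels)
m_b = softmax_to_mass([0.5, 0.1, 0.4], labels)

fused = dempster_combine(m_a, m_b)
decision = max(fused, key=fused.get)  # focal element with the largest fused mass
```

With singleton-only masses, the rule reduces to multiplying the two probability vectors class by class and renormalizing, so a class that both sources support is reinforced while the conflicting mass is discarded. How the paper actually builds its mass functions and applies its decision rule is not stated in the abstract.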