A DNN Compression Method for Environmental Sound Classification on Microcontroller Units

doi:10.3969/j.issn.1006-2475.2024.01.013

Abstract

Environmental Sound Classification (ESC) is one of the most important topics in non-speech audio classification. In recent years, deep neural network (DNN) methods have driven substantial progress in ESC. However, DNNs are compute- and memory-intensive and cannot be deployed directly on IoT devices built around microcontroller units (MCUs). To address this problem, this paper proposes a DNN compression method for highly resource-constrained devices. Because the parameter count of a typical DNN model is too large for direct deployment, the method first applies pruning to compress the model substantially. To recover the accuracy lost through pruning, it then applies a knowledge distillation method that uses feature information from multiple intermediate layers of the model. Experiments on an STM32F746ZG device with two public datasets (UrbanSound8K and ESC-50) show that the method achieves a compression rate of up to 97% while maintaining good inference accuracy and speed.

Keywords: environmental sound classification, edge computing, microcontroller unit, pruning, knowledge distillation, quantization
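The abstract names the two main compression steps (pruning, then intermediate-layer knowledge distillation) but not their exact criteria. As a minimal sketch, assuming L1-norm channel pruning (a common structured-pruning baseline) as the pruning criterion, and a FitNets-style mean-squared-error loss on matched intermediate feature maps plus the Hinton et al. soft-target KL term as the distillation objective, the two steps could look like the pure-Python code below. These are stand-ins, not the authors' method: the function names (`l1_channel_prune`, `compression_rate`, `kd_loss`), the flat-list weight layout, and the hyperparameters `temperature`, `alpha`, and `beta` are hypothetical. "Compression rate" is assumed to mean the fraction of parameters removed, so 97% corresponds to keeping 3% of the weights.

```python
import math


def l1_channel_prune(weights, keep_ratio):
    """Keep the output channels with the largest L1 norms (hypothetical helper).

    weights: list of output channels, each flattened to a list of floats.
    Returns (kept_indices, kept_channels), both in original channel order.
    """
    norms = [sum(abs(w) for w in ch) for ch in weights]
    n_keep = max(1, int(round(len(weights) * keep_ratio)))
    # Rank channels by L1 norm, largest first, and keep the top n_keep.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [weights[i] for i in kept]


def compression_rate(params_before, params_after):
    """Fraction of parameters removed (assumed definition): 0.97 means 97%."""
    return 1.0 - params_after / params_before


def softmax(logits, temperature=1.0):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, student_feats, teacher_feats,
            temperature=4.0, alpha=0.5, beta=1.0):
    """Soft-target KL(teacher || student), scaled by T^2, plus feature MSE.

    student_feats / teacher_feats: one flattened feature map per matched
    intermediate layer; shapes are assumed to already agree.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s))
    soft_term = alpha * temperature ** 2 * kl
    # Mean-squared error per matched layer, summed over layers.
    feat_term = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        feat_term += sum((a - b) ** 2 for a, b in zip(fs, ft)) / len(fs)
    return soft_term + beta * feat_term


if __name__ == "__main__":
    layer = [[0.1, -0.2], [1.0, 0.5], [0.05, 0.0], [-0.8, 0.9]]
    kept, _ = l1_channel_prune(layer, keep_ratio=0.5)
    print(kept)  # [1, 3]
    print(round(compression_rate(1_000_000, 30_000), 2))  # 0.97
```

In a training loop, `kd_loss` would be minimized over the pruned student while the unpruned teacher stays fixed; the usual hard-label cross-entropy term is omitted here for brevity. When matched student and teacher layers have different channel counts after pruning, an adapter layer (as in FitNets) is typically inserted before the MSE term; the sketch assumes the shapes already agree.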

CLC number: TP81

MENG Na, FANG Wei-wei, LU Hong-ying. A DNN Compression Method for Environmental Sound Classification on Microcontroller Unit[J]. Computer and Modernization, 2024(01): 80-86.
