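Method of Nonparallel Speech Denoising Based on CycleGAN

Computer and Modernization, 2021, Vol. 0, Issue (02): 73-77.

(College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, Jiangsu, China)

Online: 2021-03-01 Published: 2021-03-01
About the authors: HAN Can-can (born 1995), female, from Anqing, Anhui, master's student, research interests: machine learning and audio processing, E-mail: emma_han11@foxmail.com; LI Zhi-hua (born 1964), male, from Taizhou, Jiangsu, professor, master's supervisor, Ph.D., research interests: artificial intelligence and fault diagnosis of complex systems; XU Rui (born 1995), female, from Yancheng, Jiangsu, master's student, research interests: machine learning and audio processing.
Supported by:
Natural Science Foundation of Jiangsu Province (BK20151500)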

Abstract

Abstract: To address speech denoising, a method based on the cycle-consistent generative adversarial network (CycleGAN) is proposed for denoising speech in acoustic scenes. The method combines the CycleGAN network model with cross-domain voice conversion techniques and optimizes the result: it extracts spectral envelope features from speech and then encodes and decodes them, aiming at end-to-end speech denoising with a generative model. This simplifies the high-order difference problem that arises during speech denoising and broadens the range of application scenarios. The model is trained and tested on both nonparallel and parallel datasets, and its denoising performance is compared mainly with that of the conventional CycleGAN-based speech denoising method. In the experiments, the PESQ, NR and SSNR metrics show relative improvements of 8.49%, 6.53% and 23.30%, respectively, indicating that the method handles nonparallel speech denoising in real-world scenes effectively.

Key words: speech denoising, CycleGAN, voice conversion, nonparallel data set
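
The abstract reports relative improvements in metrics such as SSNR (segmental signal-to-noise ratio). As a rough illustration of how such a figure can be obtained, the sketch below computes a commonly used form of SSNR and a relative improvement in percent. The frame length, hop size and clipping range of [-10, 35] dB are assumptions chosen for illustration, not settings taken from the paper, and the function names `segmental_snr` and `relative_improvement` are hypothetical.

```python
import numpy as np


def segmental_snr(clean, denoised, frame_len=256, hop=128,
                  lo=-10.0, hi=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, each frame clipped to [lo, hi].

    frame_len, hop, lo and hi are illustrative assumptions.
    """
    clean = np.asarray(clean, dtype=np.float64)
    denoised = np.asarray(denoised, dtype=np.float64)
    n = min(len(clean), len(denoised))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = s - denoised[start:start + frame_len]  # residual error in this frame
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        frame_snrs.append(np.clip(snr, lo, hi))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return float(np.mean(frame_snrs))


def relative_improvement(baseline, proposed):
    """Percentage change of `proposed` relative to `baseline`."""
    return (proposed - baseline) / abs(baseline) * 100.0


# Synthetic demonstration: a 440 Hz tone with two residual noise levels
# standing in for the outputs of a baseline and an improved denoiser.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noise = rng.normal(size=clean.shape)
baseline_out = clean + 0.3 * noise   # more residual noise
proposed_out = clean + 0.1 * noise   # less residual noise

ssnr_baseline = segmental_snr(clean, baseline_out)
ssnr_proposed = segmental_snr(clean, proposed_out)
print(f"baseline SSNR: {ssnr_baseline:.2f} dB")
print(f"proposed SSNR: {ssnr_proposed:.2f} dB")
print(f"relative improvement: "
      f"{relative_improvement(ssnr_baseline, ssnr_proposed):.2f}%")
```

The signals here are synthetic and say nothing about the paper's results. A real evaluation would run the same computation on clean reference utterances and the corresponding denoised outputs, then average over the test set.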

HAN Can-can, LI Zhi-hua, XU Rui. Method of Nonparallel Speech Denoising Based on CycleGAN[J]. Computer and Modernization, 2021, 0(02): 73-77.
