Computer and Modernization ›› 2024, Vol. 0 ›› Issue (09): 20-24. doi: 10.3969/j.issn.1006-2475.2024.09.004

• Artificial Intelligence •

Speech Enhancement Based on Time-frequency Self-attention Residual Temporal Convolutional Networks

  1. (NARI Technology Development Limited Company, Nanjing 211000, China)
  • Online: 2024-09-27  Published: 2024-09-27
  • Supported by: NARI Nanjing Control System Co., Ltd. project (524609230006)


Abstract: The main purpose of speech enhancement (SE) is to remove irrelevant signals, such as noise, from speech. As the front-end processing stage of many speech processing tasks, SE plays an important role in fields such as video conferencing and live streaming. However, most SE studies focus on modeling the long-term contextual dependencies of speech frames, without considering the energy distribution characteristics of speech in the time-frequency domain. This paper proposes a time-frequency self-attention module that explicitly introduces prior knowledge of the distribution characteristics of speech into the modeling process. Combined with a residual temporal convolutional network, it forms a residual temporal convolutional network model based on time-frequency self-attention. To verify the effectiveness of the model, experiments are conducted with two training targets commonly used in SE, the ideal ratio mask (IRM) and the phase-sensitive mask (PSM). The experimental results show that the model significantly improves four objective evaluation metrics commonly used in SE and consistently outperforms the baseline models.
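For reference, the two training targets named in the abstract, IRM and PSM, have standard definitions in the SE literature. The sketch below shows those textbook formulas in NumPy; the exponent `beta`, the flooring constant, and the clipping range are common conventions, not details taken from this paper.

```python
import numpy as np

def irm(clean_mag, noise_mag, beta=0.5):
    """Ideal ratio mask: per-bin ratio of clean energy to total energy,
    raised to an exponent beta (0.5 is a common choice)."""
    eps = 1e-8  # floor to avoid division by zero in silent bins
    return (clean_mag**2 / (clean_mag**2 + noise_mag**2 + eps)) ** beta

def psm(clean_spec, noisy_spec):
    """Phase-sensitive mask: |S|/|Y| * cos(theta_S - theta_Y), computed as
    Re(S * conj(Y)) / |Y|^2 and clipped to [0, 1], as is common practice."""
    eps = 1e-8
    mask = np.real(clean_spec * np.conj(noisy_spec)) / (np.abs(noisy_spec)**2 + eps)
    return np.clip(mask, 0.0, 1.0)
```

Both masks are applied point-wise to the noisy STFT magnitude (or complex spectrogram) during enhancement; the network is trained to predict them from noisy input features.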

Key words: speech enhancement; time-frequency; self-attention mechanism; temporal convolutional network
