Computer and Modernization, 2024, Vol. 0, Issue (09): 20-24. doi: 10.3969/j.issn.1006-2475.2024.09.004


Speech Enhancement Based on Time-frequency Self-attention Residual Temporal Convolutional Networks

  

  1. (NARI Technology Development Limited Company, Nanjing 211000, China)
  • Online: 2024-09-27 Published: 2024-09-27

Abstract: Speech enhancement (SE) aims to remove irrelevant signals such as noise from speech. It serves as the front-end processing stage for many speech processing tasks and plays an important role in applications such as video conferencing and live broadcasting. However, most SE studies focus on modeling the long-term contextual dependencies between speech frames and do not consider how speech energy is distributed in the time-frequency domain. This paper proposes a time-frequency self-attention module that explicitly introduces prior knowledge of the speech energy distribution into the modeling process. Combining this module with a residual temporal convolutional network yields a residual temporal convolutional network model based on time-frequency self-attention. To validate the model, experiments are conducted with two training targets commonly used in SE, the ideal ratio mask (IRM) and the phase-sensitive mask (PSM). The experimental results show that the model significantly improves performance on four objective evaluation metrics for SE and consistently outperforms the baseline models.
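The abstract names the two training targets but does not give their formulas. As a rough illustration only, the sketch below computes both masks from complex spectrograms using the definitions commonly found in the SE literature; it is not taken from the paper. The function names, the `eps` stabilizer, the random test spectrograms, and the clipping of the PSM to [0, 1] are all assumptions made for this example.

```python
import numpy as np


def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-8):
    """IRM (common definition): sqrt(|S|^2 / (|S|^2 + |N|^2)) per T-F bin."""
    clean_pow = np.abs(clean_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    # eps is an assumed stabilizer that avoids division by zero in silent bins.
    return np.sqrt(clean_pow / (clean_pow + noise_pow + eps))


def phase_sensitive_mask(clean_spec, noisy_spec, eps=1e-8):
    """PSM (common definition): |S| / |X| * cos(theta_S - theta_X) per T-F bin."""
    ratio = np.abs(clean_spec) / (np.abs(noisy_spec) + eps)
    phase_diff = np.angle(clean_spec) - np.angle(noisy_spec)
    # Clipping to [0, 1] is an assumption; it is a frequent practical choice.
    return np.clip(ratio * np.cos(phase_diff), 0.0, 1.0)


# Hypothetical complex spectrograms with shape (frames, frequency bins).
rng = np.random.default_rng(0)
shape = (100, 257)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise  # additive noise model

irm = ideal_ratio_mask(clean, noise)
psm = phase_sensitive_mask(clean, noisy)
```

In a mask-based SE system, a network is trained to predict such a mask from noisy features, and the predicted mask is multiplied element-wise with the noisy magnitude spectrogram to estimate the clean speech. Unlike the IRM, the PSM also accounts for the phase mismatch between clean and noisy speech.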

Key words: speech enhancement; time-frequency; self-attention mechanism; temporal convolutional network
