Computer and Modernization ›› 2022, Vol. 0 ›› Issue (11): 119-126.

• Network and Communication •

Research Review of Single-channel Speech Separation Technology Based on TasNet

  

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-11-30  Published: 2022-11-30
  • About the authors: LU Wei (b. 1996), male, from Zongyang, Anhui, master's student; research interests: big data and artificial intelligence; E-mail: Lu_reed0303@163.com. Corresponding author: ZHU Dingju (b. 1978), male, professor, Ph.D.; research interests: big data and artificial intelligence; E-mail: zhudingju@m.scnu.edu.cn.
  • Funding:
    Supported by the Key Program of the National Natural Science Foundation of China (U18112000)

Abstract: Speech separation is a fundamental task in acoustic signal processing with a wide range of applications. Thanks to the development of deep learning, the performance of single-channel speech separation systems has improved significantly in recent years. In particular, since the introduction of a new speech separation method known as the time-domain audio separation network (TasNet), research on speech separation has been gradually shifting from traditional time-frequency-domain methods to time-domain methods. This paper reviews the research status and prospects of TasNet-based single-channel speech separation technology. After surveying the traditional time-frequency-domain methods, it focuses on the TasNet-based Conv-TasNet and DPRNN models and compares the studies that improve on each of them. Finally, it discusses the limitations of current TasNet-based single-channel speech separation models and outlines future research directions in terms of models, datasets, the number of speakers, and speech separation in complex scenarios.
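
As a rough illustration of the time-domain pipeline the review centers on, the sketch below shows a deliberately simplified TasNet-style model in PyTorch. It is not the authors' implementation: the class name TinyTasNet, all layer sizes, and the two-layer convolutional mask estimator are placeholder assumptions made for this example. In Conv-TasNet the mask estimator is a temporal convolutional network, and in DPRNN it is a stack of dual-path recurrent blocks; only the overall encoder-mask-decoder structure sketched here is shared across the TasNet family.

import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    """Toy TasNet-style model: learned encoder, latent-domain masking, learned decoder."""

    def __init__(self, n_src=2, n_filters=64, kernel_size=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # Learned analysis transform over the raw waveform (replaces the STFT).
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride, bias=False)
        # Placeholder mask estimator; Conv-TasNet uses a TCN here, DPRNN uses dual-path RNN blocks.
        self.separator = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, 3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_src * n_filters, 1),
        )
        # Learned synthesis transform back to the waveform (replaces the inverse STFT).
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=stride, bias=False)

    def forward(self, mixture):
        # mixture: (batch, samples) time-domain waveform of the mixed speakers
        w = self.encoder(mixture.unsqueeze(1))                    # (B, N, T') latent frames
        masks = torch.sigmoid(self.separator(w))                  # (B, n_src*N, T')
        masks = masks.view(-1, self.n_src, w.size(1), w.size(2))  # (B, n_src, N, T')
        est = (masks * w.unsqueeze(1)).reshape(-1, w.size(1), w.size(2))  # mask each source
        return self.decoder(est).view(mixture.size(0), self.n_src, -1)   # (B, n_src, samples)

if __name__ == "__main__":
    model = TinyTasNet()
    mix = torch.randn(4, 16000)      # four 1-second mixtures at 16 kHz
    print(model(mix).shape)          # torch.Size([4, 2, 16000])

The sigmoid masks applied in the learned latent space play the same role that time-frequency masks play in the traditional methods, which is the main architectural contrast the abstract refers to.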

Key words: speech separation, time-domain audio separation network (TasNet), fully-convolutional time-domain audio separation network (Conv-TasNet), dual-path recurrent neural network (DPRNN)