[1] WANG Z Q, WANG P D, WANG D L. Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020,28:1778-1787.
[2] QUAN C S, LI X F. Multichannel speech separation with narrow-band conformer[EB/OL]. [2022-04-09]. https://arxiv.org/pdf/2204.04464.pdf.
[3] WANG S, KONG X Y, PENG X L, et al. DasFormer: Deep alternating spectrogram transformer for multi/single-channel speech separation[C]// Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023:1-5.
[4] TAHERIAN H, WANG D L. Multi-resolution location-based training for multi-channel continuous speech separation[C]// Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023:1-5.
[5] CHEN L W, YU M, SU D, et al. Multi-band PIT and model integration for improved multi-channel speech separation[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019:705-709.
[6] YOSHIOKA T, ITO N, DELCROIX M, et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices[C]// Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2015:436-443.
[7] CHEN Z, YOSHIOKA T, XIAO X, et al. Efficient integration of fixed beamformers and speech separation networks for multi-channel far-field speech separation[C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018:5384-5388.
[8] DRUDE L, HAEB-UMBACH R. Tight integration of spatial and spectral features for BSS with deep clustering embeddings[C]// Proceedings of the 2017 Interspeech. ISCA, 2017:2650-2654.
[9] WANG Z Q, WANG P, WANG D L. Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021,29:2001-2014.
[10] GU R Z, WU J, ZHANG S X, et al. End-to-end multi-channel speech separation[EB/OL]. [2019-05-15]. https://arxiv.org/pdf/1905.06286.pdf.
[11] GU R Z, ZHANG S X, CHEN L W, et al. Enhancing end-to-end multi-channel speech separation via spatial feature learning[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020:7319-7323.
[12] LUO Y, MESGARANI N. TasNet: Time-domain audio separation network for real-time, single-channel speech separation[C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018:696-700.
[13] LUO Y, MESGARANI N. Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019,27(8):1256-1266.
[14] GANNOT S, VINCENT E, MARKOVICH-GOLAN S, et al. A consolidated perspective on multimicrophone speech enhancement and source separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017,25(4):692-730.
[15] HEYMANN J, DRUDE L, HAEB-UMBACH R. Neural network based spectral mask estimation for acoustic beamforming[C]// Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016:196-200.
[16] ERDOGAN H, HERSHEY J R, WATANABE S, et al. Improved MVDR beamforming using single-channel mask prediction networks[C]// Proceedings of the 2016 Interspeech. ISCA, 2016:1981-1985.
[17] OCHIAI T, DELCROIX M, IKESHITA R, et al. Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020:6384-6388.
[18] CHEN H T, YI Y, FENG D, et al. Beam-guided TasNet: An iterative speech separation framework with multi-channel output[EB/OL]. [2021-02-05]. https://arxiv.org/pdf/2102.02998.pdf.
[19] SUBAKAN C, RAVANELLI M, CORNELL S, et al. Attention is all you need in speech separation[C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021:21-25.
[20] VINCENT E, GRIBONVAL R, FÉVOTTE C. Performance measurement in blind audio source separation[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006,14(4):1462-1469.
[21] LE ROUX J, WISDOM S, ERDOGAN H, et al. SDR–half-baked or well done?[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019:626-630.
[22] TAAL C H, HENDRIKS R C, HEUSDENS R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011,19(7):2125-2136.
[23] SOUDEN M, BENESTY J, AFFES S. On optimal frequency-domain multichannel linear filtering for noise reduction[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2010,18(2):260-276.
[24] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016:770-778.
[25] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS). Curran Associates, 2017:6000-6010.
[26] REN S C, YANG X Y, LIU S H, et al. SG-Former: Self-guided transformer with evolving token reallocation[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023:6003-6014.
[27] WICHERN G, ANTOGNINI J, FLYNN M, et al. WHAM!: Extending speech separation to noisy environments[EB/OL]. [2019-07-02]. https://arxiv.org/pdf/1907.01160.pdf.
[28] PARIENTE M, CORNELL S, COSENTINO J, et al. Asteroid: The PyTorch-based audio source separation toolkit for researchers[EB/OL]. [2020-05-08]. https://arxiv.org/pdf/2005.04132.pdf.
[29] LOIZOU P C. Speech enhancement: Theory and practice[M]. Boca Raton: CRC Press, 2007.