Speech Tracking Based on Cluster Analysis and Speaker Recognition

doi:10.3969/j.issn.1006-2475.2020.04.002

Computer and Modernization, 2020, Vol. 0, Issue (04): 7-.

(School of Electro-Mechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China)

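Received: 2019-10-22 Online: 2020-04-22 Published: 2020-04-24
About the authors: HAO Min (1993-), male, from Taiyuan, Shanxi, master's student, research interest: intelligent speech processing, E-mail: 18826220184@163.com; LIU Hang (1994-), male, from Pingxiang, Jiangxi, master's degree, research interest: speech signal processing, E-mail: 15521331910@163.com; LI Yang (1966-), male, from Xuwen, Guangdong, professor, Ph.D., research interests: intelligent equipment manufacturing, adaptive control, E-mail: lyang@gdut.edu.cn; JIAN Dan (1995-), male, master's student, research interest: electric vehicle battery management, E-mail: easy_boy@163.com; WANG Jun-ying (1990-), female, master's student, research interests: image recognition, embedded technology, E-mail: wangjunying_666@163.com.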
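Supported by: Guangdong Provincial Science and Technology Plan Project (2013B011304008, 2013B090600031); Foshan Industry-University-Research Special Fund Project (2012HC100195)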

Abstract

Abstract: Speech tracking quality degrades severely under speaker interference, that is, when a speech segment contains the mixed speech signals of multiple speakers. To address this, a speech tracking algorithm based on cluster analysis and speaker recognition is proposed. First, an improved cluster analysis method is used for speech separation: the centroids in K-means clustering are cached and the sampling rate is lowered, and a regularization term is introduced into the embedding feature space. Second, a GMM-UBM speaker model is used for speech tracking. Experimental results show that the improved cluster analysis method effectively improves both the real-time performance of the algorithm and the quality of speech separation, and that the GMM-UBM model achieves an 84% recognition rate in tests on 3 s speech segments.

Key words: single-channel speech tracking, intelligent speech, cluster analysis, Gaussian mixture model, LSTM
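
The centroid caching mentioned in the abstract can be illustrated with a minimal Python sketch: K-means on each new chunk of embedding vectors is warm-started from the centroids found on the previous chunk, which typically reduces the number of iterations needed. This is a sketch under stated assumptions, not the authors' implementation. The names `kmeans` and `cluster_stream`, the chunking scheme, the first-k-points cold initialization, and the toy 2-D points standing in for embedding vectors are all hypothetical, and the downsampling and regularization steps are not shown.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int,
           init: Optional[Sequence[Point]] = None,
           max_iter: int = 100) -> Tuple[List[Point], List[int], int]:
    """Plain Lloyd-style K-means.

    If `init` is given, it is used as the starting centroids (warm start);
    otherwise the first k points are used (an illustrative cold start).
    Returns (centroids, labels, number of iterations run).
    """
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels, n_iter


def cluster_stream(chunks: Sequence[Sequence[Point]],
                   k: int) -> List[Tuple[List[int], int]]:
    """Cluster consecutive chunks, caching centroids between chunks."""
    cached: Optional[List[Point]] = None
    results = []
    for chunk in chunks:
        # The first chunk starts cold; later chunks reuse cached centroids.
        cached, labels, n_iter = kmeans(chunk, k, init=cached)
        results.append((labels, n_iter))
    return results


# Toy data: two well-separated groups, then the same groups slightly shifted.
chunk1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
chunk2 = [(x + 0.2, y + 0.2) for x, y in chunk1]
for labels, n_iter in cluster_stream([chunk1, chunk2], k=2):
    print(labels, n_iter)
```

On this toy data the second chunk starts from the cached centroids, so it should need no more iterations than a cold start on the same chunk. Caching also keeps cluster indices consistent from chunk to chunk as long as the clusters move slowly, which is convenient when the separated streams are passed on to a speaker model.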

CLC number: TP391.42

HAO Min, LIU Hang, LI Yang, JIAN Dan, WANG Jun-ying. Speech Tracking Based on Cluster Analysis and Speaker Recognition[J]. Computer and Modernization, 2020, 0(04): 7-.
