Speech Recognition Model Compression Algorithm Based on BIC

Computer and Modernization ›› 2014, Vol. 0 ›› Issue (6): 71-73,78.

College of Computer Science and Technology, Donghua University, Shanghai 201620, China

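Received: 2014-02-28 Online: 2014-06-13 Published: 2014-06-25
About the authors: ZOU Can (b. 1988), male, from Hengyang, Hunan, master's student at the College of Computer Science and Technology, Donghua University, research interest: computer architecture; LI Bai-yan (b. 1965), male, associate professor, master's degree, research interest: computer graphics and imaging.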

Speech Recognition Model Compression Algorithm Based on Bayesian Information Criterion

College of Computer Science and Technology, Donghua University, Shanghai 201620, China

Received: 2014-02-28 Online: 2014-06-13 Published: 2014-06-25

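Abstract

Abstract (Chinese version):

When GMM (Gaussian Mixture Model) discriminative training is used to add components to an HMM (Hidden Markov Model) speech model, the recognition rate rises as the number of GMM components grows, but the model size grows as well, which leaves the speech model bloated. For recognition with a local speech model on a mobile device, storing a model of several hundred megabytes is impractical. To address this problem, this paper proposes using the BIC criterion to compress a speech model with a large number of GMM components down to a specified number of components, so that the recognition rate is preserved as far as possible at an acceptable model size. Experimental results show that a model compressed with this method achieves a higher recognition rate than an uncompressed speech recognition model with the same number of components.

Keywords: speech recognition, model compression, BIC (Bayesian information criterion)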

Abstract:

When an HMM speech model undergoes GMM discriminative training, its recognition rate increases with the number of GMM components, but the model size increases as well, which bloats the model. A speech model larger than several hundred megabytes is unsuitable for local recognition on mobile devices. To address this problem, a method for compressing speech models based on BIC is presented. The method tries to preserve the recognition rate of the speech model while keeping the model at an appropriate size. Experiments demonstrate that the method is practical: it reduces the speech model to the specified size while preserving the recognition rate as far as possible.

Key words: speech recognition, model compression, BIC (Bayesian information criterion)
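
The abstract does not describe the compression procedure itself. As a rough illustration only, the sketch below assumes the standard definition BIC = -2 ln L + k ln n (L is the likelihood, k the number of free parameters, n the number of samples) and a greedy strategy that repeatedly merges the pair of Gaussians whose moment-matched merge gives the lowest BIC, until a target component count is reached. This is not the authors' algorithm: the 1-D mixture, the function names (`bic`, `merge_pair`, `compress_gmm`) and the toy data are all assumptions made for the example.

```python
import math


def gmm_log_likelihood(data, weights, means, variances):
    """Total log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(p)
    return total


def bic(data, weights, means, variances):
    """BIC = -2 ln L + k ln n (lower is better)."""
    # Free parameters of a 1-D GMM: one mean and one variance per
    # component, plus the weights, which lose one degree of freedom
    # because they sum to 1.
    k = 3 * len(weights) - 1
    log_l = gmm_log_likelihood(data, weights, means, variances)
    return -2.0 * log_l + k * math.log(len(data))


def merge_pair(weights, means, variances, i, j):
    """Replace components i and j by one moment-matched Gaussian."""
    w = weights[i] + weights[j]
    m = (weights[i] * means[i] + weights[j] * means[j]) / w
    v = (
        weights[i] * (variances[i] + (means[i] - m) ** 2)
        + weights[j] * (variances[j] + (means[j] - m) ** 2)
    ) / w
    keep = [t for t in range(len(weights)) if t not in (i, j)]
    return (
        [weights[t] for t in keep] + [w],
        [means[t] for t in keep] + [m],
        [variances[t] for t in keep] + [v],
    )


def compress_gmm(data, weights, means, variances, target):
    """Greedily merge components until only `target` remain (hypothetical helper)."""
    while len(weights) > target:
        candidates = [
            merge_pair(weights, means, variances, i, j)
            for i in range(len(weights))
            for j in range(i + 1, len(weights))
        ]
        # Keep the merge that yields the lowest BIC on the data.
        weights, means, variances = min(candidates, key=lambda c: bic(data, *c))
    return weights, means, variances


if __name__ == "__main__":
    # Toy data: two tight clusters modelled by four components.
    samples = [-2.1, -1.9, -2.0, -2.05, 2.0, 2.1, 1.9, 2.05]
    w, m, v = compress_gmm(
        samples, [0.25] * 4, [-2.05, -1.95, 1.95, 2.05], [0.01] * 4, target=2
    )
    print(sorted(m))
```

Because the target count is fixed, every candidate at a given step has the same penalty term k ln n, so BIC ranks the merges purely by likelihood here. The penalty only matters when models with different numbers of components are compared, for example when choosing the target count itself.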

ZOU Can, LI Bai-yan. Speech Recognition Model Compression Algorithm Based on Bayesian Information Criterion[J]. Computer and Modernization, 2014, 0(6): 71-73,78.
