Computer and Modernization (计算机与现代化) ›› 2023, Vol. 0 ›› Issue (12): 48-52. doi: 10.3969/j.issn.1006-2475.2023.12.009

• Algorithm Design and Analysis •

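Infrared Spectrum Modeling Method Based on Variable Selection of Model Population Analysis

  1. (College of Automation and Electronic Engineering, Qingdao University of Science and Technology, Qingdao 266061, China)
  • Online: 2023-12-24 Published: 2024-01-24
  • About the authors: DU Kang (杜康, born 1997), male, from Jinan, Shandong, master's student, research interests: industrial data analysis and mining, E-mail: dukang132@foxmail.com; GUO Luyu (郭鲁钰, born 1996), male, from Zibo, Shandong, master's student, research interests: modeling and analysis of industrial process data, E-mail: 2519817820@qq.com; XU Qilei (徐啟蕾, born 1980), female, from Qingdao, Shandong, associate professor, Ph.D., research interests: information sensing and intelligent processing, E-mail: 1255707511@qq.com; SHAN Baoming (单宝明, born 1974), male, from Dongying, Shandong, professor, Ph.D., research interests: control and optimization of complex industrial processes, E-mail: shan.bm@hotmail.com; ZHANG Fangkun (张方坤, born 1986), male, from Liaocheng, Shandong, associate professor, Ph.D., research interests: modeling and optimization of complex systems, E-mail: f.k.zhang@qust.edu.cn.
  • Funding:
    National Natural Science Foundation of China (62103216); Natural Science Foundation of Shandong Province (ZR2020QF060)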

Abstract: Variable selection reduces the dimensionality of high-dimensional data, lowers the complexity of the calibration model, and improves its predictive ability and interpretability, which makes it important for building efficient and reliable prediction models. In this paper, model population analysis (MPA) is used for variable selection in near-infrared (NIR) spectral calibration modeling. Exploiting the fact that MPA repeatedly draws subsets from the same variable space, a subset index reuse kernel partial least squares (SIRK-PLS) fusion modeling method is proposed. By indexing a precomputed covariance matrix, the method avoids the redundant computation that otherwise arises under the MPA framework during cross-validation of variable subsets and the solution of regression coefficients, which improves modeling efficiency. In addition, SIRK-PLS automatically switches to the best-suited modeling algorithm according to the ratio of the number of samples to the number of variables. The algorithm is validated on a standard NIR corn spectral dataset. The results show that SIRK-PLS converges quickly and achieves high accuracy, making it suitable for automatic, fast dimensionality-reduction modeling on mobile infrared spectroscopy devices, and that it shows promise for practical application.

Key words: partial least squares, model population analysis, infrared spectroscopy technique, variable selection, subspace modelling
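
The covariance-indexing idea described in the abstract can be illustrated with a minimal sketch. It assumes mean-centered data and a single response, and it uses a generic kernel-style PLS1 recursion that works only from X'X and X'y; the function names `kernel_pls_from_cov` and `subset_coefficients`, the synthetic data, and the chosen subset are hypothetical and are not taken from the paper. Cross-validation and the automatic algorithm switching are not shown.

```python
import numpy as np

def kernel_pls_from_cov(XtX, Xty, n_components):
    """PLS1 regression coefficients computed only from X'X and X'y.

    Generic kernel-style recursion (an assumption, not the paper's exact
    algorithm): the data matrix itself is never touched here.
    """
    k = XtX.shape[0]
    P = np.zeros((k, n_components))   # X loadings
    R = np.zeros((k, n_components))   # weights expressed in original variables
    q = np.zeros(n_components)        # y loadings
    xy = Xty.astype(float).copy()     # deflated X'y
    for a in range(n_components):
        w = xy / np.linalg.norm(xy)
        r = w.copy()
        for j in range(a):
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r              # squared norm of the score vector
        p = (XtX @ r) / tt
        q[a] = (r @ xy) / tt
        xy = xy - p * q[a] * tt       # deflate X'y
        P[:, a], R[:, a] = p, r
    return R @ q

# Synthetic stand-in for spectra: 30 samples, 12 variables (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))
y = X[:, [1, 4, 7]] @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=30)

# Center once, then compute the full covariance-type matrices once.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
XtX_full = Xc.T @ Xc
Xty_full = Xc.T @ yc

def subset_coefficients(idx, n_components):
    """Fit a variable subset by indexing the precomputed matrices."""
    idx = np.asarray(idx)
    return kernel_pls_from_cov(XtX_full[np.ix_(idx, idx)], Xty_full[idx], n_components)

idx = [1, 4, 7, 9]                    # one candidate variable subset
b = subset_coefficients(idx, n_components=4)
```

Because each candidate subset only slices `XtX_full` and `Xty_full`, repeated subset draws from the same variable space reuse the one-off matrix products instead of recomputing them from the raw spectra.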
