Computer and Modernization (计算机与现代化) ›› 2023, Vol. 0 ›› Issue (11): 36-43. doi: 10.3969/j.issn.1006-2475.2023.11.006

• Algorithm Design and Analysis •

Prediction of Diabetes Mellitus Using LightGBM Classifier with RF-RFECV

  1. (Basic Medical Science Academy, Air Force Military Medical University, Xi'an 710032, China)
  • Online: 2023-11-29 Published: 2023-11-29
  • About the authors: 刘静乐 (b. 1987), female, from Luoyang, Henan; teaching assistant, M.S.; research interest: medical data analysis; E-mail: 979631849@qq.com. 罗翔 (b. 1983), male, from Xi'an, Shaanxi; lecturer, M.S.; research interest: computer applications; E-mail: asn365@126.com. 宫成荣 (b. 1993), male, from Xi'an, Shaanxi; teaching assistant, M.S.; research interest: medical image processing; E-mail: gongchengrong@126.com. Corresponding author: 张国鹏 (b. 1975), male, from Xianyang, Shaanxi; associate professor, Ph.D.; research interest: computer applications; E-mail: zhanggp@fmmu.edu.cn.

Abstract: To identify people at high risk of diabetes in China as early as possible and to support targeted interventions, this study uses the China Health and Retirement Longitudinal Study (CHARLS) dataset, which represents the Chinese population, and proposes a hybrid algorithm, RF-RFECV-LightGBM, that combines random-forest-based recursive feature elimination with cross-validation (RF-RFECV) and LightGBM. The hybrid is compared experimentally with five other algorithms. RF-RFECV-LightGBM achieves the best overall performance, with accuracy, precision, recall, F1 score, and AUC of 0.9772, 0.9952, 0.8178, 0.8978, and 0.9357, respectively. Its prediction time is 0.0428 s, which is 0.0549 s (56.19%) shorter than that of LightGBM before feature selection, indicating that RF-RFECV feature selection is effective. Finally, the same prediction pipeline was applied to the Pima Indians dataset and reached an accuracy of 0.9415, further supporting the performance of the proposed algorithm and its potential to assist clinical diagnosis and treatment of diabetes.
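The reported figures can be cross-checked with simple arithmetic. The sketch below recomputes the F1 score from the stated precision and recall, and derives the pre-selection prediction time implied by the stated saving. The variable names are illustrative and not taken from the paper; only the numeric inputs come from the abstract.

```python
# Values copied from the abstract.
precision = 0.9952
recall = 0.8178
reported_f1 = 0.8978

time_after = 0.0428   # s, prediction time after RF-RFECV feature selection
time_saved = 0.0549   # s, reported reduction in prediction time

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Prediction time before feature selection implied by the two reported values,
# and the saving expressed as a share of that earlier time.
time_before = time_after + time_saved
reduction_pct = 100 * time_saved / time_before

print(f"F1 recomputed: {f1:.4f} (reported {reported_f1})")
print(f"Implied time before selection: {time_before:.4f} s")
print(f"Relative reduction: {reduction_pct:.2f}%")
```

The recomputed F1 rounds to the reported 0.8978, and the 56.19% figure is a reduction relative to an implied pre-selection prediction time of about 0.0977 s.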

Key words: LightGBM, RF-RFECV, prediction of diabetes, CHARLS, Pima
