Computer and Modernization (计算机与现代化), 2025, Vol. 0, Issue (09): 119-126. doi: 10.3969/j.issn.1006-2475.2025.09.017

• Database and Data Mining •

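结合可解释分析的多模型堆叠乳腺癌住院费用预测 (Multi-Model Stacking Prediction of Breast Cancer Hospitalization Costs Combined with Interpretable Analysis)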

  


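  (1. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, Zhejiang, China; 2. Hangzhou Health Development Center, Hangzhou 310006, Zhejiang, China)
  • Publication date: 2025-09-24 Release date: 2025-09-24
  • About the authors: Zhu Haiyu (朱海玉, b. 1996), female, from Shangqiu, Henan, master's student, research interests: machine learning and medical data analysis, E-mail: hikzy_zhu@163.com; Sun Xiaoyan (孙晓燕, b. 1980), female, from Hangzhou, Zhejiang, lecturer, research interests: augmented reality and visualization for medical images, E-mail: sunxy@hznu.edu.cn; Yuan Zhenming (袁贞明, b. 1972), male, from Hangzhou, Zhejiang, professor, research interests: artificial intelligence and data mining, E-mail: zmyuan@hznu.edu.cn; corresponding author: Yang Lijing (杨丽静, b. 1982), female, from Hangzhou, Zhejiang, associate research fellow, E-mail: yanglijing3799566@126.com.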
  • Funding:
        Key Fund Project of the Hangzhou Health Science and Technology Program (ZD20230018)

Prediction of Breast Cancer Hospitalization Costs Based on Stacking Ensemble and Explainable Models


  (1. School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China; 2. Hangzhou Health Development Center, Hangzhou 310006, China)
  • Online: 2025-09-24 Published: 2025-09-24

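摘要 (Abstract):
Hospitalization cost is one of the factors affecting treatment choice and prognosis for breast cancer patients, so accurately predicting hospitalization costs and analyzing cost drivers at the individual level is essential for allocating resources effectively and optimizing medical services. Single models show weak generalization and poor interpretability on hospitalization cost prediction. To address this, the paper proposes an interpretable stacking method that integrates the feature extraction capabilities of several models to predict hospitalization costs for breast cancer patients accurately. The method uses a two-layer model fusion structure: the first layer consists of four base models whose parameters are tuned with Bayesian optimization and five-fold cross-validation to improve each model's predictive performance, and a second-layer linear regression outputs the final cost. The paper also applies SHAP and LIME to analyze the predictions from both global and individual perspectives. Experiments on a dataset of breast cancer inpatients from one hospital show that the stacking method achieves an R² of 0.877 on the cost prediction task, outperforming other related studies. The interpretability analysis shows that length of stay and treatment method are the main factors influencing hospitalization costs, but the influencing factors differ from patient to patient, which offers valuable insight into the key drivers of hospitalization costs.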


关键词 (Keywords): hospitalization cost prediction, interpretable stacking method, SHAP, LIME, machine learning
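
The abstract reports its headline result as an R² value without giving the formula. The sketch below assumes the usual definition of the coefficient of determination, 1 - SS_res / SS_tot; the function name `r_squared` and the numbers are illustrative only and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up costs (thousand yuan), purely for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(actual, predicted), 3))  # prints 0.948
```

Under this definition, a value of 1 means perfect predictions and 0 means the model does no better than predicting the mean cost for every patient.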

Abstract:
Hospitalization costs are one of the factors that affect treatment choices and prognosis for breast cancer patients. Accurately predicting these costs and analyzing the factors that drive them for individual patients are crucial for efficient resource allocation and for optimizing medical services. To address the weak generalization and poor interpretability of single models in hospitalization cost prediction, this paper proposes an interpretable stacking method that integrates the feature extraction capabilities of multiple models to predict hospitalization costs for breast cancer patients accurately. The method employs a two-layer model fusion structure: the first layer consists of four base models, whose parameters are tuned with Bayesian optimization and five-fold cross-validation to enhance each model's predictive performance, and the second-layer linear regression model generates the final cost prediction. The paper also uses SHAP and LIME to analyze the predictions from global and individual perspectives. Experimental results on a five-year dataset of breast cancer inpatients from one hospital show that the stacking method achieves an R² of 0.877 on the cost prediction task, outperforming other related studies. The interpretability analysis indicates that length of stay and treatment method are the primary factors influencing overall costs, but the influencing factors vary among patients, which provides valuable insight into the key factors affecting hospitalization costs.

Key words: hospitalization cost prediction, interpretable stacking method, SHAP, LIME, machine learning
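
The two-layer structure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the four base models, so two toy learners (`fit_line`, `fit_knn`) stand in for them; the Bayesian hyperparameter search is omitted; folds are assigned by index rather than shuffled; and the data are made up. All function names here are hypothetical. The one idea it tries to show is that the second-layer linear regression is trained on out-of-fold predictions from five-fold cross-validation, so it never sees a base prediction made on a sample the base model was fitted on.

```python
from statistics import mean


def fit_line(xs, ys):
    """Toy base model: least-squares line on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=2):
    """Toy base model: average target of the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def solve(a, b):
    """Solve a small linear system a w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_meta(features, ys):
    """Second layer: ordinary least squares with an intercept, via normal equations."""
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    w = solve(ata, aty)
    return lambda f: w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))


def out_of_fold(fitters, xs, ys, n_folds=5):
    """First layer: each sample is predicted by models that were not trained on it."""
    oof = [[0.0] * len(fitters) for _ in xs]
    for fold in range(n_folds):
        held = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(tx, ty)
            for i in held:
                oof[i][j] = model(xs[i])
    return oof


def fit_stack(fitters, xs, ys, n_folds=5):
    """Train the meta-model on out-of-fold features, then refit base models on all data."""
    meta = fit_linear_meta(out_of_fold(fitters, xs, ys, n_folds), ys)
    bases = [fit(xs, ys) for fit in fitters]
    return lambda x: meta([b(x) for b in bases])


# Made-up data: length of stay (days) -> cost (thousand yuan); not from the paper.
days = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15]
cost = [6.1, 7.9, 9.2, 12.5, 13.1, 15.8, 16.4, 20.3, 21.0, 26.7, 29.5, 33.2]
stack = fit_stack([fit_line, fit_knn], days, cost)
print(round(stack(11), 2))
```

In practice one would more likely reach for an existing library implementation of stacking than hand-roll the folds; the pure-Python version is used here only so that the out-of-fold step is visible.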

CLC number: