Breast Cancer Prediction and Feature Analysis Model Based on CatBoost and SHAP
(1. School of Mathematics and Data Science, Changji University, Changji 831100, China; 2. College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China)