Computer and Modernization (计算机与现代化) ›› 2022, Vol. 0 ›› Issue (04): 12-16.

• Artificial Intelligence •

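Poverty-returning Prediction Based on Ensemble Learning and Unbalanced Data

  1. (School of Mathematics and Computational Science, Wuyi University, Jiangmen 529020, Guangdong, China)

  • Online: 2022-05-07 Published: 2022-05-07
  • About the authors: 龚云翔 (b. 2000), male (Hui ethnicity), from Luohe, Henan, undergraduate student, research interest: data mining, E-mail: gyx18239583806@163.com; corresponding author: 袁仕芳 (b. 1972), male, from Dongkou, Hunan, professor, Ph.D., research interests: data processing and numerical analysis, E-mail: yuanshifang305@163.com; 刘付谦 (b. 2000), male, from Dongguan, Guangdong, undergraduate student, research interest: data mining, E-mail: 1399766572@qq.com.
  • Funding:
    Characteristic Innovation Project of Guangdong Provincial Universities (2019KQNCX156); Wuyi University Hong Kong-Macao Joint Research and Development Fund (2019WGALH20); 2020 National Undergraduate Innovation and Entrepreneurship Training Program (202011349016); 2018 Wuyi University Teaching Quality Engineering and Teaching Reform Project (JX2018024)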

Abstract: Although China has achieved decisive results in poverty alleviation, some households that have been lifted out of poverty remain at risk of returning to it. Working with an unbalanced data set, this paper applies SMOTE to oversample the returned-to-poverty class; after resampling, the ratio of returned-to-poverty samples to non-returned samples is 3:1. A poverty-returning prediction model is then built with Stacking ensemble learning: grid search is used to tune the hyperparameters of each model, and 10-fold cross-validation is used to improve generalization. Four different fusion models are used to predict whether a household lifted out of poverty will return to poverty. The experiments show that the fusion models classify better than the individual classifiers; the best fusion model achieves an accuracy (Acc) of 0.962 and an F1-score of 0.946.

Key words: prediction of poverty-returning, oversampling (SMOTE), ensemble learning, fusion model
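
The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made only for illustration, and the sketch uses the Python standard library alone.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built by interpolating between
    minority samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours (Euclidean) among the other samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class. In practice, a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written version.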