Computer and Modernization

• Algorithm Design and Analysis

Loan Risk Prediction Method Based on SMOTE and XGBoost

  

  1. (School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an 710021, China)
  • Received: 2019-06-11 Online: 2020-03-03 Published: 2020-03-03
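  • About the authors: LIU Bin (1972-), male, born in Meng County, Henan; associate professor and master's supervisor; research interests: data mining, big data analysis; E-mail: Liubin@sust.edu.cn. CHEN Kai (1995-), male, born in Xinzhou, Shanxi; master's student; research interests: big data analysis, computer applications; E-mail: 188367480@qq.com.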
  • Funding:
    Project supported by the National Natural Science Foundation of China (61871260)


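Abstract: In recent years, the rapid growth of online lending has driven a steady increase in both the total volume of loans and the probability of default. In-depth research on loan risk is therefore of real practical value for online lenders seeking to guard against Internet-finance risk. To address three problems of loan data—wait, restated: To address three problems of loan data, namely an imbalanced class distribution, heavy noise, and high dimensionality, this paper proposes a loan risk prediction method based on SMOTE and XGBoost. Feature engineering is used to reduce the dimensionality of the data and remove noise. To handle the class imbalance, the SMOTE algorithm oversamples the minority class so that the numbers of positive and negative samples are balanced. On this basis, an XGBoost classification model is built and compared with several traditional classification algorithms, and the effectiveness of its predictions is then compared across different ratios of positive to negative samples. Experiments show that XGBoost outperforms the traditional classification models for loan risk prediction, and that raising the proportion of minority-class samples with SMOTE improves the effectiveness of the predictions.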

Key words: loan risk, feature engineering, SMOTE algorithm, XGBoost
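The SMOTE step described in the abstract can be illustrated with a minimal, dependency-free sketch. SMOTE creates a synthetic minority-class (here, default) sample by taking a minority sample x, choosing one of its k nearest minority-class neighbours x_nn, and returning x + δ·(x_nn − x) with δ drawn uniformly from [0, 1), so every new point lies on the segment between two real minority samples. The sketch assumes numeric, already-scaled features, Euclidean distance, defaults labelled 1, and k = 5 by default; it is a simplified variant that picks base samples at random. The helpers `smote` and `balance` (including the `ratio` argument, used here to mimic the abstract's comparison of different positive-to-negative ratios) are hypothetical and are not the authors' implementation; in practice a library class such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples with the SMOTE interpolation rule.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (Euclidean) of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, other), j) for j, other in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(X, y, minority_label=1, ratio=1.0, k=5, seed=None):
    """Hypothetical helper: oversample until minority/majority is about `ratio`."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = len(y) - len(minority)
    n_new = max(0, round(ratio * n_major) - len(minority))
    new = smote(minority, n_new, k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)
```

For example, `balance(X_train, y_train, ratio=0.5)` would add synthetic defaults until they number about half the non-defaults, and repeating this for several ratios reproduces the kind of comparison the abstract describes. Oversampling should be applied to the training split only, so that synthetic points derived from evaluation samples do not inflate the measured results.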


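The abstract names XGBoost as the classifier but does not say how it grows trees. As background, XGBoost fits each new tree to a second-order approximation of the loss plus a regularisation term: for a leaf whose samples have summed gradient G and summed Hessian H, the optimal leaf value is w* = −G/(H + λ), and splitting a node into left and right children changes the objective by Gain = ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ, where λ is the L2 penalty on leaf values and γ the cost of adding a leaf. The sketch below evaluates these formulas for the logistic loss (the loss behind the xgboost library's `binary:logistic` objective), which is assumed here to be the appropriate choice for a binary default/non-default label. The function names are hypothetical, and this is an illustration of the split criterion, not the library or the authors' code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grad_hess_logistic(y, margin):
    """Gradient and Hessian of the logistic loss w.r.t. the raw margin."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(left, right, lam=1.0, gamma=0.0):
    """Objective reduction from splitting a node into `left` and `right`.

    left/right: lists of (gradient, hessian) pairs for the samples in each child.
    """
    GL = sum(g for g, _ in left)
    HL = sum(h for _, h in left)
    GR = sum(g for g, _ in right)
    HR = sum(h for _, h in right)

    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Four samples, all starting from a raw margin of 0 (predicted probability 0.5).
labels = [0, 0, 1, 1]
gh = [grad_hess_logistic(y, 0.0) for y in labels]
clean = split_gain(gh[:2], gh[2:])                 # separates the two classes
mixed = split_gain([gh[0], gh[2]], [gh[1], gh[3]])  # leaves both children mixed
```

Here the split that separates non-defaults from defaults scores a positive gain (2/3 with λ = 1), while the split that leaves both children mixed scores zero, so the tree builder would prefer the former; raising γ raises the gain a split must exceed before it is kept.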
