Computer and Modernization

• Algorithm Design and Analysis •

Improved Random Balanced Sampling Bagging Algorithm for Network Loan Research

  

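  1. (School of Mathematics, South China University of Technology, Guangzhou 510641, China)
  • Received: 2018-12-29 Online: 2019-04-26 Published: 2019-04-30
  • About the authors: GUO Bingnan (1992-), female, from Sanmenxia, Henan; master's student; research interests: data mining, machine learning; E-mail: bingnan_nn@163.com. WU Guangchao (1972-), male, from Shantou, Guangdong; associate professor and master's supervisor; research interests: data mining, machine learning.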


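Abstract: Network loan user data in Internet finance are class-imbalanced, which seriously degrades the performance of traditional classifiers. When the random balanced sampling algorithm resamples the original data set, it treats all samples equally. In this paper, the characteristics of each sample point are taken into account during balanced sampling: the samples are divided into three types (safe, borderline, and noisy), and a corresponding sampling method is applied to each type to obtain a new, balanced data set. Bagging ensemble learning is then performed on this data set to improve the generalization performance of the algorithm. The results show that the proposed Improved Random Balanced Sampling (IRBS) Bagging algorithm performs well in classifying network loan users.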

Key words: class imbalance, random balanced sampling, Bagging ensemble


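The abstract describes IRBS Bagging only at a high level. The following is a minimal, hypothetical Python sketch of the idea, built on several assumptions that the abstract does not state: each sample is tagged safe, borderline, or noisy by counting differently labelled points among its k = 5 nearest neighbours (a rule adapted from Borderline-SMOTE); each ensemble member draws a random class split of unchanged total size in the style of the Random Balance scheme; undersampling discards noisy samples first; oversampling interpolates between non-noisy samples of the same class (a simplification of SMOTE); and every member is a 1-nearest-neighbour classifier trained on a per-class bootstrap of its own rebalanced set. The function names (`sample_types`, `resample_class`, `random_balance`, `fit_irbs_bagging`, `predict`) are illustrative only, and the paper's actual sampling rules and base classifier may differ.

```python
import random
from collections import Counter

# Hypothetical sketch: the kNN safe/borderline/noisy rule (k=5), the
# type-aware resampling choices and the 1-NN base learner are assumptions,
# not the implementation described in the paper.


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sample_types(X, y, k=5):
    """Tag each sample by the number m of its k nearest neighbours with a
    different label: m == k -> 'noisy', m >= k/2 -> 'borderline', else 'safe'."""
    types = []
    for i, xi in enumerate(X):
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: dist2(xi, X[j]))[:k]
        m = sum(y[j] != y[i] for j in nbrs)
        types.append("noisy" if m == k else
                     "borderline" if m >= k / 2 else "safe")
    return types


def resample_class(idx, target, X, types, rng):
    """Shrink or grow one class (given by its row indices) to `target` rows."""
    if target <= len(idx):
        # Undersampling: discard noisy samples first, then random ones.
        order = sorted(idx, key=lambda i: (types[i] != "noisy", rng.random()))
        return [X[i] for i in order[len(idx) - target:]]
    # Oversampling: SMOTE-like interpolation between two random non-noisy
    # samples of the class (real SMOTE pairs a sample with a near neighbour).
    seeds = [i for i in idx if types[i] != "noisy"] or idx
    rows = [X[i] for i in idx]
    while len(rows) < target:
        a, b = rng.choice(seeds), rng.choice(seeds)
        gap = rng.random()
        rows.append([p + gap * (q - p) for p, q in zip(X[a], X[b])])
    return rows


def random_balance(X, y, types, rng):
    """Random Balance step for exactly two classes: draw a random split of
    the same total size, then resample each class to its new size."""
    n = len(y)
    c0, c1 = sorted(set(y))
    n0 = rng.randint(2, n - 2)
    return {c: resample_class([i for i in range(n) if y[i] == c],
                              target, X, types, rng)
            for c, target in ((c0, n0), (c1, n - n0))}


def fit_irbs_bagging(X, y, n_estimators=11, seed=0):
    """Each member gets its own random balance plus a per-class bootstrap."""
    rng = random.Random(seed)
    types = sample_types(X, y)
    members = []
    for _ in range(n_estimators):
        Xb, yb = [], []
        for c, rows in random_balance(X, y, types, rng).items():
            Xb += [rng.choice(rows) for _ in rows]  # bootstrap within class
            yb += [c] * len(rows)
        members.append((Xb, yb))
    return members


def predict(members, x):
    """Majority vote over 1-nearest-neighbour members."""
    votes = Counter()
    for Xb, yb in members:
        j = min(range(len(Xb)), key=lambda i: dist2(x, Xb[i]))
        votes[yb[j]] += 1
    return votes.most_common(1)[0][0]


# Toy data: 10 majority samples near (0, 0), 3 minority samples near (5, 5).
majority = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [0.4, 0.4], [0.7, 0.1],
            [0.2, 0.9], [0.8, 0.8], [0.3, 0.1], [0.6, 0.5], [0.9, 0.3]]
minority = [[5.0, 5.0], [5.4, 4.8], [4.7, 5.3]]
X, y = majority + minority, [0] * 10 + [1] * 3
model = fit_irbs_bagging(X, y)
print(predict(model, [0.3, 0.3]), predict(model, [5.1, 5.0]))  # 0 1
```

On this toy data the three minority samples are tagged borderline (three of their five nearest neighbours are majority samples) and all majority samples are safe. Because synthetic points are interpolated only within their own class, every member votes the same way here. Drawing a fresh class ratio for each member, rather than always resampling to 1:1, adds diversity to the ensemble on top of the bootstrap.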