Computer and Modernization, 2020, Vol. 0, Issue (05): 34-. DOI: 10.3969/j.issn.1006-2475.2020.05.006


An Imbalanced Data Classification of Hybrid Sampling Based on Clustering

  

  1. (School of Mathematics, South China University of Technology, Guangzhou 510641, China)
  Received: 2019-09-17; Online: 2020-05-20; Published: 2020-05-21

Abstract: Imbalanced classification problems arise widely in real-world applications. Most resampling algorithms focus on the balance between classes and pay little attention to imbalance in the data distribution within each class. To address this, a hybrid sampling algorithm based on clustering is proposed. First, the original data set is clustered; then the imbalance ratio of the samples in each cluster is calculated, and each cluster is resampled according to its imbalance ratio. Finally, the balanced data set is used to train a GBDT (gradient boosting decision tree) classifier. Experiments show that the algorithm achieves a higher F1-score and AUC, and better overall classification results, than several traditional algorithms.
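The abstract describes a three-stage pipeline: cluster the data, compute an imbalance ratio per cluster, and resample each cluster according to that ratio. The sketch below illustrates the general idea under several assumptions and is not the authors' algorithm. It assumes binary labels with 1 as the minority class, uses plain k-means as the clustering step, and defines a cluster's imbalance ratio as majority count divided by minority count. The resampling rule is an illustrative choice: clusters whose ratio exceeds a hypothetical `ir_threshold` have their majority randomly undersampled and their minority oversampled until both reach the midpoint count. The oversampling uses a simplified SMOTE-like interpolation between random minority pairs rather than nearest neighbours. The names `kmeans`, `imbalance_ratio`, `hybrid_resample` and `ir_threshold` are hypothetical.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def imbalance_ratio(n_maj, n_min):
    """Assumed IR definition: majority count / minority count (inf if no minority)."""
    return math.inf if n_min == 0 else n_maj / n_min


def hybrid_resample(X, y, k=3, ir_threshold=1.5, seed=0):
    """Hypothetical cluster-wise hybrid sampling; y uses 0 = majority, 1 = minority."""
    rng = random.Random(seed)
    clusters = defaultdict(lambda: ([], []))  # cluster id -> (majority pts, minority pts)
    for x, label, c in zip(X, y, kmeans(X, k, seed=seed)):
        clusters[c][label].append(x)

    X_out, y_out = [], []
    for maj, mino in clusters.values():
        if mino and imbalance_ratio(len(maj), len(mino)) > ir_threshold:
            # Assumed rule: meet in the middle, both classes end with `target` samples.
            target = (len(maj) + len(mino)) // 2
            maj = rng.sample(maj, target)  # random undersampling of the majority
            synthetic = []
            while len(mino) + len(synthetic) < target:
                # Simplified SMOTE-like step: interpolate between two random
                # minority points of the same cluster (real SMOTE uses k-NN).
                a, b = rng.choice(mino), rng.choice(mino)
                gap = rng.random()
                synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
            mino = mino + synthetic
        # Clusters at or below the threshold, or with only one class, pass through unchanged.
        X_out += maj + mino
        y_out += [0] * len(maj) + [1] * len(mino)
    return X_out, y_out


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 90 majority points near (0, 0), 10 minority points near (3, 3).
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
    X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]
    y = [0] * 90 + [1] * 10
    Xb, yb = hybrid_resample(X, y, k=3)
    print(f"{len(yb)} samples after resampling, {sum(yb)} of them minority")
```

The balanced `Xb`, `yb` would then go to a GBDT learner. With scikit-learn, for instance, that could be `GradientBoostingClassifier().fit(Xb, yb)`, evaluated with `f1_score` and `roc_auc_score` from `sklearn.metrics`, the two metrics the abstract reports. Resampling per cluster rather than over the whole data set is what lets this kind of scheme respond to within-class imbalance: a sparse minority sub-region receives synthetic samples in proportion to how outnumbered it is locally.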

Key words: imbalanced data, clustering, hybrid sampling, GBDT
