Computer and Modernization


Research on Classification of Improved Smote Algorithm on Imbalanced Datasets

  

  1. School of Internet of Things, Jiangnan University, Wuxi 214122, China
  2. School of Business, Jiangnan University, Wuxi 214122, China
  3. Food Safety Risk Management Institute, Jiangnan University, Wuxi 214122, China
  • Received: 2017-09-13   Online: 2018-04-03   Published: 2018-04-03

Abstract: On imbalanced datasets, oversampling algorithms such as Smote (Synthetic Minority Over-sampling Technique), R-Smote, and SD-ISmote may blur the boundary between the majority and minority classes and may synthesize new samples from noisy data. The ImprovedSmote algorithm proposed in this paper generates new samples from the cluster centers of the minority class and the minority samples associated with each center. Smote, R-Smote, SD-ISmote, and ImprovedSmote are each combined with a C4.5 decision tree and a neural network classifier and evaluated on the experimental datasets. The results show that ImprovedSmote achieves better classification results than the other three algorithms and effectively improves classifier performance.
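
The abstract does not spell out how the cluster centers are obtained or how new samples are placed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes plain k-means clustering of the minority class and linear interpolation with a uniform random gap between a minority sample and its own cluster center; the function names `nearest`, `kmeans`, and `oversample_toward_centers` and the parameters `k`, `n_iter`, and `seed` are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return dists.index(min(dists))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centers, labels).

    Assumption: the paper may use a different clustering method.
    """
    rng = random.Random(seed)
    # Start from k minority samples picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def oversample_toward_centers(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and the center of that sample's cluster.

    Assumption: uniform random gap in [0, 1), as in classic Smote interpolation.
    """
    rng = random.Random(seed)
    centers, labels = kmeans(minority, k, seed=seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        center = centers[labels[i]]
        gap = rng.random()
        synthetic.append([x + gap * (c - x) for x, c in zip(minority[i], center)])
    return synthetic
```

Because each center is the mean of its own members, every synthetic point in this sketch is pulled toward the interior of a minority cluster and never toward a majority sample, which is one plausible way to avoid the boundary blurring described above.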

Key words: imbalanced dataset, Smote, R-Smote, SD-ISmote, ImprovedSmote, cluster center

CLC Number: