Computer and Modernization ›› 2025, Vol. 0 ›› Issue (02): 28-32. doi: 10.3969/j.issn.1006-2475.2025.02.004

Resampling of Imbalanced Data for Optimizing Downstream Tasks 

  

  1. (College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China)
  • Online: 2025-02-28 Published: 2025-02-28

Abstract: Data resampling is a key method for correcting imbalanced datasets. Traditional methods construct balanced samples by minimizing geometric error in the sample space, but they perform poorly in high-dimensional spaces with complex distribution patterns. Moreover, because they rely on statistical features alone, they are not tailored to the downstream task. To address these issues, this paper presents Sampling for Optimizing Downstream Neural Network (SOD-NN), a neural network for data sampling. The approach uses the nonlinear processing capability of neural networks to identify the distribution characteristics of high-dimensional samples. It is combined with the downstream task to form a two-stage network that is optimized as a whole, so that the model better meets the requirements of the downstream task. Specifically, the dataset is first partitioned spatially during sampling. Residual processing is then applied to the sample subsets to prevent data degradation. Finally, a self-attention mechanism is used to construct global features, ensuring consistency with the original sample distribution. Experimental results indicate that the proposed model significantly improves the recognition of minority-class samples in downstream classification tasks and enhances the robustness of these tasks.
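
The three building blocks named in the abstract (spatial partitioning, residual processing, self-attention) can be illustrated with a minimal pure-Python sketch. This is not the authors' SOD-NN implementation: the abstract gives no architectural details, so the equal-width partition along the first coordinate, the identity query/key/value projections, the `gate` parameter, and all function names below are hypothetical choices made only for illustration. A real model would use learned weights and be trained jointly with the downstream network.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def partition(points, n_bins):
    """Assumed spatial division: equal-width bins along the first coordinate."""
    lo = min(p[0] for p in points)
    hi = max(p[0] for p in points)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values match
    bins = [[] for _ in range(n_bins)]
    for p in points:
        idx = min(int((p[0] - lo) / width), n_bins - 1)
        bins[idx].append(p)
    return [b for b in bins if b]  # drop empty subsets


def self_attention(points):
    """Scaled dot-product self-attention with identity projections
    (no learned weights; an assumption of this sketch)."""
    d = len(points[0])
    out = []
    for q in points:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in points]
        weights = softmax(scores)  # non-negative, sums to 1
        out.append([sum(w * v[j] for w, v in zip(weights, points)) for j in range(d)])
    return out


def residual_block(points, gate=1.0):
    """Residual connection: output = input + gate * attention(input).
    `gate` stands in for a learned scale; with gate=0 the input passes
    through unchanged, which is how a skip path guards against degradation."""
    attended = self_attention(points)
    return [[x + gate * a for x, a in zip(p, att)] for p, att in zip(points, attended)]


# Toy minority-class points (made-up values for illustration).
minority = [[0.0, 1.0], [1.0, 0.0], [4.0, 4.0], [5.0, 3.0]]
subsets = partition(minority, n_bins=2)
features = [residual_block(s) for s in subsets]
```

Because each attention output is a convex combination of the input points, the attention branch stays inside the range of the original samples; the skip path then adds the untouched input back, so no sample's information is lost even if the attention branch contributes little.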

Key words: data resampling, imbalanced data, adaptive sampling network, self-attention mechanism
