Computer and Modernization ›› 2021, Vol. 0 ›› Issue (05): 93-97.

Application System Identification Method Oriented to Unbalanced Datasets

  

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China;
  2. Information Technology Research Office, Shengli Oil Field Geophysical Research Institute, Dongying 257022, China
  • Online: 2021-06-03 Published: 2021-06-03

Abstract: Traditional flow-based analysis methods cannot effectively identify application systems in an oilfield local area network (LAN) environment. To address this problem, this paper designs WEBCLA, an application system identification framework for imbalanced data sets, which combines an improved SMOTE algorithm based on Gini gain (GSMOTE) with the XGBoost classification algorithm to identify web-based application systems. Specifically, the proposed GSMOTE algorithm oversamples the minority classes to alleviate the class imbalance in the identification samples, and the XGBoost classifier is then used to identify the application system. Experiments on real data sets show that the proposed method significantly improves recall over traditional methods: recall is about 112.8% higher than that of an ordinary ensemble method and about 10.8% higher than that of the method without sampling. The method can effectively solve the application system identification problem in the oilfield LAN.
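
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of plain SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. This is not the paper's GSMOTE: the Gini-gain-based refinement is not described in the abstract and is not reproduced here. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made only for illustration; in the framework described above, the balanced data would then be passed to an XGBoost classifier.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch: interpolate between minority samples and their neighbours.

    Hypothetical helper for illustration; the Gini-gain step of GSMOTE is NOT included.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding the sample itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (assumed): 20 majority samples and 4 minority samples in two dimensions.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(100.0, 100.0), (101.0, 100.5), (100.5, 101.0), (101.5, 101.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating samples exactly.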

Key words: application recognition, unbalanced data, Gini gain, oversampling, classification problem