Computer and Modernization ›› 2020, Vol. 0 ›› Issue (06): 73-.


High-dimensional Numerical Anomaly Data Detection Based on Multi-level Sequence Integration

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
  2. Information and Communication Company of State Grid Shanghai Electric Power Company, Shanghai 200000, China
  • Received: 2019-10-22   Online: 2020-06-24   Published: 2020-06-28

Abstract: With the rapid development of big data, data analysis and knowledge discovery have become research hotspots, and anomaly detection is key to improving data quality. Anomaly detection methods based on sequential ensemble learning can deviate substantially on high-dimensional numerical data because of noisy data and the large number of dimensions. This paper proposes a multi-layer sequential ensemble learning model, based on the elastic net, for detecting anomalies in high-dimensional numerical data. Each layer contains three modules: an anomaly candidate set selection module, an elastic net dimension reduction module, and an anomaly scoring module. First, the candidate set selection module selects data points that are likely to be anomalous according to their anomaly scores. Then, the elastic net reduces the dimensionality of the data using the candidate set and its anomaly scores. Finally, the selected features, which are those related to the anomaly score, are used to score the data again. The threshold of the candidate set selection module is set to a different value in each layer, and the layers are executed iteratively until the mean square error of the current elastic net exceeds that of the previous layer or the current detection precision falls below the initial detection precision. In the experiments, high-dimensional anomaly data sets provided by ODDS are used to evaluate the proposed model in terms of detection accuracy, number of extracted features, convergence speed, and other measures. The results show that the proposed method not only improves the detection accuracy for high-dimensional numerical anomaly data but also effectively reduces the effect of noise on the detection results.
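
The layer loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scorer (mean absolute z-score), the threshold schedule, the regularization settings, and all function names are hypothetical, and the elastic net step is a plain coordinate descent written with the standard library only. The precision-based stopping rule is omitted because it needs ground-truth labels; only the mean-square-error rule is shown.

```python
import random
import statistics

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = []
    for col in zip(*rows):
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0
        cols.append([(v - mu) / sd for v in col])
    return [list(r) for r in zip(*cols)]

def anomaly_scores(z, features):
    """Assumed placeholder scorer: mean absolute z-score over `features`."""
    return [sum(abs(row[j]) for j in features) / len(features) for row in z]

def soft_threshold(x, t):
    return x - t if x > t else x + t if x < -t else 0.0

def elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Coordinate descent on centered data for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*|w|_1 + 0.5*alpha*(1-l1_ratio)*|w|^2.
    Returns (weights, mean squared error of the fit)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]          # residual for w = 0
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_w
    return w, sum(r * r for r in resid) / n

def multilayer_detect(rows, thresholds=(0.3, 0.2, 0.1), alpha=0.05):
    """One layer = candidate selection -> elastic net -> rescoring."""
    z = zscore_columns(rows)
    features = list(range(len(z[0])))
    scores = anomaly_scores(z, features)
    prev_mse = float("inf")
    for frac in thresholds:                  # a different threshold per layer
        k = max(2, int(frac * len(z)))
        cand = sorted(range(len(z)), key=lambda i: scores[i], reverse=True)[:k]
        X = [[z[i][j] for j in features] for i in cand]
        y = [scores[i] for i in cand]
        w, mse = elastic_net(X, y, alpha=alpha)
        kept = [f for f, wj in zip(features, w) if wj != 0.0]
        if not kept or mse > prev_mse:       # stop when the fit gets worse
            break
        features, prev_mse = kept, mse
        scores = anomaly_scores(z, features) # rescore on selected features
    return scores, features

# Synthetic demo: 20 dimensions, only the first 5 separate the outliers.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(20)] for _ in range(200)]
rows += [[random.gauss(8, 1) if j < 5 else random.gauss(0, 1) for j in range(20)]
         for _ in range(10)]
scores, features = multilayer_detect(rows)
print("selected features:", features)
```

In this sketch the last 10 rows are the injected outliers, so comparing their scores with those of the first 200 rows gives a quick sanity check of the loop. A real experiment would replace the placeholder scorer with the base detector under study and use a labeled data set to evaluate the precision-based stopping rule.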

Key words: data mining, anomaly detection, ensemble learning, elastic net, high-dimensional data
