Computer and Modernization ›› 2021, Vol. 0 ›› Issue (02): 109-116.


Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing

  

  1. National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
  2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  Online: 2021-03-01    Published: 2021-03-01

Abstract: Most network traffic is normal, whereas abnormal traffic deviates from the normal range and is mainly caused by malicious network behaviors such as DDoS attacks and penetration attacks. These abnormal behaviors usually degrade network quality and can even paralyze the network. Network security situation prediction is therefore introduced, in which abnormality in the network is judged using knowledge of normal network traffic only. Anomaly detection is one method of predicting the security situation of a network: it determines whether abnormalities are present. Existing anomaly detection algorithms perform poorly because they cannot accurately extract low-dimensional features of network packets. It is therefore necessary to find an accurate low-dimensional feature representation of network packets that distinguishes normal packets from attack packets. To this end, this paper introduces the NLOF anomaly detection algorithm based on t-SNE dimensionality reduction. The algorithm uses t-SNE to automatically preprocess network packets into low-dimensional packet features, and then feeds these features to the NLOF algorithm for anomaly detection. Specifically, the NLOF algorithm proposed in this paper first uses the k-means algorithm to group network packets into K clusters and marks every cluster containing fewer than N network packets as an abnormal cluster. The network packets that do not belong to an abnormal cluster are then passed to the LOF algorithm for anomaly detection. Experimental results on the ISCX2012 dataset show that, at its optimal performance, the LOF algorithm with t-SNE dimensionality reduction reaches an accuracy of 98.46%, a precision of 98.38%, a detection rate of 98.54% and a FAR of 066%. 
This configuration exceeds the other state-of-the-art algorithms in accuracy, detection rate and F1 by 3.18, 0.02 and 0.01 percentage points, respectively. At its optimal performance, the NLOF algorithm with t-SNE dimensionality reduction reaches an accuracy of 98.53%, a precision of 98.86%, a detection rate of 98.86% and a FAR of 0.32%, exceeding the other state-of-the-art algorithms in accuracy, detection rate and F1 by 3.25, 0.34 and 0.41 percentage points, respectively. This is the first use of the t-SNE algorithm in anomaly detection to automatically extract low-dimensional network packet features. In addition, whereas the LOF algorithm can capture only abnormal points, the proposed NLOF algorithm captures abnormal points and abnormal clusters simultaneously.
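
The NLOF stage described in the abstract (k-means into K clusters, clusters with fewer than N members marked abnormal, LOF on the remaining points) can be illustrated with the minimal sketch below. It is not the authors' implementation. The function names, the farthest-point initialization of k-means, the use of exactly k nearest neighbours in LOF, the LOF threshold, and the toy data are all assumptions made for illustration. The t-SNE step is omitted: the input is assumed to be already low-dimensional features (in practice these might come from a t-SNE implementation such as scikit-learn's `TSNE`). The sketch also assumes no duplicate points and more than k points reaching the LOF step.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def lof_scores(points, k):
    """Local Outlier Factor, simplified to exactly k nearest neighbours per point."""
    n = len(points)
    neigh, kdist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))[:k]
        neigh.append(order)
        kdist.append(math.dist(points[i], points[order[-1]]))
    lrd = []  # local reachability density of each point
    for i in range(n):
        reach = [max(kdist[j], math.dist(points[i], points[j])) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]

def nlof(points, n_clusters, min_cluster_size, k_neighbors, lof_threshold):
    """Return one boolean per point, True where the point is flagged abnormal."""
    labels = kmeans(points, n_clusters)
    sizes = {c: labels.count(c) for c in set(labels)}
    # Step 1: every member of an undersized cluster is abnormal.
    abnormal = [sizes[lab] < min_cluster_size for lab in labels]
    # Step 2: LOF on the points that were not in an abnormal cluster.
    kept = [i for i, flag in enumerate(abnormal) if not flag]
    scores = lof_scores([points[i] for i in kept], k_neighbors)
    for i, score in zip(kept, scores):
        abnormal[i] = score > lof_threshold
    return abnormal

# Toy 2-D features: a dense grid (normal), one isolated point, one tiny cluster.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
lone_point = [(10.0, 10.0)]
small_cluster = [(50.0, 50.0), (50.0, 51.0), (51.0, 50.0)]
features = grid + lone_point + small_cluster
flags = nlof(features, n_clusters=2, min_cluster_size=5,
             k_neighbors=3, lof_threshold=1.5)
```

In this toy example the three-point cluster is caught by the cluster-size rule and the isolated point by LOF, which mirrors the abstract's distinction between abnormal clusters and abnormal points.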

Key words: anomaly detection, network security situation prediction, model training using only normal network traffic, low-dimensional network packet features, t-SNE NLOF algorithm