Computer and Modernization ›› 2021, Vol. 0 ›› Issue (02): 109-116.

• Information Security •

Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing

  (1. National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
   2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China)
  • Online: 2021-03-01 Published: 2021-03-01
  • About the authors: HAO Yiran (born 1993), female, from Baotou, Inner Mongolia, Ph.D. candidate; research interests: network security situation prediction, anomaly detection, deep learning; E-mail: haoyr@dsp.ac.cn. SHENG Yiqiang (born 1978), male, from Jinhua, Zhejiang, research professor, master's supervisor, Ph.D.; research interests: future networks, network security situation prediction; E-mail: shengyq@dsp.ac.cn. WANG Jinlin (born 1964), male, from Tianjin, research professor, doctoral supervisor, M.S.; research interests: future networks, network security situation prediction; E-mail: wangjl@dsp.ac.cn.
  • Funding:
    Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02020400)

Abstract: Most network traffic is normal, but anomalous traffic that deviates from the normal range occurs frequently, caused mainly by malicious behavior such as DDoS attacks and penetration attacks. Such behavior usually degrades network quality and can paralyze the network outright. Network security situation prediction is therefore introduced to identify anomalies when only normal network traffic is known. Anomaly detection is one such prediction method; it determines whether anomalies are present in the network. Existing anomaly detection algorithms perform poorly because they cannot accurately extract low-dimensional features of network packets, so an accurate low-dimensional feature representation is needed that separates normal packets from attack packets. To this end, this paper introduces an NLOF anomaly detection algorithm based on t-SNE dimensionality reduction. The algorithm uses t-SNE to automatically preprocess network packets into low-dimensional packet features, which are then fed to the NLOF algorithm for anomaly detection. The NLOF algorithm first uses k-means to group the packets into K clusters and marks every cluster containing fewer than N packets as an anomalous cluster; the packets outside the anomalous clusters are then passed to the LOF algorithm for anomaly detection. Experiments on the ISCX2012 dataset show that, at its best performance, the LOF algorithm with t-SNE dimensionality reduction reaches an accuracy of 98.46%, a precision of 98.38%, a detection rate of 98.54%, and a FAR of 0.66%, exceeding the existing state-of-the-art algorithm by 3.18, 0.02, and 0.01 percentage points in accuracy, detection rate, and F1, respectively. At its best performance, the NLOF algorithm with t-SNE dimensionality reduction reaches an accuracy of 98.53%, a precision of 98.86%, a detection rate of 98.86%, and a FAR of 0.32%, exceeding the existing state-of-the-art algorithm by 3.25, 0.34, and 0.41 percentage points in accuracy, detection rate, and F1, respectively. This is the first time in anomaly detection that t-SNE has been used to automatically extract low-dimensional network packet features. Moreover, whereas the LOF algorithm captures only anomalous points, the proposed NLOF algorithm captures both anomalous points and anomalous clusters.
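
The pipeline described in the abstract (t-SNE embedding, k-means clustering, small-cluster flagging, then LOF on the remaining packets) might be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the function names `nlof_detect` and `tsne_nlof`, all default parameter values, and the choice to embed every sample in a single t-SNE call are assumptions. The abstract does not say how packets are encoded as feature vectors or how the normal-traffic-only training is carried out, so the sketch simply runs unsupervised on a numeric feature matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.neighbors import LocalOutlierFactor


def nlof_detect(features, n_clusters=5, min_cluster_size=10, n_neighbors=20):
    """Return a boolean mask that is True where a sample is flagged as anomalous.

    n_clusters plays the role of K and min_cluster_size the role of N in the
    abstract; the default values here are illustrative assumptions.
    """
    features = np.asarray(features, dtype=float)

    # Step 1: group the samples into K clusters with k-means.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    sizes = np.bincount(labels, minlength=n_clusters)

    # Step 2: every member of a cluster with fewer than N samples is flagged
    # as belonging to an anomalous cluster.
    in_small_cluster = sizes[labels] < min_cluster_size
    anomalous = in_small_cluster.copy()

    # Step 3: run LOF only on the samples outside the anomalous clusters.
    rest = np.flatnonzero(~in_small_cluster)
    if rest.size > n_neighbors:
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        anomalous[rest] = lof.fit_predict(features[rest]) == -1  # -1 marks outliers
    return anomalous


def tsne_nlof(packet_features, n_components=2, perplexity=30.0, **nlof_kwargs):
    """Embed the samples with t-SNE, then apply the NLOF sketch to the embedding."""
    embedded = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        init="pca",
        random_state=0,
    ).fit_transform(np.asarray(packet_features, dtype=float))
    return nlof_detect(embedded, **nlof_kwargs)
```

Calling `tsne_nlof(X, n_clusters=8, min_cluster_size=20)` on a feature matrix `X` returns one boolean per row. Because scikit-learn's `TSNE` cannot project unseen samples, this sketch embeds all samples together; a deployment that must score new traffic would need a different arrangement.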

Key words: anomaly detection, network security situation prediction, training model using only normal network traffic, low-dimensional network packet features, t-SNE NLOF algorithm
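
The abstract reports accuracy, precision, detection rate, FAR, and F1 without defining them. The sketch below computes them from confusion-matrix counts under the conventional definitions, which are an assumption here: attacks are the positive class, detection rate is recall, and FAR is the fraction of normal samples flagged as attacks. The function name `detection_metrics` is hypothetical, and the counts in the usage note are illustrative, not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common intrusion-detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal samples flagged as attacks
    tn: normal samples passed as normal fn: attacks passed as normal
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)  # also called recall
    far = fp / (fp + tn)             # false alarms among normal samples
    f1 = 2 * precision * detection_rate / (precision + detection_rate)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "detection_rate": detection_rate,
        "far": far,
        "f1": f1,
    }
```

For example, `detection_metrics(tp=90, fp=10, tn=890, fn=10)` gives an accuracy of 0.98, with precision, detection rate, and F1 all equal to 0.9.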