Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing

Abstract

Abstract: Most network traffic is normal, but abnormal traffic that deviates from the normal range appears frequently. It is caused mainly by malicious network behavior such as DDoS attacks and penetration attacks, and it usually degrades network quality or even paralyzes the network. Network security situation prediction is therefore introduced to identify anomalies in the network when only normal network traffic is known. Anomaly detection is one such prediction method: it determines whether anomalies are present in the network. Existing anomaly detection algorithms perform poorly because they cannot accurately extract low-dimensional features of network packets, so an accurate low-dimensional feature representation is needed that can distinguish normal packets from attack packets. To this end, this paper introduces an NLOF anomaly detection algorithm based on t-SNE dimensionality reduction. The algorithm uses t-SNE to automatically preprocess network packets into low-dimensional packet features, which are then fed to the NLOF algorithm for anomaly detection. The NLOF algorithm first uses k-means to cluster the network packets into K clusters and marks every cluster containing fewer than N packets as an abnormal cluster; the packets in the remaining clusters are then passed to the LOF algorithm for anomaly detection. Experimental results on the ISCX2012 dataset show that, at its best performance, the LOF algorithm with t-SNE dimensionality reduction achieves an accuracy of 98.46%, a precision of 98.38%, a detection rate of 98.54% and a FAR of 0.66%. Its accuracy, detection rate and F1 exceed those of existing state-of-the-art algorithms by 3.18, 0.02 and 0.01 percentage points, respectively. At its best performance, the NLOF algorithm with t-SNE dimensionality reduction achieves an accuracy of 98.53%, a precision of 98.86%, a detection rate of 98.86% and a FAR of 0.32%. Its accuracy, detection rate and F1 exceed those of existing state-of-the-art algorithms by 3.25, 0.34 and 0.41 percentage points, respectively. This is the first time in anomaly detection that t-SNE is used to automatically extract low-dimensional network packet features. In addition, whereas the LOF algorithm can capture only abnormal points, the proposed NLOF algorithm captures both abnormal points and abnormal clusters.
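The NLOF stage described in the abstract (k-means into K clusters, flag clusters with fewer than N packets, run LOF on the rest) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. It assumes the packets have already been reduced to low-dimensional points (the t-SNE step is not shown), and it scores the remaining points against each other rather than against a model trained on normal traffic only. The function names, the farthest-first k-means initialisation, the LOF threshold and the toy data are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def lof_scores(points, k):
    """Local Outlier Factor of every point from its k nearest neighbours.

    Assumes len(points) > k and no group of more than k identical points.
    """
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        neighbours.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th nearest neighbour
    lrd = []
    for i in range(n):
        # Reachability distance to each neighbour, then local reachability density.
        reach = [max(k_dist[j], math.dist(points[i], points[j])) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i]) for i in range(n)]


def nlof(points, n_clusters, min_cluster_size, k, threshold):
    """Return (indices in undersized clusters, indices flagged by LOF on the rest)."""
    labels = kmeans(points, n_clusters)
    sizes = {c: labels.count(c) for c in set(labels)}
    in_small = [i for i, lab in enumerate(labels) if sizes[lab] < min_cluster_size]
    rest = [i for i, lab in enumerate(labels) if sizes[lab] >= min_cluster_size]
    scores = lof_scores([points[i] for i in rest], k)
    by_lof = [i for i, s in zip(rest, scores) if s > threshold]
    return in_small, by_lof


# Hypothetical 2-D points standing in for t-SNE output.
normal = [(float(x), float(y)) for x in range(5) for y in range(5)]  # dense 5x5 grid
small_cluster = [(20.0, 20.0), (20.0, 21.0), (21.0, 20.0)]           # tiny far-away cluster
lone_point = [(8.0, 2.0)]                                            # isolated point
points = normal + small_cluster + lone_point

abnormal_cluster_idx, abnormal_point_idx = nlof(
    points, n_clusters=2, min_cluster_size=5, k=4, threshold=1.5
)
print(abnormal_cluster_idx, abnormal_point_idx)
```

On this toy data the three far-away points form the undersized cluster, and the lone point at (8, 2) is the only remaining point whose LOF exceeds the threshold. This mirrors the abstract's distinction between abnormal clusters and abnormal points.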

Key words: anomaly detection, network security situation prediction, model training using only normal network traffic, low-dimensional network packet features, t-SNE NLOF algorithm
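The abstract reports accuracy, precision, detection rate, FAR and F1 without defining them. The sketch below computes them from confusion-matrix counts using the definitions commonly used in intrusion detection. It assumes that attacks are the positive class, that detection rate means recall, and that FAR means the false alarm rate FP / (FP + TN); the paper's own definitions may differ. The function name and the counts are hypothetical and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Common intrusion-detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attacks
    tn: normal traffic passed as normal fn: attacks passed as normal
    """
    precision = tp / (tp + fp)
    detection_rate = tp / (tp + fn)  # assumed to be recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "detection_rate": detection_rate,
        "far": fp / (fp + tn),  # assumed: false alarm rate
        "f1": 2 * precision * detection_rate / (precision + detection_rate),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Comparisons such as "3.25 percentage points higher" in the abstract are differences between two such percentages, not relative improvements.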

HAO Yi-ran, SHENG Yi-qiang, WANG Jing-lin. Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing[J]. Computer and Modernization, 2021, 0(02): 109-116.

