Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing

Abstract

Most network traffic is normal, whereas abnormal traffic deviates from the normal range and is mainly caused by malicious network behaviors such as DDoS attacks and penetration attacks. Such behaviors usually degrade network quality and can even paralyze the network. Network security situation prediction is therefore introduced, in which abnormalities in the network are judged using knowledge of normal network traffic only. Anomaly detection is one such method: it predicts the security situation of a network to determine whether abnormalities are present. Existing anomaly detection algorithms perform poorly because they cannot accurately extract low-dimensional features of network packets. It is therefore necessary to find an accurate low-dimensional feature representation that distinguishes normal packets from attack packets. To this end, this paper introduces the NLOF anomaly detection algorithm based on t-SNE dimensionality reduction. The algorithm uses t-SNE to automatically preprocess network packets into low-dimensional packet features, which are then fed to the NLOF algorithm for anomaly detection. Specifically, the proposed NLOF algorithm first uses the k-means algorithm to cluster network packets into K clusters and marks clusters containing fewer than N packets as abnormal clusters. The packets that do not belong to abnormal clusters are then passed to the LOF algorithm for anomaly detection. Experimental results on the ISCX2012 dataset show that, at its optimal performance, the LOF algorithm with t-SNE dimensionality reduction achieves an accuracy of 98.46%, a precision of 98.38%, a detection rate of 98.54% and a FAR of 0.66%.
This algorithm achieves the best accuracy, detection rate and F1, exceeding those of the other state-of-the-art algorithms by 3.18, 0.02 and 0.01 percentage points, respectively. At its optimal performance, the NLOF algorithm with t-SNE dimensionality reduction achieves an accuracy of 98.53%, a precision of 98.86%, a detection rate of 98.86% and a FAR of 0.32%. It likewise achieves the best accuracy, detection rate and F1, exceeding those of the other state-of-the-art algorithms by 3.25, 0.34 and 0.41 percentage points, respectively. This is the first time that the t-SNE algorithm has been used in anomaly detection to automatically extract low-dimensional network packet features. In addition, whereas the LOF algorithm can only capture abnormal points, the proposed NLOF algorithm captures abnormal points and abnormal clusters simultaneously.
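
The three steps described in the abstract (t-SNE reduction, k-means clustering with small clusters marked abnormal, then LOF on the remaining packets) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, treats each packet as an already numeric feature vector, and runs on a single batch rather than training on normal traffic only. The function name nlof_predict and all parameter defaults (k, min_cluster_size, n_neighbors, perplexity) are hypothetical choices made for illustration; the abstract does not give the values of K or N.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.neighbors import LocalOutlierFactor


def nlof_predict(features, k=8, min_cluster_size=10, n_neighbors=20,
                 perplexity=30.0, seed=0):
    """Return a boolean array that is True where a sample is flagged abnormal.

    Illustrative sketch only; every default value here is an assumption.
    """
    features = np.asarray(features, dtype=float)

    # Step 1: reduce the packet features to 2 dimensions with t-SNE.
    embedded = TSNE(n_components=2, perplexity=perplexity, init="pca",
                    random_state=seed).fit_transform(features)

    # Step 2: cluster into K clusters and mark every cluster holding
    # fewer than N samples (min_cluster_size) as an abnormal cluster.
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=seed).fit_predict(embedded)
    sizes = np.bincount(labels, minlength=k)
    in_small_cluster = sizes[labels] < min_cluster_size

    # Step 3: run LOF only on samples outside the abnormal clusters.
    # LOF is skipped when too few samples remain for n_neighbors.
    flagged = in_small_cluster.copy()
    rest = np.flatnonzero(~in_small_cluster)
    if rest.size > n_neighbors:
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        flagged[rest] = lof.fit_predict(embedded[rest]) == -1
    return flagged


if __name__ == "__main__":
    # Synthetic stand-in for packet features: a large normal group
    # plus a few distant samples.
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(120, 10))
    odd = rng.normal(25.0, 0.5, size=(4, 10))
    flags = nlof_predict(np.vstack([normal, odd]),
                         k=6, min_cluster_size=8, n_neighbors=10)
    print(int(flags.sum()), "of", flags.size, "samples flagged")
```

In this sketch the small-cluster test supplies the abnormal clusters and LOF supplies the abnormal points, which mirrors the distinction the abstract draws between NLOF and plain LOF. Which samples are actually flagged depends on the t-SNE embedding and on the assumed parameter values.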

Key words: anomaly detection, network security situation prediction, training model using only normal network traffic, low-dimensional network packet features, t-SNE NLOF algorithm

HAO Yi-ran, SHENG Yi-qiang, WANG Jing-lin. Anomaly Detection of Network Traffic Based on t-SNE Dimensionality Reduction Preprocessing[J]. Computer and Modernization, 2021, 0(02): 109-116.
