Computer and Modernization, 2024, Vol. 0, Issue (12): 40-44. doi: 10.3969/j.issn.1006-2475.2024.12.006

• Algorithm Design and Analysis •

Anomaly Detection of Network Traffic Based on Autoencoder



  

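  (1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China;
    2. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China)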
  • Online: 2024-12-31  Published: 2024-12-31
  • Funding:
    Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023D01A46); National Key Research and Development Program of China (E1182101)


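Abstract: When confronted with increasingly complex network traffic and data of growing dimensionality, existing traffic anomaly detection schemes suffer from high false positive rates, low efficiency, and poor practicality. To address these problems, this paper proposes an autoencoder-based network traffic anomaly detection model. The model first uses the random forest algorithm to extract features from network traffic and select an optimal feature set, then applies hierarchical clustering to partition the feature vectors into several subsets, thereby reducing the feature dimensionality. Next, autoencoders process the feature subsets in parallel and compute RMSE values; the maximum average RMSE over multiple experiments is defined as the threshold for normal traffic. Finally, anomalous traffic is identified by comparing the average RMSE of the test data against this threshold. Experimental results show that, compared with traditional anomaly detection methods, the proposed model improves recall by 4.3 percentage points on average and reduces running time by about 37%.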
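The pipeline summarized in the abstract can be sketched in Python with NumPy, but only under several assumptions the abstract does not settle. Feature importances are taken as given (for instance a random forest's `feature_importances_`; the forest is not reproduced). Features are grouped by average-linkage agglomerative clustering on the distance 1 - |correlation|, with a hypothetical cap `max_size` on subset size. Each subset is modeled by a linear autoencoder fitted in closed form via SVD, a simplification of the presumably nonlinear networks in the paper. The threshold rule is read as the largest held-out average RMSE across several random splits of normal traffic. All function names and the synthetic data are illustrative, not the authors' code.

```python
import numpy as np

# Hypothetical sketch: names, the linear autoencoder and the threshold reading are assumptions.


def select_top_features(importances, k):
    """Indices of the k largest importance scores (e.g. a random forest's
    feature_importances_; the forest itself is not reproduced here)."""
    return np.argsort(importances)[::-1][:k]


def group_features(X, max_size):
    """Average-linkage agglomerative clustering of the columns of X.

    Feature distance is 1 - |Pearson correlation|. The closest pair of
    clusters is merged repeatedly until no merge fits within max_size.
    """
    corr = np.atleast_2d(np.nan_to_num(np.corrcoef(X, rowvar=False)))
    dist = 1.0 - np.abs(corr)
    clusters = [[j] for j in range(X.shape[1])]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if len(clusters[a]) + len(clusters[b]) > max_size:
                    continue
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid


class LinearAutoencoder:
    """Linear autoencoder fitted in closed form: under squared error its
    optimal code spans the top principal directions (truncated SVD)."""

    def __init__(self, n_hidden):
        self.n_hidden = n_hidden

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-8
        Z = (X - self.mean_) / self.std_
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.W_ = vt[: self.n_hidden].T  # encoder weights; decoder is W_.T
        return self

    def rmse(self, X):
        # RMSE between standardized input and its reconstruction, one value per row
        Z = (X - self.mean_) / self.std_
        recon = Z @ self.W_ @ self.W_.T
        return np.sqrt(np.mean((Z - recon) ** 2, axis=1))


def fit_ensemble(X_normal, groups):
    # one autoencoder per feature subset, hidden size = half the subset size
    return [LinearAutoencoder(max(1, len(g) // 2)).fit(X_normal[:, g]) for g in groups]


def mean_rmse(models, groups, X):
    # per-sample RMSE of every subset autoencoder, averaged over the subsets
    return np.mean([m.rmse(X[:, g]) for m, g in zip(models, groups)], axis=0)


def fit_threshold(X_normal, groups, n_runs=5, holdout=0.2, seed=0):
    """Assumed reading of the threshold rule: over n_runs random splits of the
    normal data, keep the largest average RMSE seen on held-out normal samples."""
    rng = np.random.default_rng(seed)
    n_val = max(1, int(len(X_normal) * holdout))
    worst = 0.0
    for _ in range(n_runs):
        idx = rng.permutation(len(X_normal))
        val, train = idx[:n_val], idx[n_val:]
        models = fit_ensemble(X_normal[train], groups)
        worst = max(worst, float(mean_rmse(models, groups, X_normal[val]).max()))
    return worst


def make_normal(rng, n):
    # synthetic "normal traffic": two independent latent factors, three features each
    u, v = rng.normal(size=(2, n))
    X = np.column_stack([u, 2 * u, -u, v, -v, 0.5 * v])
    return X + 0.1 * rng.normal(size=(n, 6))


rng = np.random.default_rng(42)
X_normal = make_normal(rng, 500)
groups = group_features(X_normal, max_size=3)
threshold = fit_threshold(X_normal, groups)
models = fit_ensemble(X_normal, groups)

X_test = np.array([[3.0, -3.0, 3.0, 0.0, 0.0, 0.0]])  # breaks the first group's correlation
is_anomaly = mean_rmse(models, groups, X_test) > threshold
flagged_normal = mean_rmse(models, groups, make_normal(rng, 50)) > threshold
```

A sample is flagged when its RMSE, averaged over the subset autoencoders, exceeds the threshold. Splitting the features into small correlated subsets keeps each autoencoder low-dimensional, which is the intuition behind the dimensionality reduction described above; the synthetic data only exercises the code path and does not reproduce the paper's experiments.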

Key words: anomaly detection, autoencoder, hierarchical clustering, random forest algorithm

