Computer and Modernization (计算机与现代化)

• Information Security

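Application of Fusion Model of GBDT and LR in Encrypted Traffic Identification

  1. (School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)
  • Received: 2019-08-12 Online: 2020-03-24 Published: 2020-03-30
  • About the authors: Wang Yao (王垚, b. 1996), male, from Nanchong, Sichuan, master's student; research interest: network information security; E-mail: 973933471@qq.com. Li Wei (李为, b. 1967), female, from Shenyang, Liaoning, professor, master's degree; research interests: smart grid software technology, power information security. Wu Kehe (吴克河, b. 1962), male, from Zhenjiang, Jiangsu, professor, Ph.D.; research interests: smart grid software technology, power information security. Cui Wenchao (崔文超, b. 1983), male, from Nanyang, Henan, lecturer, Ph.D.; research interest: power information security.
  • Funding:
    Science and Technology Project of State Grid Corporation of China (521304190004)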

Abstract: As network application services diversify and traffic encryption technology continues to develop, encrypted traffic identification has become a major challenge in network security. Traditional traffic identification techniques, such as deep packet inspection, cannot effectively identify encrypted traffic, whereas techniques based on machine learning have shown good results. This paper therefore proposes an encrypted traffic classification model that fuses the gradient boosting decision tree (GBDT) algorithm with logistic regression (LR), with hyperparameters tuned by Bayesian optimization (BO). Using time-related flow features to distinguish regular encrypted traffic from VPN encrypted traffic, the model achieves an overall identification accuracy above 90% and outperforms other commonly used classification models.

Key words: encrypted traffic identification, GBDT (Gradient Boosting Decision Tree), LR (Logistic Regression), flow features, Bayesian optimization
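
The abstract does not say how the GBDT and LR stages are joined. One widely used way to fuse them, shown here only as a minimal sketch and not as the authors' implementation, is to train the GBDT first, treat the index of the leaf each sample reaches in every tree as a categorical feature, one-hot encode those indices, and fit the logistic regression on the result. The sketch assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for time-related flow features, and fixes the hyperparameters by hand instead of tuning them with Bayesian optimization. The names `fit_gbdt_lr` and `predict_gbdt_lr` are hypothetical helpers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def fit_gbdt_lr(X_gbdt, y_gbdt, X_lr, y_lr, n_trees=30, max_depth=3, seed=0):
    """Fit a GBDT, then a logistic regression on one-hot encoded leaf indices.

    Hypothetical helper: the hyperparameters are fixed for illustration,
    not tuned with Bayesian optimization as in the paper.
    """
    gbdt = GradientBoostingClassifier(
        n_estimators=n_trees, max_depth=max_depth, random_state=seed
    )
    gbdt.fit(X_gbdt, y_gbdt)
    # For a binary problem apply() has shape (n_samples, n_trees, 1):
    # entry [i, t, 0] is the leaf that sample i reaches in tree t.
    leaves = gbdt.apply(X_lr)[:, :, 0]
    # Leaves unseen during fitting are encoded as all zeros at predict time.
    encoder = OneHotEncoder(handle_unknown="ignore")
    lr = LogisticRegression(max_iter=1000)
    lr.fit(encoder.fit_transform(leaves), y_lr)
    return gbdt, encoder, lr


def predict_gbdt_lr(model, X):
    """Route samples through the trees, encode the leaves, classify with LR."""
    gbdt, encoder, lr = model
    leaves = gbdt.apply(X)[:, :, 0]
    return lr.predict(encoder.transform(leaves))


# Synthetic two-class data standing in for flow features (an assumption;
# label 1 could play the role of "VPN traffic", label 0 "regular traffic").
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
# Train the two stages on disjoint halves so the LR does not fit on leaves
# the GBDT has already memorized.
X_a, X_b, y_a, y_b = train_test_split(
    X_train, y_train, test_size=0.5, random_state=0
)

model = fit_gbdt_lr(X_a, y_a, X_b, y_b)
y_pred = predict_gbdt_lr(model, X_test)
accuracy = float(np.mean(y_pred == y_test))
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Splitting the training data between the two stages is a design choice of this sketch, not something stated in the abstract. The accuracy printed here describes the synthetic data only and says nothing about the results reported in the paper.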

CLC number: