Computer and Modernization ›› 2022, Vol. 0 ›› Issue (08): 99-105.

• Information Security •

Insider Threat Detection Based on Hybrid N-Gram and XGBoost Theory

  (1. Jiangxi Science & Technology Infrastructure Center, Nanchang 330003, China;
    2. China Radio and Television Jiangxi Network Co., Ltd., Nanchang 330006, China)

  • Online: 2022-08-22 Published: 2022-08-22
  • About the author: SUN Dan (born 1988), female, from Taihe, Jiangxi; engineer, master's degree; research interest: information system security. E-mail: 807965684@qq.com.
  • Funding:
    Jiangxi Provincial Science and Technology Plan Project (20194BBE50087); Jiangxi Provincial Key Research and Development Program (20202BBEL53003)

Abstract: As government agencies, enterprises, and public institutions establish and strengthen their network security mechanisms, the bar for attacking a target system purely from the outside keeps rising, and insider threats are becoming more common as a result. Unlike external threats, insider threats originate mainly from internal users, which makes the attacks more concealed and harder to detect. This paper first analyzes user behavior in the public SEA data set and then proposes an insider threat detection method based on a hybrid N-Gram model and the XGBoost algorithm, using big data and machine learning methods. Three feature extraction methods (bag-of-words, N-Gram, and vocabulary) are compared experimentally, and the value of the parameter N is selected by experiment. The proposed method detects better than feature subsets built by combining different features of one-dimensional, two-dimensional, and four-dimensional data: the reported specificity is 0.23, the sensitivity 27.65, the accuracy 0.94, and the F1 value 0.97. Measured on these four evaluation indicators (specificity, sensitivity, accuracy, and F1 value), hybrid N-Gram feature extraction is more effective for detection than the traditional bag-of-words and vocabulary feature extraction methods. The method improves the discriminative power of insider threat detection signatures and also improves the accuracy and computational performance of feature extraction.

Key words: hybrid N-Gram model, XGBoost algorithm, insider threat, SEA data set, evaluation index
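
To make the feature extraction and the four evaluation indicators concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each sample is a sequence of command tokens, that "hybrid" N-Gram means pooling n-grams of several sizes into one feature set, and that label 1 marks the positive (threat) class. The function names `hybrid_ngram_counts`, `to_vector`, and `evaluation_metrics` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def hybrid_ngram_counts(tokens: Sequence[str],
                        n_values: Sequence[int] = (1, 2, 3)) -> Counter:
    """Count n-grams of every size in n_values over one token sequence.

    Assumption: "hybrid" is read here as mixing several n-gram sizes.
    """
    counts: Counter = Counter()
    for n in n_values:
        # Slide a window of width n across the sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def to_vector(counts: Counter, vocabulary: Sequence[NGram]) -> List[int]:
    """Project n-gram counts onto a fixed, ordered vocabulary."""
    return [counts.get(gram, 0) for gram in vocabulary]


def evaluation_metrics(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Dict[str, float]:
    """Specificity, sensitivity, accuracy, and F1 for binary labels.

    Assumption: label 1 is the positive class, label 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy usage: one short command block and a handful of made-up labels.
block = ["ls", "cd", "ls", "cat"]
counts = hybrid_ngram_counts(block, n_values=(1, 2))
vocabulary = sorted(counts)            # fixed feature order
features = to_vector(counts, vocabulary)
scores = evaluation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Vectors built this way could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`; that step is left out to keep the sketch free of third-party dependencies.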