Computer and Modernization (计算机与现代化) ›› 2022, Vol. 0 ›› Issue (06): 122-126.

• Information Security •

A SQL Injection Attack Detection Algorithm Based on Improved TF-IDF

  (1. College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China;
    2. Key Laboratory of Industrial Intelligence Technology on Chemical Process, Liaoning Province, Shenyang 110142, China)
  • Online: 2022-06-23 Published: 2022-06-23
  • About the authors: Guan Hui (关慧, born 1976), female, from Shenyang, Liaoning; associate professor, Ph.D.; research interests: software evolution, software security, and semantic processing; E-mail: h.guan@syuct.edu.cn. Corresponding author: Sheng Jingyuan (盛靖媛, born 1996), female, from Xuzhou, Jiangsu; master's student; research interests: data processing technology and software services, network security; E-mail: 465112146@qq.com. Cao Tongzhou (曹同洲, born 1995), male, from Yangzhou, Jiangsu; master's student; research interests: data processing technology and software services, information security; E-mail: 961054140@qq.com.
  • Supported by:
    Basic Scientific Research Project of Colleges and Universities, Education Department of Liaoning Province (LJKZ0434)

Abstract: The traditional TF-IDF algorithm assigns weights to feature words poorly, which leads to insufficient feature extraction and low efficiency, so its results do not reflect the actual situation. To overcome these limitations in SQL injection attack detection, this paper improves TF-IDF by adding a text quantity ratio factor and the chi-square statistic (CHI) to the traditional algorithm, which improves the weights of some important words. SQL injection attacks are then detected with several different classifiers, and the resulting classifications are compared. The experimental results show that combining Boosted Decision Tree with the improved TF-IDF achieves higher precision, recall, and F1 value than other similar methods. In addition, compared with the traditional TF-IDF algorithm, the proposed algorithm improves accuracy, precision, recall, and F1 value in SQL injection attack detection by about 5% each, which gives it some practical application value.

Key words: SQL injection, TF-IDF, chi-square statistic, text vectorization
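
The abstract does not give the exact formula of the improved weighting, so the following is only a minimal sketch of the general idea: scaling a standard TF-IDF weight by a term's chi-square statistic so that class-discriminating tokens gain weight. Multiplying the two quantities is an assumption made here for illustration, the text quantity ratio factor is omitted because it is not defined in the abstract, and the function names `chi_square` and `tfidf_chi_weights` as well as the toy token lists are hypothetical.

```python
import math
from collections import Counter


def chi_square(term, docs, labels, positive=1):
    """Chi-square statistic of `term` against the `positive` class (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == positive:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, other class
        elif label == positive:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tfidf_chi_weights(docs, labels):
    """Per-document term weights: tf * idf * chi-square (assumed combination)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    chi = {term: chi_square(term, docs, labels) for term in df}
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term]) * chi[term]
            for term, count in counts.items()
        })
    return weights


# Toy, hand-made token lists (hypothetical data, not from the paper).
docs = [
    ["select", "name", "from", "users"],
    ["select", "or", "1=1", "--"],
    ["union", "select", "or", "1=1"],
    ["update", "name", "from", "users"],
]
labels = [0, 1, 1, 0]  # 1 = injection, 0 = benign
weights = tfidf_chi_weights(docs, labels)
```

In this toy data, `or` occurs only in the injection samples while `select` occurs in both classes, so `or` receives the larger weight in the second sample. The resulting weight dictionaries could then be turned into feature vectors for whichever classifier is being evaluated.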