DGA Domain Name Detection Combining Attention Mechanisms and Parallel Hybrid Network

Abstract

Abstract: DGA domain name detection methods based on statistical features rely on complex feature engineering, while existing end-to-end deep learning methods perform poorly on the multi-class classification of DGA domain name families. To address these problems, a DGA domain name detection method combining attention mechanisms and a parallel hybrid network is proposed. First, a deep pyramid convolutional neural network (DPCNN) is introduced to extract deep semantic information from domain names, and it is improved with the channel attention block SENet to build DPCNN-SE, which adaptively learns inter-channel relationships and suppresses the propagation of useless features. In parallel, a self-attention mechanism is combined with a bidirectional long short-term memory network to build the BiLSTM-SA network, which captures the most representative global temporal features in domain name data. Finally, the features extracted by the two networks are fused and fed into a softmax layer to output the classification result. Experimental results show that, on the multi-class classification of domain name families, the method improves the F1-score by 10.30 and 10.18 percentage points over the single-model CNN and LSTM, respectively; compared with the existing hybrid network methods Bilbo and BiGRU-MCNN, it improves the F1-score by 5.97 and 4.87 percentage points, respectively, and it has lower computational complexity.

Key words: DGA domain name detection, feature fusion, end-to-end, long short-term memory neural network, convolutional neural network
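
The two ideas at the core of the abstract, channel reweighting in the style of SENet and fusion of two parallel feature vectors before a softmax layer, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`se_block`, `fuse_and_classify`), the tensor sizes, the reduction ratio, the max-pooling step, and the random weights are all assumptions chosen for illustration, and the BiLSTM-SA branch is represented only by a stand-in vector.

```python
import math
import random


def se_block(features, w1, w2):
    """SE-style channel attention over a [channels][length] feature map."""
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid,
    # producing one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every channel by its gate.
    reweighted = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return reweighted, gates


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(cnn_vec, rnn_vec, weights, bias):
    """Concatenate the two branch vectors and apply a linear layer + softmax."""
    fused = cnn_vec + rnn_vec  # list concatenation = feature fusion
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def rand_matrix(rows, cols):
    """Hypothetical random weights standing in for trained parameters."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
feature_map = rand_matrix(4, 6)            # assumed: 4 channels, 6 positions
w1, w2 = rand_matrix(2, 4), rand_matrix(4, 2)  # assumed reduction ratio of 2
reweighted, gates = se_block(feature_map, w1, w2)

cnn_vec = [max(ch) for ch in reweighted]   # assumed pooling of the CNN branch
rnn_vec = [random.uniform(-1.0, 1.0) for _ in range(3)]  # stand-in for BiLSTM-SA
probs = fuse_and_classify(cnn_vec, rnn_vec, rand_matrix(5, 7), [0.0] * 5)
print(gates, probs)
```

Here `gates` holds one weight per channel, so channels with small gates contribute less to the pooled vector, and `probs` is a distribution over five hypothetical domain name families.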

LIU Li-ting, OU Yu-yi. DGA Domain Name Detection Combining Attention Mechanisms and Parallel Hybrid Network[J]. Computer and Modernization, 2022, 0(09): 119-126.

