Computer and Modernization

• Artificial Intelligence •

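biRNN-based Method for Processing Unbalanced Text Data Sets of Naval Ordnance

  (1. Naval Aeronautical University, Yantai, Shandong 264001, China; 2. Naval Unit 92665, Zhangjiajie, Hunan 427000, China)
  • Received: 2019-04-23  Online: 2019-12-11  Published: 2019-12-11
  • About the authors: QI Yudong (齐玉东, 1973-), male, from Suixian, Henan; associate professor, Ph.D.; research interests: information security, artificial intelligence; E-mail: 1015817631@qq.com. DING Haiqiang (丁海强, 1994-), male, from Qingzhou, Shandong; master's student; research interests: cyber operations, artificial intelligence; E-mail: 17686048082@163.com. ZHAO Jinchao (赵锦超, 1995-), male, from Qingzhou, Shandong; assistant engineer, bachelor's degree; research interest: information security. SUN Mingwei (孙明玮, 1993-), male, from Yantai, Shandong; master's student; research interests: information security, artificial intelligence.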

Abstract: Traditional methods for processing imbalanced data sets depend on tedious manual feature engineering and generalize poorly, so they are difficult to apply to imbalanced naval ordnance text data sets. To address this problem, this paper proposes a method for processing imbalanced naval ordnance text data sets based on a bidirectional recurrent neural network (biRNN). The biRNN automatically learns the features of text sequences and generates additional minority-class texts by predicting text sequences in both directions, which balances the classes. The whole text data set is then further expanded on the basis of the balanced data set. Text classification experiments are run on the original, balanced, and expanded data sets. The results show that balancing and expanding the original data set with the biRNN-based method effectively improves text classification performance.

Key words: deep learning, naval ordnance, imbalanced data set, bidirectional recurrent neural network, text data mining
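The abstract describes the pipeline only at a high level. Per class, the method learns sequence statistics, generates new texts by predicting in both directions from an existing fragment, tops every class up to the majority size, and then enlarges the balanced set. The sketch below illustrates that control flow under explicit assumptions. A count-based bigram model stands in for the trained biRNN: a forward table plays the left-to-right direction and a backward table the right-to-left one. The seed is a single token drawn from an existing sample. The function names (`build_bigram_models`, `extend_bidirectionally`, `augment_to`), the expansion factor of 2, and the toy corpus are all hypothetical. None of them is the authors' implementation or data.

```python
import random
from collections import Counter, defaultdict


def build_bigram_models(sentences):
    """Count forward (next-token) and backward (previous-token) bigrams.

    Hypothetical stand-in for a trained biRNN: `forward` approximates the
    left-to-right direction, `backward` the right-to-left direction.
    """
    forward = defaultdict(Counter)
    backward = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for left, right in zip(padded, padded[1:]):
            forward[left][right] += 1
            backward[right][left] += 1
    return forward, backward


def sample_neighbour(table, token, rng):
    """Draw a neighbour of `token` in proportion to its bigram count."""
    candidates = table.get(token)
    if not candidates:
        return None
    words, counts = zip(*candidates.items())
    return rng.choices(words, weights=counts, k=1)[0]


def extend_bidirectionally(seed, forward, backward, rng, max_len=20):
    """Grow a seed fragment rightwards, then leftwards, until each end hits
    a sentence boundary or the sequence reaches `max_len` tokens."""
    tokens = list(seed)
    # Rightward: predict the next token from the current last token.
    while len(tokens) < max_len:
        nxt = sample_neighbour(forward, tokens[-1], rng)
        if nxt is None or nxt == "</s>":
            break
        tokens.append(nxt)
    # Leftward: predict the previous token from the current first token.
    while len(tokens) < max_len:
        prev = sample_neighbour(backward, tokens[0], rng)
        if prev is None or prev == "<s>":
            break
        tokens.insert(0, prev)
    return tokens


def augment_to(dataset, target, rng):
    """Add generated samples to every class until it holds `target` samples.

    `dataset` maps label -> list of non-empty token lists; every class is
    assumed to contain at least one sample.
    """
    out = {}
    for label, sentences in dataset.items():
        forward, backward = build_bigram_models(sentences)
        grown = list(sentences)
        while len(grown) < target:
            seed = [rng.choice(rng.choice(sentences))]  # one-token seed
            grown.append(extend_bidirectionally(seed, forward, backward, rng))
        out[label] = grown
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy corpus: label -> tokenized sentences.
    data = {
        "majority": [["check", "the", "launcher", "power"],
                     ["check", "the", "radar", "cable"],
                     ["replace", "the", "radar", "fuse"],
                     ["replace", "the", "launcher", "seal"]],
        "minority": [["barrel", "wear", "exceeds", "limit"]],
    }
    # Step 1: balance every class to the majority-class size.
    balanced = augment_to(data, max(len(v) for v in data.values()), rng)
    # Step 2: expand the balanced set (factor 2 is a hypothetical choice).
    expanded = augment_to(balanced, 2 * len(balanced["majority"]), rng)
    print({k: len(v) for k, v in balanced.items()})  # {'majority': 4, 'minority': 4}
    print({k: len(v) for k, v in expanded.items()})  # {'majority': 8, 'minority': 8}
```

Because `augment_to` takes an explicit per-class target, one routine covers both steps in the abstract. Balancing passes the majority-class size, and expansion passes a larger multiple of it. In the described method, a trained bidirectional RNN would replace the bigram tables as the sequence predictor.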
