Sensitive File Detection Method Based on CNN

Computer and Modernization, 2018, Vol. 0, Issue (07): 28-. doi: 10.3969/j.issn.1006-2475.2018.07.006

LIN Xue-feng¹, XIA Yuan-yi², GUO Jin-long¹, YU Xiao-wen¹

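(1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing, Jiangsu 211106, China; 2. Information & Telecommunication Branch, State Grid Jiangsu Electric Power Co. Ltd., Nanjing, Jiangsu 210024, China)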

Received: 2017-10-30  Online: 2018-08-23  Published: 2018-08-27
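About the authors: LIN Xue-feng (1988-), male, born in Jingjiang, Jiangsu; engineer at NARI Group Corporation (State Grid Electric Power Research Institute), M.S.; research interest: information security. XIA Yuan-yi (1988-), male, born in Wuxi, Jiangsu; engineer at the Information & Telecommunication Branch of State Grid Jiangsu Electric Power Co. Ltd., M.S.; research interest: information security. GUO Jin-long (1991-), male, born in Bozhou, Anhui; master's student; research interests: network security, information security. YU Xiao-wen (1990-), female, born in Lianyungang, Jiangsu; engineer, M.S.; research interest: information security.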
Funding: State Grid Corporation of China Science and Technology Project (SGJSXT00JFJS1700101)

Abstract

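Abstract: In recent years, informatization in the electric power industry has made great progress. Office documents, project proposals, project contracts, and other files containing industry secrets are increasingly transmitted over the Internet, which can lead to the leakage of enterprise-level sensitive files. Traditional sensitive-file recognition methods detect features by matching against a sensitive-word lexicon; they are fast, but suffer from high false-negative and false-positive rates. This paper proposes a deep-learning-based sensitive file detection method that introduces word embeddings and a convolutional neural network (CNN) to classify sensitive files accurately. The advantage of the proposed method for identifying enterprise-level sensitive files is that it no longer depends on feature keywords, which lowers both the false-negative rate and the false-positive rate.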
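The abstract names the two ingredients, word embeddings and a CNN, but not the architecture. As a hedged illustration only, the sketch below shows the forward pass of a generic single-layer text CNN of the kind commonly used for sentence classification: embedding lookup, n-gram convolution filters with ReLU, max-over-time pooling, and a sigmoid output giving P(sensitive). The embedding size, filter widths, vocabulary, and random weights are all assumptions rather than the authors' configuration, and training is omitted.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
EMBED_DIM = 4              # word-vector dimension
FILTER_WIDTHS = (2, 3)     # n-gram window sizes of the convolution filters
FILTERS_PER_WIDTH = 2      # feature maps per window size

rng = random.Random(0)     # fixed seed so the toy weights are reproducible


def rand_vec(n):
    """Small random vector standing in for trained parameters."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def conv_max_pool(seq, weights, bias, width):
    """Slide one filter over the word-vector sequence and return max(ReLU(w . x + b))."""
    if len(seq) < width:  # zero-pad documents shorter than the filter
        seq = seq + [[0.0] * EMBED_DIM] * (width - len(seq))
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid lower bound
    for i in range(len(seq) - width + 1):
        window = [x for vec in seq[i:i + width] for x in vec]  # flatten width x EMBED_DIM
        activation = sum(w * x for w, x in zip(weights, window)) + bias
        best = max(best, activation)  # max-over-time pooling of the ReLU output
    return best


class TextCNN:
    """Untrained single-layer CNN classifier returning P(document is sensitive)."""

    def __init__(self, vocab):
        # Embedding table; in practice this would be pre-trained (e.g. word2vec).
        self.table = {word: rand_vec(EMBED_DIM) for word in vocab}
        self.filters = [(width, rand_vec(width * EMBED_DIM), 0.0)
                        for width in FILTER_WIDTHS
                        for _ in range(FILTERS_PER_WIDTH)]
        self.out_w = rand_vec(len(self.filters))
        self.out_b = 0.0

    def features(self, tokens):
        # Out-of-vocabulary words map to a zero vector.
        seq = [self.table.get(t, [0.0] * EMBED_DIM) for t in tokens]
        return [conv_max_pool(seq, w, b, width) for width, w, b in self.filters]

    def predict_proba(self, tokens):
        z = sum(w * f for w, f in zip(self.out_w, self.features(tokens))) + self.out_b
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid


if __name__ == "__main__":
    model = TextCNN(["project", "contract", "quotation", "meeting", "notice"])
    tokens = ["project", "contract", "quotation"]  # a pre-segmented document
    print(f"P(sensitive) = {model.predict_proba(tokens):.3f}")  # untrained: not meaningful
```

The contrast with lexicon matching is that the decision comes from learned n-gram filters over dense word vectors rather than from exact keyword hits, so related wording can activate the same filter. In practice the weights would be learned with a cross-entropy loss and backpropagation, typically in a framework such as TensorFlow.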

Key words: sensitive-word lexicon, word embedding, convolutional neural network, deep learning, sensitive file detection

CLC number: TP309.2

林学峰1,夏元轶2,郭金龙1,于晓文1. 基于卷积神经网络的敏感文件检测方法[J]. 计算机与现代化, 2018, 0(07): 28-.