A Defense Policy Learning Algorithm for Power Information Networks Based on Optimal Initial Value Q-learning

doi:10.3969/j.issn.1006-2475.2018.11.004

Abstract

The security and stability of power information networks underpin modern social development. As these networks have grown, research has focused on how to build efficient and stable protection for them. Defense strategies in automated power information network systems have suffered from slow updates, long update cycles, an inability to update automatically, and uneven resource allocation. This paper proposes a defense algorithm for power information networks based on optimal initial value Q-learning. The method builds on a classical reinforcement learning algorithm: the defense strategy is learned through simulated confrontation, and the defensive agent uses Q-learning to exploit historical experience. Optimistic initial values greatly accelerate the training of the system’s defensive performance. Experiments verify the effectiveness of the algorithm.
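The idea of pairing Q-learning with optimistic initial values can be illustrated with a minimal sketch. It is not the paper's implementation: the toy attack/defense environment, the function name `train_defender`, the reward of 1 for a matching defense, and all parameter values are assumptions made only for illustration. Each state stands for an attack type, each action for a defense measure, and the agent acts purely greedily so that any exploration comes from the initial Q-values alone.

```python
import numpy as np

def train_defender(correct_defense, q0, steps=500, alpha=0.5, gamma=0.5, seed=0):
    """Tabular Q-learning on a hypothetical attack/defense toy problem.

    correct_defense[s] is the only defense that blocks attack type s (assumed).
    q0 is the initial value of every Q-table entry.
    """
    rng = np.random.default_rng(seed)
    n_states = len(correct_defense)
    n_actions = n_states  # assumption: one candidate defense per attack type
    q = np.full((n_states, n_actions), q0, dtype=float)

    state = 0
    total_reward = 0.0
    for _ in range(steps):
        # Purely greedy choice: no epsilon, exploration relies on q0 only.
        action = int(np.argmax(q[state]))
        reward = 1.0 if action == correct_defense[state] else 0.0
        # Assumed attacker model: the next attack type is drawn uniformly.
        next_state = int(rng.integers(n_states))
        # Standard Q-learning update.
        target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (target - q[state, action])
        total_reward += reward
        state = next_state
    return q, total_reward

correct = [1, 2, 3, 0]
# Optimistic start: q0 equals the largest possible return, 1 / (1 - gamma) = 2.
q_opt, r_opt = train_defender(correct, q0=2.0)
# Zero start for comparison, same greedy policy.
q_zero, r_zero = train_defender(correct, q0=0.0)
print("optimistic init, total reward:", r_opt)
print("zero init, total reward:", r_zero)
```

In this sketch, every failed defense pulls its Q-value below the optimistic starting value, so the greedy agent moves on to an untried defense until it finds one that works. With zero initial values the same greedy agent never leaves the first defense it tries, which is the kind of slow or stalled learning that optimistic initialization is meant to avoid.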

Key words: power information network, optimal initial values, Q-learning, network defense

CLC Number: TP393

JING Dong-sheng, YANG Yu, XUE Jing-song, ZHU Fei, WU Wen. A Defense Policy Learning Algorithm for Power Information Networks Based on Optimal Initial Value Q-learning[J]. Computer and Modernization, 2018, 0(11): 18-.

