Computer and Modernization

A Defense Policy Learning Algorithm for Power Information Networks Based on Optimal Initial Value Q-learning

  

  (1. Suzhou Power Supply Branch, State Grid Jiangsu Electric Power Limited Company, Suzhou 215004, China;
   2. School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
  • Received: 2018-04-26   Online: 2018-11-22   Published: 2018-11-23

Abstract: Keeping the power information network secure and stable is an important guarantee for today's social development. As the power information network has grown, researchers have focused on how to build an efficient and stable protection network for it. Defense strategies in automated power information network systems have suffered from slow updates, long update cycles, the inability to update automatically, and uneven resource allocation. This paper proposes a power information network defense algorithm based on optimal initial value Q-learning. The method builds on classical reinforcement learning: the defensive strategy is obtained through simulated confrontation, and the defensive agent uses the Q-learning algorithm to exploit historical experience. Optimistic initial values greatly accelerate the training of the system's defensive performance. Experiments verify the effectiveness of the algorithm.

Key words: power information network, optimal initial values, Q-learning, network defense
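
The idea summarized in the abstract, a defensive agent trained by Q-learning whose table starts from optimistic values, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy environment (a simulated attacker that cycles through the nodes in order), the reward of +1 for defending the attacked node and -1 otherwise, the function name train_defender, and all parameter values. Because every Q-value starts above the largest achievable return, a purely greedy agent is pushed to try each action until its estimate drops to a realistic level.

```python
def train_defender(n_nodes=4, q_init=20.0, alpha=0.5, gamma=0.9, steps=5000):
    """Tabular Q-learning with optimistic initial values on a toy defense game.

    Hypothetical setup (not from the paper): the state is the node attacked
    last, the attacker moves to the next node in a fixed cycle, and the
    defender picks which node to protect before the next attack.
    """
    # Optimistic start: q_init exceeds the best possible return 1 / (1 - gamma) = 10.
    q = [[q_init] * n_nodes for _ in range(n_nodes)]
    state = 0
    for _ in range(steps):
        row = q[state]
        # Purely greedy choice; optimism alone drives exploration.
        action = max(range(n_nodes), key=row.__getitem__)
        next_state = (state + 1) % n_nodes  # simulated attacker move
        reward = 1.0 if action == next_state else -1.0
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[next_state])
        row[action] += alpha * (target - row[action])
        state = next_state
    return q


q_table = train_defender()
# Greedy defense policy: for each last-attacked node, the node to protect next.
policy = [max(range(len(row)), key=row.__getitem__) for row in q_table]
print(policy)
```

In this toy game the learned policy protects the node that follows the last attacked one. Setting q_init at or below the true returns removes the built-in incentive to try untested actions, which is the effect the optimistic initialization is meant to avoid.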
