Computer and Modernization ›› 2024, Vol. 0 ›› Issue (11): 34-40. doi: 10.3969/j.issn.1006-2475.2024.11.006


Q-learning-based Algorithm for Orchestrating Security Service Function Chain

  

  (1. NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 210000, China;
    2. Nanjing NARI Information & Communication Technology Co., Ltd., Nanjing 210000, China;
    3. State Grid Shandong Electric Power Institute, Jinan 250003, China)
  • Online: 2024-11-29 Published: 2024-12-09

Abstract: As the Internet becomes an indispensable part of daily life, network security grows increasingly important, and the orchestration of dynamic security service function chains is a key research direction for ensuring it. However, existing network resource mapping and orchestration algorithms for dynamic security service function chains mostly focus on a single type of network resource, aiming to optimize that resource and reduce network service latency, and they overlook the balance of overall resource allocation across the network. This paper constructs a physical network model and a security service function chain model. Taking into account both the computing resources of physical network nodes and the bandwidth resources of links, the goal is to achieve the most balanced allocation of network resources while meeting user requirements. Based on the Q-learning reinforcement learning algorithm, a new link orchestration reward method is proposed, and a greedy strategy is introduced to avoid falling into local optima. A typical physical network model and different numbers of security service function chains to be orchestrated are selected, and the optimal orchestration path of the security service function chain is obtained through multiple iterations. Simulation results show that, compared with the simulated annealing algorithm, the proposed method reduces orchestration response time by 38.5% and improves resource allocation balance by 2.1%; compared with a genetic algorithm, it reduces orchestration response time by 96.5% and improves resource allocation balance by 2.9%.
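
The approach summarized in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the four-node topology, the capacities and existing load, the reward (negative variance of node and link utilization, with a fixed penalty for infeasible placements), and the reading of the "greedy strategy" as epsilon-greedy exploration are all assumptions made for illustration.

```python
import random
from statistics import pvariance

# Hypothetical physical network: CPU capacity per node, bandwidth per link.
CPU = {"A": 10.0, "B": 10.0, "C": 10.0, "D": 10.0}
LINKS = {frozenset(p): 10.0
         for p in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("A", "C")]}
NEIGHBORS = {n: sorted(m for l in LINKS for m in l if n in l and m != n)
             for n in CPU}
# Assumed existing load: node B is already heavily used.
BASE_CPU = {"A": 2.0, "B": 8.0, "C": 2.0, "D": 2.0}


def imbalance(cpu_used, bw_used):
    """Variance of node utilization plus variance of link utilization."""
    node_util = [cpu_used[n] / CPU[n] for n in CPU]
    link_util = [bw_used[l] / LINKS[l] for l in LINKS]
    return pvariance(node_util) + pvariance(link_util)


def run_episode(q, chain, bw_demand, source, eps, rng, alpha=0.1, gamma=0.9):
    """Place each function of the chain on a neighbor of the previous node.

    Returns the list of hosting nodes, or None if a placement was infeasible.
    The Q-table `q` is updated in place after every step.
    """
    cpu_used = dict(BASE_CPU)
    bw_used = {l: 0.0 for l in LINKS}
    node, path = source, []
    for i, demand in enumerate(chain):
        state, actions = (i, node), NEIGHBORS[node]
        if rng.random() < eps:                      # explore
            nxt = rng.choice(actions)
        else:                                       # exploit
            nxt = max(actions, key=lambda a: q.get((state, a), 0.0))
        link = frozenset((node, nxt))
        feasible = (cpu_used[nxt] + demand <= CPU[nxt]
                    and bw_used[link] + bw_demand <= LINKS[link])
        if feasible:
            cpu_used[nxt] += demand
            bw_used[link] += bw_demand
            reward = -imbalance(cpu_used, bw_used)  # more balanced = higher
        else:
            reward = -10.0                          # assumed penalty
        done = (not feasible) or i == len(chain) - 1
        future = 0.0 if done else max(
            q.get(((i + 1, nxt), a), 0.0) for a in NEIGHBORS[nxt])
        old = q.get((state, nxt), 0.0)
        q[(state, nxt)] = old + alpha * (reward + gamma * future - old)
        if not feasible:
            return None
        path.append(nxt)
        node = nxt
    return path


rng = random.Random(0)
q = {}
chain = [3.0, 3.0]          # CPU demand of each security function (assumed)
for _ in range(500):        # training episodes with exploration
    run_episode(q, chain, bw_demand=2.0, source="A", eps=0.2, rng=rng)

# Final rollout without exploration reads off the learned placement.
path = run_episode(q, chain, bw_demand=2.0, source="A", eps=0.0, rng=rng)
print(path)
```

In this toy setting the overloaded node B cannot host a 3-unit function, so the learned placement avoids it. A fuller model would also route between non-adjacent nodes and orchestrate several chains against shared resources.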

Key words: network security, security service function chain, Q-learning, greedy strategy, resource allocation
