Q-learning-based Algorithm for Orchestrating Security Service Function Chain

doi:10.3969/j.issn.1006-2475.2024.11.006

Abstract

With the development of technology, the Internet has become an indispensable part of human life, and network security has become particularly important. Orchestrating dynamic security service function chains is an important research direction for ensuring network security. However, current research on network resource mapping and orchestration algorithms for dynamic security service function chains mainly focuses on a single type of network resource, with the main goal of optimizing that resource and reducing network service latency, and overlooks the balance of overall resource allocation in the network. We construct a physical network model and a security service function chain model. Both the computing resources of physical network nodes and the bandwidth resources of links are considered, and the goal is to achieve the most balanced allocation of network resources while meeting user needs. Based on the Q-learning reinforcement learning algorithm, a new link orchestration reward method is proposed, and a greedy strategy is introduced to avoid falling into local optima. A typical physical network model and different numbers of security service function chains to be orchestrated are selected, and the optimal orchestration path of each security service function chain is obtained through multiple iterations. The simulation results show that, compared with the simulated annealing algorithm, the proposed orchestration method reduces the orchestration response time by 38.5% and improves the resource allocation balance by 2.1%. Compared with a genetic algorithm, it reduces the orchestration response time by 96.5% and improves the resource allocation balance by 2.9%.
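The combination of Q-learning and a greedy strategy described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the four-node topology, the node load values, the reward function, and all names (`GRAPH`, `LOAD`, `reward`, `train`, `greedy_path`) are hypothetical and chosen only to show how an epsilon-greedy tabular Q-learning agent might learn a path that avoids a heavily loaded node.

```python
import random

# Hypothetical toy topology: node -> reachable next nodes (not from the paper).
GRAPH = {0: [1, 2], 1: [3], 2: [3], 3: []}
# Hypothetical fraction of computing resources already in use on each node.
LOAD = {0: 0.2, 1: 0.9, 2: 0.3, 3: 0.2}
SOURCE, TARGET = 0, 3


def reward(next_node):
    """Assumed reward: penalise loaded nodes, add a bonus for reaching the target."""
    bonus = 1.0 if next_node == TARGET else 0.0
    return bonus - LOAD[next_node]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    actions = GRAPH[state]
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, nbrs in GRAPH.items() for a in nbrs}
    for _ in range(episodes):
        state = SOURCE
        while state != TARGET:
            action = choose_action(q, state, epsilon, rng)
            # Best value obtainable from the next node (0 at the target).
            future = max((q[(action, a)] for a in GRAPH[action]), default=0.0)
            # Standard Q-learning update rule.
            td_target = reward(action) + gamma * future
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = action
    return q


def greedy_path(q):
    """Follow the highest-valued action from SOURCE until TARGET is reached."""
    path, state = [SOURCE], SOURCE
    while state != TARGET:
        state = max(GRAPH[state], key=lambda a: q[(state, a)])
        path.append(state)
    return path


q_table = train()
print(greedy_path(q_table))
```

In this toy setting the learned path goes through node 2 rather than the heavily loaded node 1. A reward that also accounts for link bandwidth, as the abstract describes, would need additional terms that are not modelled here.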

Key words: network security, security service function chain, Q-learning, greedy strategy, resource allocation

CLC Number: TM769

LIU Xing1, 2, GUO Liang1, 2, WANG Zhengqi1, 2, WEI Xiaogang1, 2, XU Xuefei1, 2, LIU Jing3. Q-learning-based Algorithm for Orchestrating Security Service Function Chain[J]. Computer and Modernization, 2024, 0(11): 34-40.
