Graph Neural Network-based Multi-agent Reinforcement Learning for Adversarial Policy Detection Algorithm

doi:10.3969/j.issn.1006-2475.2025.04.007

Abstract

In multi-agent environments, reinforcement learning models are vulnerable to adversarial attacks. Attacks based on adversarial policies are particularly difficult to defend against because they do not directly modify the victim's observations. To address this problem, this paper proposes a graph neural network-based adversarial policy detection algorithm that aims to identify malicious behavior among agents. A graph neural network is trained as an adversarial policy detector using alternative adversarial policies during the agents' collaboration, and it computes trust scores for the other agents from each agent's local observations. The method offers two levels of granularity: game-level detection identifies adversarial policies with very high accuracy, while time-step-level detection flags adversarial attacks early in a game, so that they are caught in time. Experiments on the StarCraft platform show that the proposed method achieves an AUC of up to 1.0 when detecting the most advanced adversarial policy-based attacks, outperforming state-of-the-art detection methods. It also detects adversarial policies faster than existing methods, as early as the 5th time step. When applied to adversarial defense, the method improves the win rate in attacked games by up to 61 percentage points. In addition, the experimental results show that the algorithm generalizes well: without retraining, it can be used directly to detect observation-based adversarial attacks. The proposed method therefore provides an effective adversarial attack detection mechanism for reinforcement learning models in multi-agent environments.
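The two detection granularities and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's detector: the graph neural network is replaced by hand-written per-step trust scores, and the assumptions that a low trust score means a suspicious agent, that the alarm threshold is 0.5, and that a game is scored by one minus its mean trust are all hypothetical choices made for illustration. The numbers are made up and are not experimental results.

```python
from typing import Optional, Sequence


def first_alarm_step(trust: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Time-step level: 1-based index of the first step whose trust score
    falls below the (assumed) threshold, or None if no step does."""
    for step, score in enumerate(trust, start=1):
        if score < threshold:
            return step
    return None


def game_suspicion(trust: Sequence[float]) -> float:
    """Game level: one suspicion score per game, here 1 - mean trust
    (an assumed aggregation, not taken from the paper)."""
    return 1.0 - sum(trust) / len(trust)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random adversarial game (label 1)
    receives a higher suspicion score than a random benign game (label 0).
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one game of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-step trust scores for four short games (made-up numbers).
games = [
    [0.90, 0.80, 0.90, 0.85, 0.90],  # benign teammate
    [0.95, 0.90, 0.92, 0.90, 0.88],  # benign teammate
    [0.90, 0.70, 0.40, 0.20, 0.10],  # adversarial teammate
    [0.80, 0.60, 0.30, 0.30, 0.20],  # adversarial teammate
]
labels = [0, 0, 1, 1]

suspicion = [game_suspicion(g) for g in games]  # game-level scores
alarms = [first_alarm_step(g) for g in games]   # earliest alarm per game
game_auc = auc(suspicion, labels)               # separability of the two classes
```

In this toy setup the time-step-level check can raise an alarm before a game ends, while the game-level score uses the whole trajectory. That is the trade-off between early detection and accuracy that the abstract describes.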

Key words: reinforcement learning, multi-agent system, adversarial attack, adversarial detection, graph neural network

CLC Number: TP391

SUN Qining, GUI Zhiming, LIU Yanfang, FAN Xinxin, LU Yunfeng. Graph Neural Network-based Multi-agent Reinforcement Learning for Adversarial Policy Detection Algorithm[J]. Computer and Modernization, 2025, 0(04): 42-49.
