Computer and Modernization ›› 2022, Vol. 0 ›› Issue (01): 98-102.

• Artificial Intelligence •



  • About the authors: MA Rui (1997—), male, from Jinan, Shandong, master's student; research interests: reinforcement learning, task planning, multi-agent systems; E-mail: maruinuaa@nuaa.edu.cn. Corresponding author: OUYANG Quan (1991—), male, lecturer, Ph.D.; research interests: UAV flight control, battery management; E-mail: ouyangquan@nuaa.edu.cn. WU Zhaoxiang (1995—), female, from Yangzhou, Jiangsu, master's student; research interests: UAV swarm control; E-mail: wuzhaoxiang@nuaa.edu.cn. CONG Yuhua (1981—), female, lecturer, Ph.D. candidate; research interests: cross-domain collaboration, UAV flight control; E-mail: 28989116@qq.com. WANG Zhisheng (1970—), male, from Jingmen, Hubei, professor, Ph.D.; research interests: information fusion, UAV swarm control, computer vision; E-mail: wangzhisheng@nuaa.edu.cn.
  • Funding:
    National Natural Science Foundation of China (61473144); Postgraduate Research and Innovation Fund of Nanjing University of Aeronautics and Astronautics (kfjj20200334); Scientific Research Project of Nanjing University of Science and Technology Zijin College (2019ZRKX0401006)

Multi-UAV Power Inspection Task Planning Technology Based on Deep Reinforcement Learning

  1. (1. College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
    2. College of Computer Science, Nanjing University of Science and Technology Zijin College, Nanjing 210023, China)
  • Online:2022-01-24 Published:2022-01-24



Abstract: UAVs are widely used for inspecting power grid lines and transmission towers thanks to their low cost and strong maneuverability. Because a single UAV has a limited flight range, wide-area grid inspection requires multiple UAVs to cooperate. However, traditional task planning methods suffer from slow computation and weak cooperative behavior. To remedy these deficiencies, this paper proposes a task planning algorithm based on the multi-agent reinforcement learning value-mixing network QMIX. Under a centralized-training, decentralized-execution framework, the algorithm builds a recurrent neural network for each UAV and obtains a joint action-value function through a mixing network to guide training. A task reward function is designed to stimulate cooperation among the agents, effectively addressing the low cooperation efficiency of multi-UAV task planning. Simulation results show that the proposed algorithm completes the task 350.4 s faster than the commonly used value decomposition network (VDN) algorithm.
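As a rough illustration of the mixing step described above (not the authors' implementation), the QMIX idea can be sketched in plain NumPy: a hypernetwork conditioned on the global state produces non-negative weights that combine per-agent Q-values into a joint value Q_tot, so that raising any single agent's Q-value can never decrease Q_tot. All dimensions and the random initialization here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, state_dim, embed_dim = 3, 8, 16

# Hypernetwork parameters (randomly initialized for this sketch;
# in QMIX they are learned end-to-end from the TD error).
hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
hyper_b1 = rng.normal(size=(state_dim, embed_dim))
hyper_w2 = rng.normal(size=(state_dim, embed_dim))
hyper_b2 = rng.normal(size=(state_dim, 1))

def mix(agent_qs, state):
    """Combine per-agent Q-values into Q_tot, monotonic in each agent Q."""
    # Absolute values keep the mixing weights non-negative,
    # which enforces the monotonicity constraint.
    w1 = np.abs(state @ hyper_w1).reshape(n_agents, embed_dim)
    b1 = state @ hyper_b1
    hidden = np.maximum(agent_qs @ w1 + b1, 0.0)  # ReLU stands in for ELU here
    w2 = np.abs(state @ hyper_w2)
    b2 = state @ hyper_b2
    return float(hidden @ w2 + b2)

state = rng.normal(size=state_dim)
qs = np.array([1.0, 0.5, -0.2])
q_tot = mix(qs, state)

# Monotonicity check: improving one agent's Q never lowers Q_tot.
q_tot_up = mix(qs + np.array([0.5, 0.0, 0.0]), state)
assert q_tot_up >= q_tot
```

Because the weights applied to the agent Q-values are non-negative, the argmax of Q_tot factors into per-agent argmaxes, which is what allows decentralized execution after centralized training.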

Key words: reinforcement learning, power inspection, multi-agent collaboration