Computer and Modernization ›› 2022, Vol. 0 ›› Issue (01): 98-102.

• Artificial Intelligence •



  • About the authors: MA Rui (1997—), male, from Jinan, Shandong, master's student; research interests: reinforcement learning, task planning, multi-agent systems; E-mail: maruinuaa@nuaa.edu.cn. Corresponding author: OUYANG Quan (1991—), male, lecturer, Ph.D.; research interests: UAV flight control, battery management; E-mail: ouyangquan@nuaa.edu.cn. WU Zhaoxiang (1995—), female, from Yangzhou, Jiangsu, master's student; research interests: UAV swarm control; E-mail: wuzhaoxiang@nuaa.edu.cn. CONG Yuhua (1981—), female, lecturer, Ph.D. candidate; research interests: cross-domain collaboration, UAV flight control; E-mail: 28989116@qq.com. WANG Zhisheng (1970—), male, from Jingmen, Hubei, professor, Ph.D.; research interests: information fusion, UAV swarm control, computer vision; E-mail: wangzhisheng@nuaa.edu.cn.
  • Funding:
    National Natural Science Foundation of China (61473144); Postgraduate Research and Innovation Fund of Nanjing University of Aeronautics and Astronautics (kfjj20200334); Scientific Research Project of Nanjing University of Science and Technology Zijin College (2019ZRKX0401006)

Multi-UAV Power Inspection Task Planning Technology Based on Deep Reinforcement Learning

  1. (1. College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
    2. College of Computer Science, Nanjing University of Science and Technology Zijin College, Nanjing 210023, China)
  • Online:2022-01-24 Published:2022-01-24



Abstract: UAVs are widely used for inspecting power grid lines and transmission towers thanks to their low cost and strong maneuverability. Because a single UAV has a limited flight range, wide-area grid inspection requires multiple UAVs to cooperate. However, traditional task planning methods suffer from slow computation and weak cooperative behavior. To remedy these deficiencies, this paper proposes a task planning algorithm based on the multi-agent reinforcement learning value-mixing network QMIX. Under a centralized-training, decentralized-execution framework, the algorithm builds a recurrent neural network for each UAV and obtains a joint action-value function through a mixing network to guide training. A task reward function is designed to stimulate cooperation among the agents, effectively addressing the low cooperation efficiency of multi-UAV task planning. Simulation results show that the proposed algorithm completes the task 350.4 s faster than the commonly used value decomposition network (VDN) algorithm.
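As a rough illustration of the mixing step described above (not the authors' implementation), the QMIX idea can be sketched in plain NumPy: a hypernetwork conditioned on the global state produces non-negative weights that combine per-agent Q-values into a joint value Q_tot, so that raising any single agent's Q-value can never decrease Q_tot. All dimensions and the random initialization here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, state_dim, embed_dim = 3, 8, 16

# Hypernetwork parameters (randomly initialized for this sketch;
# in QMIX they are learned end-to-end from the TD error).
hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
hyper_b1 = rng.normal(size=(state_dim, embed_dim))
hyper_w2 = rng.normal(size=(state_dim, embed_dim))
hyper_b2 = rng.normal(size=(state_dim, 1))

def mix(agent_qs, state):
    """Combine per-agent Q-values into Q_tot, monotonic in each agent Q."""
    # Absolute values keep the mixing weights non-negative,
    # which enforces the monotonicity constraint.
    w1 = np.abs(state @ hyper_w1).reshape(n_agents, embed_dim)
    b1 = state @ hyper_b1
    hidden = np.maximum(agent_qs @ w1 + b1, 0.0)  # ReLU stands in for ELU here
    w2 = np.abs(state @ hyper_w2)
    b2 = state @ hyper_b2
    return float(hidden @ w2 + b2)

state = rng.normal(size=state_dim)
qs = np.array([1.0, 0.5, -0.2])
q_tot = mix(qs, state)

# Monotonicity check: improving one agent's Q never lowers Q_tot.
q_tot_up = mix(qs + np.array([0.5, 0.0, 0.0]), state)
assert q_tot_up >= q_tot
```

Because the weights applied to the agent Q-values are non-negative, the argmax of Q_tot factors into per-agent argmaxes, which is what allows decentralized execution after centralized training.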

Key words: reinforcement learning, power inspection, multi-agent collaboration