Computer and Modernization ›› 2019, Vol. 0 ›› Issue (04): 47-. doi: 10.3969/j.issn.1006-2475.2019.04.009

• Artificial Intelligence •

Gait Optimization of Humanoid Robot Based on Deep Q Network

  

  1. (College of Computer and Information, Hohai University, Nanjing 210098, China)
  • Received: 2018-09-18 Online: 2019-04-26 Published: 2019-04-30
  • About the authors: YUAN Wen (1994-), female, born in Wuxi, Jiangsu, M.S. candidate, research interests: deep reinforcement learning, humanoid robots, E-mail: yomy1234@163.com; LIU Huiyi (1961-), male, born in Nanjing, Jiangsu, professor, Ph.D., research interests: deep learning, computer graphics.
  • Supported by:
    Science and Technology Program of Jiangsu Provincial Department of Water Resources (2017003ZB)


Abstract: To achieve fast and stable walking for a humanoid robot under valid parameter combinations, a walking-parameter training algorithm based on deep reinforcement learning is proposed to optimize the robot's gait. First, gait model parameters are captured from the environment and used as the input of a DQN; then, the DQN is used to approximate the state-action value function produced by the robot's walking; finally, an action selection policy chooses the gait action the robot currently executes, and the resulting reward is used to update the DQN. Taking the simulated NAO robot as the experimental subject, experiments on the RoboCup3D simulation platform show that, with this algorithm, the NAO humanoid robot achieves stable bipedal walking.

Key words: humanoid robot, deep reinforcement learning, DQN, gait optimization, RoboCup3D


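The training loop described in the abstract (gait parameters as DQN input, a network approximating the state-action value function, an action selection policy, and reward-driven updates) can be sketched as follows. This is a minimal illustration only: the state dimensions, action discretization, network size, and the toy reward standing in for the RoboCup3D simulator are all assumptions, not the paper's actual setup.

```python
import numpy as np

# Hypothetical discretization: each action nudges one gait parameter up or down.
STATE_DIM = 3        # e.g. step length, step height, torso lean (illustrative)
N_ACTIONS = STATE_DIM * 2
GAMMA = 0.95         # discount factor
ALPHA = 0.01         # learning rate
EPSILON = 0.1        # exploration rate of the epsilon-greedy policy

rng = np.random.default_rng(0)

class TinyDQN:
    """A one-hidden-layer Q-network approximating Q(s, a) for all actions."""
    def __init__(self, state_dim, n_actions, hidden=16):
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, s):
        self.h = np.maximum(0.0, s @ self.w1 + self.b1)  # ReLU hidden layer
        return self.h @ self.w2 + self.b2                # one Q-value per action

    def update(self, s, a, td_target):
        """One SGD step pushing Q(s, a) toward the TD target."""
        q = self.forward(s)
        err = q[a] - td_target
        one_hot = np.eye(len(q))[a]
        # Gradients flow only through the chosen action's output unit.
        grad_h = self.w2[:, a] * err * (self.h > 0)
        self.w1 -= ALPHA * np.outer(s, grad_h)
        self.b1 -= ALPHA * grad_h
        self.w2 -= ALPHA * np.outer(self.h, one_hot) * err
        self.b2 -= ALPHA * one_hot * err

def select_action(net, s):
    """Epsilon-greedy action selection over the network's Q-values."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(net.forward(s)))

def step(params, action):
    """Stand-in environment: apply the parameter nudge and score the gait.
    A real setup would run the RoboCup3D simulator and reward forward speed
    and stability; here the reward just favors a fixed illustrative target."""
    delta = np.zeros(STATE_DIM)
    delta[action // 2] = 0.05 if action % 2 == 0 else -0.05
    new_params = np.clip(params + delta, 0.0, 1.0)
    target = np.array([0.6, 0.3, 0.5])          # illustrative "good gait"
    reward = -np.sum((new_params - target) ** 2)
    return new_params, reward

net = TinyDQN(STATE_DIM, N_ACTIONS)
params = np.full(STATE_DIM, 0.5)                # initial gait parameters
for episode in range(200):
    a = select_action(net, params)              # action selection policy
    nxt, r = step(params, a)                    # execute gait action, get reward
    td_target = r + GAMMA * np.max(net.forward(nxt))  # Bellman backup
    net.update(params, a, td_target)            # update the DQN
    params = nxt
```

In the full method, `step` would be replaced by an episode in the RoboCup3D simulator, with the reward measuring walking speed and stability rather than distance to a fixed target.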

CLC number: