[1] YUAN Q, XI Z, LU Q, et al. Method and experiment of the NAO humanoid robot walking on a slope based on CoM motion estimation and control[C]// International Conference on Intelligent Robotics and Applications. Springer, 2017:154-165.
[2] KAJITA S, KANEHIRO F, KANEKO K, et al. Biped walking pattern generation by using preview control of zero-moment point[C]// Proceedings of IEEE International Conference on Robotics and Automation. 2003,2:1620-1626.
[3] MAXIMO M R, COLOMBINI E L, RIBEIRO C H. Stable and fast model-free walk with arms movement for humanoid robots[J]. International Journal of Advanced Robotic Systems, 2017,14(3):1-11.
[4] HUAN T T, ANH H P H. Novel stable walking for humanoid robot using particle swarm optimization algorithm[C]// Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering. 2015.
[5] RAJENDRA R, PRATIHAR D K. Analysis of double support phase of biped robot and multi-objective optimization using genetic algorithm and particle swarm optimization algorithm[J]. Sadhana, 2015,40(2):549-575.
[6] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge, MA: MIT Press, 1998.
[7] JAIN A K, MAO J, MOHIUDDIN K M. Artificial neural networks: A tutorial[J]. Computer, 1996,29(3):31-44.
[8] RIEDMILLER M. Neural fitted Q iteration--first experiences with a data efficient neural reinforcement learning method[C]// European Conference on Machine Learning. Springer, 2005:317-328.
[9] ZHANG Z, LYONS M, SCHUSTER M, et al. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron[C]// Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. 1998:454-459.
[10] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning[C]// International Joint Conference on Neural Networks (IJCNN). IEEE, 2010:1-8.
[11] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19). https://arxiv.org/abs/1312.5602.
[12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.
[13] WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: King's College, University of Cambridge, 1989.
[14] PHANITEJA S, DEWANGAN P, GUHAN P, et al. A deep reinforcement learning approach for dynamically stable inverse kinematics of humanoid robots[C]// 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO). 2017:1818-1823.
[15] DANEL M, SKRBEK M. Humanoid robot control by offline actor-critic learning[J]. CEUR Workshop Proceedings, 2017,1885.
[16] GOUAILLIER D, HUGEL V, BLAZEVIC P, et al. Mechatronic design of NAO humanoid[C]// IEEE International Conference on Robotics and Automation (ICRA’09). 2009:769-774.
[17] XIN J, ZHAO H, LIU D, et al. Application of deep reinforcement learning in mobile robot path planning[C]// Chinese Automation Congress (CAC). IEEE, 2017:7112-7116.
[18] VUKOBRATOVIĆ M, BOROVAC B. Zero-moment point--thirty five years of its life[J]. International Journal of Humanoid Robotics, 2004,1(1):157-173.
[19] VUKOBRATOVIĆ M, BOROVAC B, POTKONJAK V. ZMP: A review of some basic misunderstandings[J]. International Journal of Humanoid Robotics, 2006,3(2):153-175.
[20] LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey of deep reinforcement learning[J]. Chinese Journal of Computers, 2018,41(1):1-27. (in Chinese)
[21] LIU J W, GAO F, LUO X L. A survey of deep reinforcement learning based on value function and policy gradient[J]. Chinese Journal of Computers, 2018. (in Chinese)
[22] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: A system for large-scale machine learning[C]// Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). 2016:265-283.
[23] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[J]. SIAM Journal on Control and Optimization, 2003,42(4):1143-1166.
[24] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2015-09-09). https://arxiv.org/abs/1509.02971.