[1] YUAN Q, XI Z, LU Q, et al. Method and experiment of the NAO humanoid robot walking on a slope based on CoM motion estimation and control[C]// International Conference on Intelligent Robotics and Applications. Springer, 2017:154-165.
[2] KAJITA S, KANEHIRO F, KANEKO K, et al. Biped walking pattern generation by using preview control of zero-moment point[C]// Proceedings of IEEE International Conference on Robotics and Automation. 2003,2:1620-1626.
[3] MAXIMO M R, COLOMBINI E L, RIBEIRO C H. Stable and fast model-free walk with arms movement for humanoid robots[J]. International Journal of Advanced Robotic Systems, 2017,14(3):1-11.
[4] HUAN T T, ANH H P H. Novel stable walking for humanoid robot using particle swarm optimization algorithm[C]// Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering. 2015.
[5] RAJENDRA R, PRATIHAR D K. Analysis of double support phase of biped robot and multi-objective optimization using genetic algorithm and particle swarm optimization algorithm[J]. Sadhana, 2015,40(2):549-575.
[6] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge, MA: MIT Press, 1998.
[7] JAIN A K, MAO J, MOHIUDDIN K M. Artificial neural networks: A tutorial[J]. Computer, 1996,29(3):31-44.
[8] RIEDMILLER M. Neural fitted Q iteration--first experiences with a data efficient neural reinforcement learning method[C]// European Conference on Machine Learning. Springer, 2005:317-328.
[9] ZHANG Z, LYONS M, SCHUSTER M, et al. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron[C]// Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition. 1998:454-459.
[10] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning[C]// International Joint Conference on Neural Networks (IJCNN). IEEE, 2010:1-8.
[11] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19). https://arxiv.org/abs/1312.5602.
[12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.
[13] WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: King's College, University of Cambridge, 1989.
[14] PHANITEJA S, DEWANGAN P, GUHAN P, et al. A deep reinforcement learning approach for dynamically stable inverse kinematics of humanoid robots[C]// 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO). 2017:1818-1823.
[15] DANEL M, SKRBEK M. Humanoid robot control by offline actor-critic learning[J]. CEUR Workshop Proceedings, 2017,1885.
[16] GOUAILLIER D, HUGEL V, BLAZEVIC P, et al. Mechatronic design of NAO humanoid[C]// IEEE International Conference on Robotics and Automation (ICRA’09). 2009:769-774.
[17] XIN J, ZHAO H, LIU D, et al. Application of deep reinforcement learning in mobile robot path planning[C]// Chinese Automation Congress (CAC). IEEE, 2017:7112-7116.
[18] VUKOBRATOVIĆ M, BOROVAC B. Zero-moment point--thirty five years of its life[J]. International Journal of Humanoid Robotics, 2004,1(1):157-173.
[19] VUKOBRATOVIĆ M, BOROVAC B, POTKONJAK V. ZMP: A review of some basic misunderstandings[J]. International Journal of Humanoid Robotics, 2006,3(2):153-175.
[20] LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey of deep reinforcement learning[J]. Chinese Journal of Computers, 2018,41(1):1-27. (in Chinese)
[21] LIU J W, GAO F, LUO X L. A survey of deep reinforcement learning based on value function and policy gradient[J]. Chinese Journal of Computers, 2018. (in Chinese)
[22] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: A system for large-scale machine learning[C]// Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). 2016:265-283.
[23] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[J]. SIAM Journal on Control and Optimization, 2003,42(4):1143-1166.
[24] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2015-09-09). https://arxiv.org/abs/1509.02971.