Computer and Modernization ›› 2023, Vol. 0 ›› Issue (06): 33-38.doi: 10.3969/j.issn.1006-2475.2023.06.006

• DESIGN AND ANALYSIS OF ALGORITHM •

An Experience Replay Strategy Based on Mixed Samples

LAI Jian-bin, FENG Gang   

  1. School of Computer Science, South China Normal University, Guangzhou 510635, China
  • Received:2022-07-11 Revised:2022-08-24 Online:2023-06-28 Published:2023-06-28

Abstract: The experience replay strategy has become an important component of deep reinforcement learning algorithms: it not only accelerates convergence but also improves agent performance. Mainstream experience replay strategies accelerate learning through uniform sampling, prioritized experience replay, expert experience replay, and other methods. To further improve the utilization of experience samples in deep reinforcement learning, this paper proposes an experience replay strategy based on mixed samples (ER-MS). The strategy combines two methods: immediate learning of the latest experience and review of successful experience. It immediately learns from the newest samples generated by the agent's interaction with the environment, and uses an additional experience buffer to save the samples of successful episodes for replay. Experiments show that ER-MS combined with the DDPG algorithm achieves better results on OpenAI MuJoCo tasks.
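The two mechanisms described in the abstract can be sketched as a replay buffer with a secondary pool for successful episodes. This is a minimal illustration under assumed details (the class name `MixedSampleReplay`, the mixing ratio, and the buffer sizes are hypothetical, not taken from the paper):

```python
import random
from collections import deque

class MixedSampleReplay:
    """Sketch of an ER-MS style buffer (hypothetical API, not the paper's code).

    Two ideas from the abstract:
      1) the newest transition is always included in the training batch
         ("immediate learning of the latest experience");
      2) transitions from successful episodes are kept in a second buffer
         and replayed alongside ordinary experience.
    """

    def __init__(self, capacity=100_000, success_capacity=20_000):
        self.main = deque(maxlen=capacity)             # ordinary experience
        self.success = deque(maxlen=success_capacity)  # successful episodes only
        self.episode = []                              # transitions of the running episode

    def store(self, transition):
        """Add one (s, a, r, s', done) transition."""
        self.main.append(transition)
        self.episode.append(transition)

    def end_episode(self, successful):
        """At episode end, copy the episode into the success buffer if it succeeded."""
        if successful:
            self.success.extend(self.episode)
        self.episode = []

    def sample(self, batch_size, success_ratio=0.25):
        """Batch = latest transition + success-buffer samples + main-buffer samples."""
        batch = [self.main[-1]]  # always learn from the newest sample immediately
        n_success = min(int(batch_size * success_ratio), len(self.success))
        batch += random.sample(list(self.success), n_success)
        n_main = min(batch_size - len(batch), len(self.main))
        batch += random.sample(list(self.main), n_main)
        return batch
```

In a DDPG training loop, `store` would be called after every environment step, `end_episode` once the episode's return (or task-specific success criterion) is known, and `sample` before each critic/actor update; the `success_ratio` governing how much of each batch comes from successful episodes is a tunable choice, not a value given in the abstract.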

Key words: experience replay, deep reinforcement learning, expert experience
