[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015,521(7553):436-444.
[2] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. MIT Press, 2018.
[3] 刘朝阳, 穆朝絮, 孙长银. A survey of the state of research on deep reinforcement learning algorithms and applications[J]. 智能科学与技术学报, 2020,2(4):314-326. (in Chinese)
[4] 杨思明, 单征, 丁煜, et al. A survey of deep reinforcement learning research[J]. 计算机工程, 2021,47(12):19-29. (in Chinese)
[5] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.
[6] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016,529(7587):484-489.
[7] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017,550(7676):354-359.
[8] 刘威, 张东霞, 王新迎, et al. Research on emergency control strategies for power grids based on deep reinforcement learning[J]. 中国电机工程学报, 2018,38(1):109-119. (in Chinese)
[9] 李航, 李国杰, 汪可友. Real-time scheduling strategy for electric vehicles based on deep reinforcement learning[J]. 电力系统自动化, 2020,44(22):161-167. (in Chinese)
[10] 孔松涛, 刘池池, 史勇, et al. A survey of application prospects of deep reinforcement learning in intelligent manufacturing[J]. 计算机工程与应用, 2021,57(2):49-59. (in Chinese)
[11] 齐义文, 张弛, 陈禹西. Thrust control of variable cycle aero-engines based on reinforcement learning methods[J]. 沈阳航空航天大学学报, 2022,39(3):40-49. (in Chinese)
[12] 董豪, 丁子涵, 仉尚航. Deep reinforcement learning: fundamentals, research and applications[M]. Beijing: 电子工业出版社, 2020. (in Chinese)
[13] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv preprint arXiv:1312.5602, 2013.
[14] PAN Y, ZAHEER M, WHITE A, et al. Organizing experience: A deeper look at replay mechanisms for sample-based planning in continuous state domains[J]. arXiv preprint arXiv:1806.04624, 2018.
[15] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]// Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017.
[16] LIN L J. Self-improving reactive agents based on reinforcement learning, planning and teaching[J]. Machine Learning, 1992,8(3):293-321.
[17] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.
[18] HESTER T, VECERIK M, PIETQUIN O, et al. Deep Q-learning from demonstrations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018,32(1). DOI:10.1609/aaai.v32i1.11757.
[19] OH J, GUO Y, SINGH S, et al. Self-imitation learning[C]// Proceedings of the 35th International Conference on Machine Learning. 2018:3878-3887.
[20] LUO J L, LI H. Dynamic experience replay[C]// Proceedings of the Conference on Robot Learning. 2020:1191-1200.
[21] LIU X H, XUE Z, PANG J, et al. Regret minimization experience replay in off-policy reinforcement learning[C]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021). 2021.
[22] ZHANG S, SUTTON R S. A deeper look at experience replay[J]. arXiv preprint arXiv:1712.01275, 2017.
[23] SUN P, ZHOU W, LI H. Attentive experience replay[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(4):5900-5907.
[24] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
[25] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016,30(1).
[26] WENG J, CHEN H, YAN D, et al. Tianshou: A highly modularized deep reinforcement learning library[J]. arXiv preprint arXiv:2107.14171, 2021.
[27] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv preprint arXiv:1606.01540, 2016.