[1] |
LECUN Y, BENGIO Y, HINTON G.Deep learning[J]. Nature, 2015,521(7553):436-444.
|
[2] |
SUTTON R S, BARTO A G.Reinforcement Learning: An Introduction[M]. MIT Press, 2018.
|
[3] |
刘朝阳,穆朝絮,孙长银. 深度强化学习算法与应用研究现状综述[J]. 智能科学与技术学报, 2020,2(4):314-326.
|
[4] |
杨思明,单征,丁煜,等. 深度强化学习研究综述[J]. 计算机工程, 2021,47(12):19-29.
|
[5] |
MNIH V, KAVUKCUOGLU K, SILVER D, et al.Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540):529-533.
|
[6] |
SILVER D, HUANG A, MADDISON C J, et al.Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016,529(7587):484-489.
|
[7] |
SILVER D, SCHRITTWIESER J, SIMONYAN K, et al.Mastering the game of Go without human knowledge[J]. Nature, 2017,550(7676):354-359.
|
[8] |
刘威,张东霞,王新迎,等. 基于深度强化学习的电网紧急控制策略研究[J]. 中国电机工程学报, 2018,38(1):109-119.
|
[9] |
李航,李国杰,汪可友. 基于深度强化学习的电动汽车实时调度策略[J]. 电力系统自动化, 2020,44(22):161-167.
|
[10] |
孔松涛,刘池池,史勇,等. 深度强化学习在智能制造中的应用展望综述[J]. 计算机工程与应用, 2021,57(2):49-59.
|
[11] |
齐义文,张弛,陈禹西. 基于强化学习方法的变循环航空发动机推力控制[J]. 沈阳航空航天大学学报, 2022,39(3):40-49.
|
[12] |
董豪,丁子涵,仉尚航. 深度强化学习:基础、研究与应用[M]. 北京:电子工业出版社, 2020.
|
[13] |
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing atari with deep reinforcement learning[J]. arXiv preprint arXiv:1312.5602, 2013.
|
[14] |
PAN Y, ZAHEER M, WHITE A, et al. Organizing experience: A deeper look at replay mechanisms for sample-based planning in continuous state domains[J]. arXiv preprint arXiv:1806.04624, 2018.
|
[15] |
ANDRYCHOWICZ M, WOLSKI F, RAY A, et al.Hindsight experience replay[C]// Advances in Neural Information Processing Systems 30 (NIPS 2017). 2017.
|
[16] |
LIN L J.Self-improving reactive agents based on reinforcement learning, planning and teaching[J]. Machine Learning, 1992,8(3):293-321.
|
[17] |
SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.
|
[18] |
HESTER T, VECERIK M, PIETQUIN O, et al.Deep q-learning from demonstrations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018,32(1). DOI:10.1609/aaai.v32i1.11757.
|
[19] |
OH J, GUO Y, SINGH S, et al.Self-imitation learning[C]// Proceedings of the 35th International Conference on Machine Learning. 2018:3878-3887.
|
[20] |
LUO J L, LI H.Dynamic experience replay[C]// Proceedings of the Conference on Robot Learning. 2020:1191-1200.
|
[21] |
LIU X H, XUE Z, PANG J, et al.Regret minimization experience replay in off-policy reinforcement learning[C]// Advances in Neural Information Processing Systems 34 (NeurIPS 2021). 2021.
|
[22] |
ZHANG S, SUTTON R S. A deeper look at experience replay[J]. arXiv preprint arXiv:1712.01275, 2017.
|
[23] |
SUN P, ZHOU W, LI H.Attentive experience replay[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(4):5900-5907.
|
[24] |
LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
|
[25] |
VAN HASSELT H, GUEZ A, SILVER D.Deep reinforcement learning with double q-learning[C]// Proceedings of the 30th AAAI conference on artificial intelligence. 2016,30(1).
|
[26] |
WENG J, CHEN H, YAN D, et al.Tianshou: A highly modularized deep reinforcement learning library[J]. arXiv preprint arXiv:2107.14171, 2021.
|
[27] |
BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI gym[J]. arXiv preprint arXiv:1606.01540, 2016.
|