Decision-making Method for Airport Taxi Drivers Based on Deep Reinforcement Learning

doi:10.3969/j.issn.1006-2475.2020.08.015

Computer and Modernization, 2020, Vol. 0, Issue (08): 94-99.

(School of Mathematics, China University of Mining and Technology, Xuzhou 221100, China)

Received: 2020-02-16 Online: 2020-08-17 Published: 2020-08-17
About the authors: WANG Peng-yong (born 1998), male, from Xuzhou, Jiangsu, undergraduate student, research interests: reinforcement learning, computer vision, E-mail: 10173704@cumt.edu.cn; CHEN Gong-tao (born 1999), male, from Wuxi, Jiangsu, undergraduate student, research interests: statistical computing, data mining, E-mail: 10173703@cumt.edu.cn; ZHAO Jiang-shuo (born 1999), male, from Baoding, Hebei, undergraduate student, research interests: machine learning, data mining, E-mail: 10173702@cumt.edu.cn.
Funding:
Undergraduate Innovation Training Program of China University of Mining and Technology (20190510)

Abstract

To address the difficulty of dispatching taxis at large transportation hubs such as airports, this paper proposes a driver decision-making method based on improved deep reinforcement learning, formulated from the perspective of the taxi driver's profit. First, the airport environment and the environment of the city where the airport is located are simulated, and the driver's states, actions, rewards obtained from interacting with the environment, and state transitions are defined. Then, the driver's state parameters are fed into a DQN, which is used to fit the state-action value function (Q-value function). Finally, the driver repeatedly makes decisions according to an ε-greedy policy, and the DQN parameters are updated according to the reward function. Experimental results show that, in simulated environments including large, medium, and small cities, drivers can use the model to quantitatively obtain the expected return of each available decision action and make reasonable decisions, thereby completing the taxi dispatching process automatically.

Key words: taxi dispatching, deep reinforcement learning, DQN, Q-value function
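
The loop described in the abstract (ε-greedy decisions followed by a reward-driven update of the Q-value function) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear approximator stands in for the DQN, and the action names, state features, reward values, and hyperparameters are all hypothetical choices made for the example.

```python
import random

# Hypothetical action set; the abstract does not enumerate the driver's actions.
ACTIONS = ("queue_at_airport", "return_to_city")


def q_values(weights, state):
    """Linear stand-in for the DQN: one weight vector per action."""
    return [sum(w * s for w, s in zip(weights[a], state))
            for a in range(len(ACTIONS))]


def epsilon_greedy(weights, state, epsilon, rng):
    """With probability epsilon explore; otherwise pick the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    q = q_values(weights, state)
    return max(range(len(ACTIONS)), key=q.__getitem__)


def td_update(weights, state, action, reward, next_state, done,
              gamma=0.9, lr=0.01):
    """One semi-gradient Q-learning step toward r + gamma * max_a' Q(s', a')."""
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_values(weights, next_state))
    error = target - q_values(weights, state)[action]
    weights[action] = [w + lr * error * s
                       for w, s in zip(weights[action], state)]
    return error


def train_toy(rng, steps=500, epsilon=0.2):
    """Train on a made-up one-step task: queueing pays 1.0, leaving pays 0.2."""
    state = [1.0, 0.5]  # hypothetical features: bias term, normalized queue length
    weights = [[0.0, 0.0] for _ in ACTIONS]
    for _ in range(steps):
        action = epsilon_greedy(weights, state, epsilon, rng)
        reward = 1.0 if action == 0 else 0.2  # assumed rewards, for illustration only
        td_update(weights, state, action, reward, state, done=True, lr=0.1)
    return weights
```

Calling `train_toy(random.Random(0))` returns per-action weight vectors; passing them to `q_values` gives the estimated return of each action in a given state, which is the quantity the abstract says the driver uses to decide. A full DQN would replace the linear function with a neural network and typically add experience replay and a target network.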

CLC number: TP391.9

Cite this article: WANG Peng-yong, CHEN Gong-tao, ZHAO Jiang-shuo. Decision-making Method for Airport Taxi Drivers Based on Deep Reinforcement Learning[J]. Computer and Modernization, 2020, 0(08): 94-99.
