Computer and Modernization, 2020, Vol. 0, Issue (08): 94-99. doi: 10.3969/j.issn.1006-2475.2020.08.015

• Artificial Intelligence •

Decision-making Method for Airport Taxi Drivers Based on Deep Reinforcement Learning

  1. (School of Mathematics, China University of Mining and Technology, Xuzhou 221100, China)
  • Received: 2020-02-16 Online: 2020-08-17 Published: 2020-08-17
  • About the authors: Wang Pengyong (王鹏勇, born 1998), male, from Xuzhou, Jiangsu, undergraduate student, research interests: reinforcement learning, computer vision, E-mail: 10173704@cumt.edu.cn; Chen Gongtao (陈龚涛, born 1999), male, from Wuxi, Jiangsu, undergraduate student, research interests: statistical computing, data mining, E-mail: 10173703@cumt.edu.cn; Zhao Jiangshuo (赵江烁, born 1999), male, from Baoding, Hebei, undergraduate student, research interests: machine learning, data mining, E-mail: 10173702@cumt.edu.cn.
  • Funding:
    Undergraduate Innovation Training Program of China University of Mining and Technology (20190510)

Abstract: Taxi dispatching is difficult at large transportation hubs such as airports. Approaching the problem from the standpoint of the taxi driver's profit, this paper proposes a driver decision-making method based on improved deep reinforcement learning. First, the airport environment and the surrounding urban environment are simulated, and the driver's states and actions, the rewards obtained by interacting with the environment, and the state transitions are defined. Then, the driver's state parameters are fed into a DQN as input, and the DQN is used to fit the state-action value function (Q-value function). Finally, the driver repeatedly makes decisions under an ε-greedy policy, and the DQN parameters are updated according to the reward function. Experimental results show that, in simulated environments including large, medium, and small cities, the driver can use the model to obtain a quantitative estimate of the expected return of each available action and make reasonable decisions, so that taxi dispatching is completed automatically.

Key words: taxi dispatching, deep reinforcement learning, DQN, Q-value function

CLC number:
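
The decision loop summarized in the abstract (ε-greedy action selection followed by a reward-driven update of the Q-value estimate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the function names `epsilon_greedy` and `td_target`, the discount factor `gamma`, and the use of the standard one-step Q-learning target. The paper's "improved" DQN variant is not specified in the abstract and is not reproduced here.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a list of estimated Q-values, one per action (in a DQN these
    would be the network outputs for the driver's current state).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=0.9, done=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').

    gamma is a hypothetical discount factor; the abstract does not give one.
    """
    if done:
        # No future return after a terminal transition.
        return reward
    return reward + gamma * max(next_q_values)
```

As a usage illustration with made-up numbers, `epsilon_greedy([1.0, 3.0], epsilon=0.0)` selects action 1 (the greedy choice), and `td_target(2.0, [1.0, 4.0], gamma=0.5)` gives the regression target that the Q-value of the chosen action would be trained toward. In a full DQN the network parameters would be adjusted to reduce the squared difference between its prediction and this target.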