[1] 唐昊,张晓艳,韩江洪,等. 基于连续时间半马尔可夫决策过程的Option算法[J]. 计算机学报, 2014,37(9):2027-2035.
[2] 黄志成. 基于隐马尔可夫模型的学习行为评估[J]. 计算机应用与软件, 2014,31(6):59-62.
[3] 沈孝文. 分层强化学习与潜在动作模型的研究与应用[D]. 广州:华南理工大学, 2014.
[4] Konidaris G, Barto A. Efficient skill learning using abstraction selection[C]// Proceedings of the 21st International Joint Conference on Artifical Intelligence. 2009:1107-1112.[5] Rozo L, Jiménez P, Torras C. A robot learning from demonstration framework to perform force-based manipulation tasks[J]. Intelligent Service Robotics, 2013,6(1):33-51.
[6] 韩伟,鲁霜. 基于模糊推理的多智能体强化学习[J]. 计算机应用与软件, 2011,28(11):96-98.
[7] Prins N W, Sanchez J C, Prasad A. A confidence metric for using neurobiological feedback in actor-critic reinforcement learning based brain-machine interfaces[J]. Frontiers in Neuroscience, 2014,8:111.
[8] Jandhyala V, Fotopoulos S, Macneill I, et al. Inference for single and multiple change-points in time series[J]. Journal of Time Series Analysis, 2013,34(4):423-446.
[9] 杨志斌,胡凯,赵永望,等. 基于时间抽象状态机的AADL模型验证[J]. 软件学报, 2015(2):202-222.
[10]Kress-Gazit H, Pappas G J. Automatic synthesis of robot controllers for tasks with locative prepositions[C]// 2010 IEEE International Conference on Robotics and Automation. 2010:3215-3220.
[11]王作为,张汝波. 自主发育智能机器人体系结构研究[J]. 计算机应用与软件, 2011,28(11):36-39.
[12]Gupta K, Singh H P, Biswal B, et al. Adaptive targeting of chaotic response in periodically stimulated neural systems[J]. Chaos An Interdisciplinary Journal of Nonlinear Science, 2006,16(2):360-375.
[13]Xuan Xiang, Murphy K. Modeling changing dependency structure in multivariate time series[C]// Proceedings of the 24th International Conference on Machine Learning. 2007:1055-1062.
[14]Vien N A, Ertel W, Chung T C. Learning via human feedback in continuous state and action spaces[J]. Applied Intelligence, 2013,39(2):267-278.
[15]Boularias A, Chaib-Draa B. Apprenticeship learning with few examples[J]. Neurocomputing, 2013,104(3):83-96. |