An Abstract Action Tree Construction Algorithm Based on Demonstration Trajectories

doi:10.3969/j.issn.1006-2475.2016.06.018

Computer and Modernization

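Suzhou Health College, Suzhou 215009, Jiangsu, China

Received: 2015-11-30 Online: 2016-06-16 Published: 2016-06-17
About the author: WANG Lei (born 1980), female, from Kaifeng, Henan; lecturer at Suzhou Health College; master's degree; research interest: machine learning.
Funding:
National Natural Science Foundation of China (61373094)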

Abstract

Abstract: Automatically constructing abstract actions is one of the key techniques in hierarchical reinforcement learning. Skill chaining is a typical algorithm for autonomously discovering abstract actions in continuous reinforcement learning domains, but it requires many iterations and converges slowly. This paper presents an abstract action tree construction algorithm based on demonstration trajectories (ACADT). Using a change-point detection method, ACADT segments each trajectory into a chain of abstract actions. The chains obtained from multiple trajectories are then merged into an abstract action tree. Experimental results show that ACADT can construct an abstract action tree and converges faster.


Key words: hierarchical reinforcement learning, demonstration trajectories, abstract action, automatic construction, machine learning
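
The abstract describes two steps: segmenting each demonstration trajectory into a chain at detected change points, and merging the chains into a tree. The following is a minimal Python sketch of that pipeline, not the ACADT implementation. The abstract does not specify the change-point detector or the merge rule, so everything here is an assumption made for illustration: `detect_change_points` is a crude jump-threshold stand-in, `label_of` is a hypothetical function that assigns a comparable label to each segment, and chains are merged as a trie built from the end of each trajectory, on the assumption that all demonstrations end at the same goal.

```python
from dataclasses import dataclass, field


def detect_change_points(signal, threshold):
    """Stand-in detector (assumption): indices where consecutive samples
    differ by more than `threshold`. Not the method used in the paper."""
    return [i for i in range(1, len(signal))
            if abs(signal[i] - signal[i - 1]) > threshold]


def segment_into_chain(signal, threshold):
    """Split one trajectory into consecutive segments (a chain)."""
    bounds = [0] + detect_change_points(signal, threshold) + [len(signal)]
    return [signal[a:b] for a, b in zip(bounds, bounds[1:])]


@dataclass
class Node:
    label: object
    children: dict = field(default_factory=dict)


def merge_chains(chains, label_of):
    """Merge chains into a tree rooted at the shared goal (assumption).

    Chains are walked from their last segment backwards, so segments
    with the same label at the same distance from the goal share a node.
    """
    root = Node("goal")
    for chain in chains:
        node = root
        for segment in reversed(chain):
            key = label_of(segment)
            node = node.children.setdefault(key, Node(key))
    return root


# Hypothetical 1-D demonstration trajectories that end near the same value.
traj_a = [0.0, 0.1, 0.0, 5.0, 5.1, 9.0, 9.1]
traj_b = [2.0, 2.1, 5.0, 4.9, 9.0, 9.2]

chains = [segment_into_chain(t, threshold=1.0) for t in (traj_a, traj_b)]
# Hypothetical labelling rule: the rounded mean of a segment.
tree = merge_chains(chains, label_of=lambda seg: round(sum(seg) / len(seg)))
```

In this toy example the two trajectories share their last two segments and differ only in the first, so the resulting tree has a single path from the goal that branches at the third level. A real system would replace the threshold detector with a statistical change-point method and attach a learned policy to each node.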


CLC number: TP181

王蕾. 一种基于示例轨迹的抽象动作树构造方法[J]. 计算机与现代化, doi: 10.3969/j.issn.1006-2475.2016.06.018.

WANG Lei. An Abstract Action Tree Construction Algorithm Based on Demonstration Trajectories[J]. Computer and Modernization, doi: 10.3969/j.issn.1006-2475.2016.06.018.

