计算机与现代化 ›› 2021, Vol. 0 ›› Issue (05): 44-50.

• 算法设计与分析 • 上一篇    下一篇

基于Matrix Profile的时间序列变长模体挖掘

  

  1. (河海大学计算机与信息学院,江苏南京211100)
  • 出版日期:2021-06-03 发布日期:2021-06-03
  • 作者简介:朱旭(1995—),男,江苏南通人,硕士研究生,研究方向:数据挖掘,E-mail: 1277213194@qq.com; 朱晓晓(1994—),女,硕士研究生,研究方向:数据挖掘,E-mail: 1099084032@qq.com; 王继民(1976—), 男,副教授,硕士,研究方向:智能水信息处理,数据挖掘,E-mail: wangjimin@hhu.edu.cn。
  • 基金资助:
    国家重点研发计划项目(2018YFC1508106)

Variable-length Motif Mining of Time Series Based on Matrix Profile

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Online:2021-06-03 Published:2021-06-03

摘要: 已有的变长模体发现算法存在速度慢、可扩展性较差,且结果中包含过短、过长和平凡匹配等无意义模体的问题。本文提出一种基于Matrix Profile的时间序列变长模体挖掘算法。该算法使用STOMP算法作为子程序,使用结合了增量计算的下界距离来加速候选模体提取过程;采用长度相似性条件和模体分组等价类方法踢除过短、过长和平凡匹配等无意义的模体。在数据集UCR上的实验表明,提出的算法在发现变长模体时,能够有效地过滤无意义模体,且具有较高的效率和准确率。

关键词: 时间序列, 模体挖掘, Matrix Profile, 变长模体

Abstract: Existing variable-length motif discovery algorithms have the problems that its speed is slow, its scalability is poor, and its results include meaningless phantoms such as too short, too long, or ordinary matching. A time series variable-length motif mining algorithm based on Matrix Profile is proposed. The algorithm uses the STOMP algorithm as a subroutine, and uses the lower bound distance combined with the incremental calculation to accelerate the process of extracting candidate motifs. The length similarity condition and the equivalent class method of motif group are used to remove the meaningless motifs that are too short, too long, or trival matched. Experiments on the dataset UCR show that proposed algorithm can effectively filter the meaningless motifs when the variable-length motifs are found, and has high efficiency and accuracy.

Key words: time series, motif mining, Matrix Profile, variable-length motif