Computer and Modernization

• Database and Data Mining •

Probabilistic Skyline Queries over Uncertain Moving Objects Based on Manhattan Distance

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
  • Received: 2017-02-19 Online: 2017-10-30 Published: 2017-10-31
  • About the authors: LI Jinyang (born 1983), male, from Tongshan, Jiangsu; master's student at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; research interests: data management and knowledge engineering. CHEN Jialiang (born 1992), male, from Dongguan, Guangdong; master's student; research interests: continuous Skyline queries.

Abstract: In many applications, data uncertainty arises in various forms because of limits on measurement accuracy, update delay, network bandwidth, and similar constraints. Querying uncertain data has therefore attracted attention from database researchers, and finding efficient methods for analyzing such data has become an active topic. This paper addresses probabilistic Skyline queries over uncertain moving objects based on Manhattan distance. It proposes a Manhattan-distance-based probabilistic Skyline model that computes the probability that an uncertain moving object is a Skyline point at a given time, and returns a p-t-Skyline result set containing every moving object whose Skyline probability at time t is at least p. In practice, computing Skyline probabilities for a large number of uncertain moving objects is cumbersome and costly. To make the query more efficient, the paper presents a four-step solution: sampling, bounding, pruning, and refining. To further reduce the cost of Skyline computation, it uses a multidimensional index structure, the VCI tree, to speed up data retrieval. Experimental results show that the solution is efficient on data sets of different sizes and dimensionalities.

Key words: Manhattan distance, mobile computing, probabilistic Skyline query, uncertain data
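
To make the p-t-Skyline notion concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It rests on several assumptions that the abstract does not confirm: each object's uncertain position at time t is represented by a list of equally likely sample points, objects are independent, and a sample dominates another when it is no farther from a query point `q` on every axis and strictly closer on at least one (the per-axis terms whose sum is the Manhattan distance). The function names and the toy data are hypothetical. This is the exhaustive computation that a sampling, bounding, pruning, and refining pipeline with an index would aim to avoid.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point, q: Point) -> bool:
    """Assumed dominance test: a is no farther from q than b on every axis
    and strictly closer on at least one axis."""
    da = [abs(x - y) for x, y in zip(a, q)]
    db = [abs(x - y) for x, y in zip(b, q)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def skyline_probability(name: str, samples: Dict[str, List[Point]], q: Point) -> float:
    """Average, over the object's samples, of the probability that no other
    object dominates that sample (objects assumed independent)."""
    own = samples[name]
    total = 0.0
    for u in own:
        p_not_dominated = 1.0
        for other, pts in samples.items():
            if other == name:
                continue
            # Fraction of the other object's samples that dominate u.
            dominated_by = sum(dominates(v, u, q) for v in pts) / len(pts)
            p_not_dominated *= 1.0 - dominated_by
        total += p_not_dominated
    return total / len(own)


def p_t_skyline(samples: Dict[str, List[Point]], q: Point, p: float) -> List[str]:
    """Objects whose Skyline probability at the sampled time is at least p."""
    return sorted(n for n in samples if skyline_probability(n, samples, q) >= p)


# Hypothetical positions of three objects at one time instant t.
q = (0.0, 0.0)
samples = {
    "A": [(1.0, 1.0), (2.0, 2.0)],
    "B": [(3.0, 3.0), (1.0, 4.0)],
    "C": [(5.0, 5.0), (6.0, 6.0)],
}
result = p_t_skyline(samples, q, 0.2)
```

With this toy data, object A is never dominated, one of B's two samples is dominated by every sample of A and the other by half of them, and both of C's samples are always dominated, so a threshold of 0.2 keeps A and B. The cost grows with the product of the number of objects and the square of the sample count, which illustrates why the abstract calls the direct computation costly.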