Track Data Hot Spot Mining Algorithm Based on K-means

Computer and Modernization, 2021, Vol. 0, Issue (10): 23-28.

(1. College of Information Science and Technology, Qingdao University of Science & Technology, Qingdao 266061, China;
2. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China;
3. College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325000, China)

Online: 2021-10-14 Published: 2021-10-14
About the authors: XU Wen-jin (1977-), male, born in Qingdao, Shandong; associate professor, master's supervisor, Ph.D.; research interests: trajectory prediction, data mining; E-mail: wenjin@qust.edu.cn. Corresponding author: GUAN Ke-hang (1994-), male, born in Liaocheng, Shandong; master's student; research interest: intelligent information processing; E-mail: 1812176563@qq.com.
Supported by:
Key Research and Development Program of Shandong Province (2018GGX105005); Basic Public Welfare Research Program of Zhejiang Province (LGN20F020001)

Abstract

Abstract: Fishing boat trajectory data are time-ordered and very large in volume. To address these characteristics, this paper proposes a trajectory hot spot mining algorithm that overcomes the inability of the K-means algorithm to capture the hot spot distribution in such data. The main idea is as follows. First, the data are processed along the time dimension, with confidence and KL divergence serving as the criteria for the reliability and correctness of the selected data, so that data with high information content are selected from the large volume of trajectory data. The K-means clustering algorithm is then used to cluster the selected data. The proposed algorithm requires only two parameters, the significance level a and the time interval T. Given these, it completes the data selection and the computation of confidence and KL divergence on its own through time-dimension processing. A clustering validity measure is also introduced so that K-means determines the value of K by itself, which completes the hot spot mining process. Experiments on fishing boat trajectory data compare the proposed algorithm with K-means and use a heat map of the data as a reference. The results show that the proposed algorithm is more effective than K-means and correct in finding trajectory hot spots.

Key words: significance level a, KL divergence, time dimension, clustering validity measure, trajectory hot spots

XU Wen-jin, GUAN Ke-hang, MA Yue, HUANG Hai-guang. Track Data Hot Spot Mining Algorithm Based on K-means[J]. Computer and Modernization, 2021, 0(10): 23-28.
