Dynamic Data Partition Algorithm for Information Network Model

Abstract

Abstract: In a distributed Information Network Model (INM) database management system, complex queries that span nodes incur high communication overhead. To reduce this cost, a dynamic data partitioning algorithm based on the INM and on query history is proposed. The algorithm derives the initial relevance between data items from the relationship characteristics of the INM and from historical relationship information, mines the potential relevance between data items from historical query information, and dynamically moves strongly related data onto the same processing node, thereby reducing the number of cross-node traversals in complex queries. Extensive experiments are carried out on the standard synthetic dataset WatDiv. The results show that, while keeping the number of objects and the proportion of relationship pairs balanced across nodes, the algorithm shortens query time within a period by 35%~55% compared with the consistent hashing algorithm, and keeps the time fluctuation of the same query across multiple periods within 5%~10%, which ensures the stability of complex queries.

Key words: information network model, dynamic data partition, relevance, load balancing, distributed system
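
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the unit weight per relationship pair, the `query_weight` parameter, the single greedy pass, and the per-node `capacity` cap (standing in for the load-balance constraint) are all assumptions made for illustration. It also assumes every object that appears in a relationship or a query has an entry in `placement`.

```python
from collections import defaultdict
from itertools import combinations


def build_affinity(relationships, query_log, query_weight=1.0):
    """Combine relationship pairs and query co-access into pair weights."""
    affinity = defaultdict(float)
    for a, b in relationships:              # initial relevance
        if a != b:
            affinity[frozenset((a, b))] += 1.0
    for touched in query_log:               # potential relevance from queries
        for a, b in combinations(sorted(set(touched)), 2):
            affinity[frozenset((a, b))] += query_weight
    return affinity


def rebalance(placement, affinity, num_nodes, capacity):
    """One greedy pass: move each object toward the node it is most related to.

    `placement` maps object -> node index and is updated in place.
    A move is skipped when the target node already holds `capacity` objects.
    """
    neighbors = defaultdict(dict)
    for pair, weight in affinity.items():
        a, b = tuple(pair)
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    load = [0] * num_nodes
    for node in placement.values():
        load[node] += 1

    for obj in sorted(placement):
        pull = [0.0] * num_nodes            # total affinity toward each node
        for other, weight in neighbors[obj].items():
            pull[placement[other]] += weight
        current = placement[obj]
        # Prefer the strongest pull; on a tie, stay on the current node.
        best = max(range(num_nodes), key=lambda n: (pull[n], n == current))
        if best != current and pull[best] > pull[current] and load[best] < capacity:
            placement[obj] = best
            load[current] -= 1
            load[best] += 1
    return placement


def cross_node_weight(placement, affinity):
    """Total affinity between objects stored on different nodes."""
    total = 0.0
    for pair, weight in affinity.items():
        a, b = tuple(pair)
        if placement[a] != placement[b]:
            total += weight
    return total


# Hypothetical toy data: two relationship pairs and two logged queries.
relationships = [("a", "b"), ("c", "d")]
query_log = [["a", "b", "c"], ["a", "c"]]
placement = {"a": 0, "b": 1, "c": 0, "d": 1}

affinity = build_affinity(relationships, query_log)
before = cross_node_weight(placement, affinity)
rebalance(placement, affinity, num_nodes=2, capacity=3)
after = cross_node_weight(placement, affinity)
print(before, after)
```

In this toy run the capacity cap keeps object `d` on its original node even though its only neighbor has moved away, which shows how a balance constraint can limit how much cross-node affinity is removed.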

YUAN Jia-li, LIU Meng-chi. Dynamic Data Partition Algorithm for Information Network Model[J]. Computer and Modernization, 2022, 0(10): 100-105.

