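Research on Index Method of Massive Hydrology Data

Computer and Modernization, 2017, Vol. 0, Issue (10): 29-35, 41. doi: 10.3969/j.issn.1006-2475.2017.10.007

College of Computer and Information, Hohai University, Nanjing 211100, China

Received: 2017-02-28  Online: 2017-10-30  Published: 2017-10-31

About the authors: FENG Jun (1969-), female, born in Changzhou, Jiangsu; professor and doctoral supervisor at the College of Computer and Information, Hohai University; Ph.D.; CCF member; research interests: data management and knowledge engineering, spatial database theory and technology, and hydrology informatization technology. XU Wei-gang (1993-), male, born in Jinhua, Zhejiang; master's student; research interests: spatio-temporal data indexing and distributed storage. FENG Du-qing (1989-), male, born in Ganyu, Jiangsu; master's student; research interests: cloud storage and cloud computing. LU Jia-min (1983-), male, lecturer; research interests: moving-object data management and distributed data processing.

Funding: National Natural Science Foundation of China (61370091, 61602151)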

Abstract: Hydrology data are stored in diverse forms, are very large in volume, and cover a rich variety of hydrology entity types. Each type of hydrology entity has both basic descriptive information and a series of measurement business records, and these two kinds of data are stored differently and updated at different frequencies. Hydrology business retrieval requires not only fast retrieval of an entity's basic information but also combined queries that follow the dependencies between the basic descriptive information and the business information. However, current cloud environments lack an efficient index method that accounts for such dependencies among multiple data types. In addition, the rapid growth of hydrology data poses a serious challenge to retrieval performance. This paper therefore proposes HRB, a distributed two-level index structure based on Hadoop that builds a different index for each data type and its retrieval requirements. Experiments show that, compared with conventional distributed indexes, HRB builds indexes more efficiently, and once the data volume reaches the order of ten million records it also retrieves data faster, indicating that HRB has practical value.

Key words: hydrology entities, two-level index, distributed index, Hadoop
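The abstract does not describe HRB's internal structures, so the following is only a toy, single-machine Python sketch of the general two-level idea it outlines: a first-level index over rarely changing basic descriptive attributes, a second-level index over frequently appended measurement records, and a combined query that uses the first level to narrow the work done on the second. The class name `TwoLevelIndex`, the attribute-to-bitmap first level, the time-sorted second level, and the sample station IDs and attributes are all hypothetical illustrations, not the paper's HRB design, and nothing here runs on Hadoop.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TwoLevelIndex:
    """Toy two-level index (hypothetical; not the paper's HRB structure).

    Level 1: (attribute, value) -> bitmap over entity slots, for basic
             descriptive information that rarely changes.
    Level 2: entity id -> time-sorted measurement records, for business
             data that is appended frequently.
    """

    def __init__(self):
        self.slot_of = {}                 # entity id -> bit position
        self.ids = []                     # bit position -> entity id
        self.bitmaps = defaultdict(int)   # (attr, value) -> Python int used as a bitmap
        self.times = defaultdict(list)    # entity id -> sorted timestamps
        self.values = defaultdict(list)   # entity id -> values aligned with times

    def add_entity(self, entity_id, attrs):
        # Level 1: set this entity's bit in the bitmap of each attribute value.
        if entity_id in self.slot_of:
            raise ValueError(f"duplicate entity {entity_id!r}")
        slot = len(self.ids)
        self.slot_of[entity_id] = slot
        self.ids.append(entity_id)
        for key, val in attrs.items():
            self.bitmaps[(key, val)] |= 1 << slot

    def add_measurement(self, entity_id, t, value):
        # Level 2: insert in timestamp order so range scans can binary-search.
        ts, vs = self.times[entity_id], self.values[entity_id]
        pos = bisect_right(ts, t)
        ts.insert(pos, t)
        vs.insert(pos, value)

    def match(self, **conditions):
        # AND the bitmaps of all requested attribute values
        # (no conditions returns every entity).
        bits = (1 << len(self.ids)) - 1
        for key, val in conditions.items():
            bits &= self.bitmaps.get((key, val), 0)
        return [eid for slot, eid in enumerate(self.ids) if (bits >> slot) & 1]

    def range_query(self, entity_id, t0, t1):
        # Records of one entity with t0 <= t <= t1.
        ts = self.times.get(entity_id, [])
        vs = self.values.get(entity_id, [])
        lo, hi = bisect_left(ts, t0), bisect_right(ts, t1)
        return list(zip(ts[lo:hi], vs[lo:hi]))

    def combined_query(self, t0, t1, min_value, **conditions):
        # Level 1 prunes entities by descriptive attributes; level 2 then
        # scans only those entities' measurements in the time window.
        result = {}
        for eid in self.match(**conditions):
            hits = [(t, v) for t, v in self.range_query(eid, t0, t1) if v >= min_value]
            if hits:
                result[eid] = hits
        return result


# Hypothetical sample data: three stations with descriptive attributes.
idx = TwoLevelIndex()
idx.add_entity("ST001", {"type": "rain_gauge", "basin": "Huai"})
idx.add_entity("ST002", {"type": "water_level", "basin": "Huai"})
idx.add_entity("ST003", {"type": "rain_gauge", "basin": "Yangtze"})
for t, v in [(1, 2.0), (2, 35.5), (3, 12.0)]:
    idx.add_measurement("ST001", t, v)
idx.add_measurement("ST003", 2, 50.0)

print(idx.match(type="rain_gauge", basin="Huai"))        # ['ST001']
print(idx.combined_query(1, 3, 10.0, type="rain_gauge"))
# {'ST001': [(2, 35.5), (3, 12.0)], 'ST003': [(2, 50.0)]}
```

Using an integer as a bitmap reduces the attribute filter to a few bitwise ANDs, and keeping each entity's records sorted by time makes a window lookup a binary search plus a slice. Splitting the two levels this way also lets the slowly changing descriptive index and the fast-growing measurement index be maintained independently; how HRB actually partitions and distributes them across a Hadoop cluster is described in the paper, not in this sketch.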

FENG Jun, XU Wei-gang, FENG Du-qing, LU Jia-min, XU Huan. Research on Index Method of Massive Hydrology Data[J]. Computer and Modernization, 2017, 0(10): 29-35,41.

