Computer and Modernization (计算机与现代化)

• Applications and Development •

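Water Census Data Warehouse Based on Hive

  1. College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China
  • Received: 2014-02-25 Online: 2014-05-28 Published: 2014-05-30
  • About the authors: CHEN Long (b. 1990), male, from Huai'an, Jiangsu, master's student at the College of Computer and Information, Hohai University; research interest: data mining. WAN Dingsheng (b. 1963), male, from Liyang, Jiangsu, professor; research interests: data management and data mining. GU Xinchen (b. 1989), male, from Nanjing, Jiangsu, master's student; research interest: data mining.
  • Supported by:
    National Natural Science Foundation of China (51079040); Ministry of Water Resources 948 Project (201016)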

Abstract: Water census data are massive and multidimensional. To address this, the paper examines Hadoop and Hive, which have developed rapidly in recent years under the "big data" concept, and combines them with the mature multidimensional analysis techniques of traditional data warehouses to propose a method for building a water census data warehouse on Hive. It describes the architecture of the data warehouse system and, in light of Hive's design characteristics, improves the traditional multidimensional analysis model through bucketing, reducing dimension tables, and adding redundancy to fact tables. Finally, a cluster is set up to run query and analysis tests on a water census data set. The test results show that the data warehouse can meet the storage and query requirements of massive multidimensional water census data.

Key words: Hive, data warehouse, water census, model optimization, large-scale data processing

CLC Number: