Computer and Modernization

• Database and Data Mining •

Design of Storage and Retrieval System for Mass Automatic Weather Station Minute Data

  

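  1. Anhui Meteorological Information Center, Hefei 230031, Anhui, China
  • Received: 2017-04-29 Publication date: 2017-09-20 Release date: 2017-09-19
  • About the authors: WANG Jianrong (b. 1981), male, from Yangzhou, Jiangsu; engineer at the Anhui Meteorological Information Center; master's degree; research interests: distributed computing, database system design. JI Gang (b. 1979), male, from Hefei, Anhui; senior engineer; master's degree; research interests: meteorological information system design.
  • Funding:
    China Meteorological Administration Key Technology Integration Project (CMAGJ2015M29); Anhui Meteorological Bureau Science and Technology Development Fund (KM201604)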

Design of Storage and Retrieval System for Mass Automatic Weather Station Minute Data

  1. Anhui Meteorological Information Center, Hefei 230031, China
  • Received: 2017-04-29 Online: 2017-09-20 Published: 2017-09-19

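Abstract: The spatial and temporal density of surface automatic weather stations keeps increasing, and the volume of observation data they produce is growing exponentially. Traditional relational databases lack the capacity to store such massive data, and their retrieval performance degrades. This paper therefore designs a storage and retrieval system for automatic weather station minute data. Quartz is used to collect the minute-data files on a schedule, and the files are then decoded and loaded into the database. The HBase distributed database is used to build the storage model for minute data. To support multi-element queries, Elasticsearch is used to build an auxiliary index that serves as a secondary index for HBase. System tests show that loading minute data takes 54.6 s on average, that the secondary index is complete and reliable, and that retrieval results are returned within milliseconds, which meets the operational requirements for storing and retrieving automatic weather station minute data.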

Keywords: automatic weather station minute data, Kafka, HBase, Elasticsearch, coprocessor

Abstract: With the rapid growth in the temporal and spatial density of ground automatic weather stations, the quantity of observed data is increasing exponentially, and traditional relational databases offer insufficient capacity and low performance for mass data storage and retrieval. In view of this, an automatic station minute data storage and retrieval system is designed. Quartz is used to collect automatic station minute files on a schedule; the system then decodes the files and stores the decoded data in the database. The system uses HBase to build the minute data storage model. To meet multi-element query demands, the system applies Elasticsearch to build a secondary index for HBase. Test results show that the average storage time for minute data is 54.6 s, that the Elasticsearch secondary index is complete and reliable, and that retrieval results are returned within milliseconds. The system can therefore satisfy the storage and query performance demands of automatic weather station minute data.

Key words: AWS minute data, Kafka, HBase, Elasticsearch, coprocessor
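
The secondary-index pattern named in the abstract (a primary store addressed by row key, plus an auxiliary index that maps element values back to row keys) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and uses neither the HBase nor the Elasticsearch client: plain Python dictionaries stand in for both systems. The row-key layout (station ID followed by observation minute), the element names `temp_c` and `rain_mm`, the station IDs, and the exact-match lookup are all assumptions made for illustration.

```python
from collections import defaultdict


class MinuteStore:
    """In-memory stand-in for a primary table plus a secondary index."""

    def __init__(self):
        # Row key -> observation dict (plays the role of the primary table).
        self.rows = {}
        # (element name, value) -> set of row keys (plays the role of the index).
        self.index = defaultdict(set)

    @staticmethod
    def row_key(station_id, minute):
        # Hypothetical row-key layout: station ID, then the observation minute.
        return f"{station_id}_{minute}"

    def put(self, station_id, minute, elements):
        key = self.row_key(station_id, minute)
        # Drop index entries of any row being overwritten so the index stays consistent.
        for name, value in self.rows.get(key, {}).items():
            self.index[(name, value)].discard(key)
        self.rows[key] = dict(elements)
        for name, value in elements.items():
            self.index[(name, value)].add(key)
        return key

    def get(self, station_id, minute):
        # Direct lookup by row key, with no index involved.
        return self.rows.get(self.row_key(station_id, minute))

    def search(self, **conditions):
        # Step 1: resolve each condition to a set of row keys through the index.
        key_sets = [self.index.get((n, v), set()) for n, v in conditions.items()]
        keys = set.intersection(*key_sets) if key_sets else set()
        # Step 2: fetch the matching rows from the primary store by key.
        return {k: self.rows[k] for k in sorted(keys)}


store = MinuteStore()
store.put("S0001", "201704290800", {"temp_c": 18.2, "rain_mm": 0.0})
store.put("S0001", "201704290801", {"temp_c": 18.3, "rain_mm": 0.1})
store.put("S0002", "201704290800", {"temp_c": 18.2, "rain_mm": 0.0})

# Multi-element query: both conditions must hold.
hits = store.search(temp_c=18.2, rain_mm=0.0)
print(sorted(hits))
```

The two-step `search` reflects the general idea of a secondary index: the index answers "which row keys match", and the primary store is then read only by key. A real index engine would also support range and full-text conditions, which this exact-match sketch omits.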

CLC number: