Computer and Modernization

• Database and Data Mining

Data Redistribution Strategy of FAST Telescope Based on HDF5 Format

  (1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
    2. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China)
  • Received: 2019-03-18 Online: 2019-06-14 Published: 2019-06-14
  • About the authors: ZHONG Ling (钟灵, 1993-), male, from Guiyang, Guizhou, master's student; research interests: data storage, parallel computing, knowledge graphs; E-mail: 764940915@qq.com. LI Hui (李晖, 1982-), male, from Hengyang, Hunan, professor, Ph.D.; research interests: large-scale data management and analysis; E-mail: cse.HuiLi@gzu.edu.cn. ZHU Ming (朱明, 1966-), male, from Yulin, Guangxi, research fellow, Ph.D.; research interests: radio astronomy; E-mail: mz@nao.cas.cn.
  • Funding:
    National Natural Science Foundation of China (U1531246)

Abstract: FAST (Five-hundred-meter Aperture Spherical radio Telescope), the world's largest single-dish radio telescope, is currently in its commissioning phase. Loading the collected data is a bottleneck, which in turn limits the efficiency of subsequent data processing. Combining the scientific data storage format HDF5 (Hierarchical Data Format version 5) with dimensionality reduction, this paper proposes a storage optimization method suited to most FAST data processing scenarios. The optimization converts the data on disk from a binary table model into multiple datasets distributed by type. Experimental results show that the proposed method significantly improves the loading efficiency of FAST telescope data.

Key words: FAST telescope, HDF5, data redistribution, data structure

CLC number:
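The layout change described in the abstract, from a row-oriented binary table to separate datasets grouped by type, can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`RECORD`), the field names `time`, `beam`, and `spectrum`, and the helpers `make_table` and `redistribute` are all hypothetical, and plain Python arrays stand in for HDF5 datasets so that the example runs with the standard library alone.

```python
import struct
from array import array

# Hypothetical record layout (an assumption, not the FAST format):
# one float64 timestamp, one int32 beam id, four float32 spectrum channels.
RECORD = struct.Struct("<di4f")


def make_table(n_rows):
    """Build a row-oriented binary table standing in for the raw file on disk."""
    buf = bytearray()
    for i in range(n_rows):
        channels = [float(i + c) for c in range(4)]
        buf += RECORD.pack(i * 0.5, i % 3, *channels)
    return bytes(buf)


def redistribute(table):
    """Split the row-oriented table into one contiguous array per field.

    Each array plays the role of one HDF5 dataset in the redistributed layout.
    """
    columns = {"time": array("d"), "beam": array("i"), "spectrum": array("f")}
    for row in RECORD.iter_unpack(table):
        columns["time"].append(row[0])
        columns["beam"].append(row[1])
        columns["spectrum"].extend(row[2:])
    return columns


table = make_table(6)
columns = redistribute(table)

# Reading a single field now touches only that field's bytes,
# instead of scanning every full record in the table.
beam_bytes = columns["beam"].itemsize * len(columns["beam"])
print(len(table), beam_bytes)
```

In a real pipeline each entry of `columns` would presumably be written as its own dataset in an HDF5 file, so that a processing step that needs only one field can load it without reading the others.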