Computer and Modernization ›› 2023, Vol. 0 ›› Issue (06): 1-6. doi: 10.3969/j.issn.1006-2475.2023.06.001

• DESIGN AND ANALYSIS OF ALGORITHM •

Optimization Method of Hadoop File Archiving Based on LZO

ZHANG Jun1, SU Wen-hao2   

  1. School of Software, East China University of Technology, Nanchang 330013, China;
  2. School of Information Engineering, East China University of Technology, Nanchang 330013, China
  • Received: 2022-11-09 Revised: 2023-01-19 Online: 2023-06-28 Published: 2023-06-28

Abstract: The distributed framework Hadoop is widely used in many areas of big data processing. However, storing a large number of small files in Hadoop generates a correspondingly large amount of metadata, which leads to excessive memory usage in the NameNode and impairs its ability to provide high-performance, highly concurrent access. Archiving small files is an effective solution to this problem. Because data compression also reduces both storage space and network transmission load, this paper proposes LA (LZO-Archive), a Hadoop file archiving optimization method based on the real-time lossless compression algorithm LZO. Building on the archiving and merging of small files, LA applies LZO compression during the index file generation stage to reduce the time needed to generate index files. LA also includes a file compression storage algorithm that compresses and stores both data files and index files, reducing disk space usage in the DataNode and memory usage in the NameNode. The paper further describes the design and implementation of the experimental method for LA. Experimental results show that, compared with the original HDFS storage method, the baseline archiving method HAR, and the comparison method LHF, LA performs better in file archiving time, NameNode memory usage, DataNode disk space usage, and file access time.

Key words: Hadoop, HDFS, small files, archiving storage, LZO algorithm
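
The general idea summarized in the abstract (merge many small files into one archive, keep an index for lookup, and compress both the data and the index) can be illustrated with a minimal sketch. This is not the authors' LA implementation and does not use Hadoop or HDFS: it runs in memory, the function names `build_archive` and `read_file` are hypothetical, the JSON index layout is an assumption made for illustration, and `zlib` stands in for LZO because LZO is not part of the Python standard library.

```python
import json
import zlib  # stand-in for LZO, which is not in the Python standard library


def build_archive(files):
    """Merge small files into one compressed data blob plus a compressed index.

    files: dict mapping file name -> bytes content.
    Returns (data_blob, index_blob).
    """
    parts = []
    index = {}
    offset = 0
    for name in sorted(files):
        # Compress each file on its own so it can be read back individually.
        compressed = zlib.compress(files[name])
        index[name] = (offset, len(compressed))
        parts.append(compressed)
        offset += len(compressed)
    data_blob = b"".join(parts)
    # The index is compressed as well (assumed JSON layout, for illustration).
    index_blob = zlib.compress(json.dumps(index).encode("utf-8"))
    return data_blob, index_blob


def read_file(data_blob, index_blob, name):
    """Look up one file through the index and decompress only its slice."""
    index = json.loads(zlib.decompress(index_blob).decode("utf-8"))
    offset, length = index[name]
    return zlib.decompress(data_blob[offset:offset + length])


if __name__ == "__main__":
    # 100 hypothetical small files merged into a single archive.
    small_files = {
        f"log_{i}.txt": (f"record {i}\n" * 50).encode("utf-8")
        for i in range(100)
    }
    data_blob, index_blob = build_archive(small_files)
    restored = read_file(data_blob, index_blob, "log_7.txt")
    print(restored == small_files["log_7.txt"])
    print(len(data_blob), len(index_blob))
```

In this sketch the storage layer sees two objects (the data blob and the index blob) instead of one object per small file, which is the kind of metadata reduction the abstract attributes to archiving. Compressing each file separately keeps single-file access possible without decompressing the whole archive.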
