Computer and Modernization


Large-scale Image Storage and Retrieval Based on Hadoop

  

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received:2016-10-22 Online:2017-06-23 Published:2017-06-23

Abstract: The exponential growth of image data means that traditional single-machine image retrieval suffers from slow retrieval speed, poor concurrency and low retrieval accuracy when dealing with large-scale image collections. Because image feature files are small files, this paper proposes merging these small files appropriately and then storing them on HDFS, the distributed file system of Hadoop, enabling massive image data to be stored and read rapidly. To adapt to large-scale image retrieval, this paper proposes binarizing the Fisher vectors of images and using the MapReduce parallel programming model to implement parallel image retrieval based on binary Fisher vectors and SIFT. Experiments on the INRIA Holidays, Kentucky and Flickr1M datasets show that the method is scalable, achieves good retrieval accuracy, effectively reduces retrieval time and improves retrieval speed. It is a highly efficient large-scale image storage and retrieval method.
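The core retrieval idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a simple sign-based binarization of the Fisher vector (the abstract does not specify the exact binarization rule) and ranks database images by Hamming distance between binary codes; the function names `binarize_fisher_vector`, `hamming_distance` and `retrieve` are hypothetical.

```python
import numpy as np

def binarize_fisher_vector(fv):
    # Hypothetical sign binarization: non-negative components map to 1,
    # negative components to 0. The paper's exact rule is not given here.
    return (np.asarray(fv, dtype=np.float64) >= 0).astype(np.uint8)

def hamming_distance(a, b):
    # Number of differing bits between two binary codes.
    return int(np.count_nonzero(a != b))

def retrieve(query_fv, database_fvs, top_k=3):
    # Rank database images by Hamming distance to the query's binary code.
    # In the paper this ranking step is parallelized with MapReduce;
    # here it runs sequentially for illustration only.
    q = binarize_fisher_vector(query_fv)
    dists = [(i, hamming_distance(q, binarize_fisher_vector(fv)))
             for i, fv in enumerate(database_fvs)]
    dists.sort(key=lambda pair: pair[1])
    return [i for i, _ in dists[:top_k]]
```

In a MapReduce setting, each mapper would compute Hamming distances for its shard of binary codes, and a reducer would merge the per-shard candidate lists into a global top-k, which is what makes the binary representation attractive: distance computation reduces to cheap bitwise operations that parallelize trivially.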

Key words: large-scale images, Hadoop, parallel retrieval, binary Fisher vector, SIFT

CLC Number: