Computer and Modernization


Large-scale Image Storage and Retrieval Based on Hadoop

  

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received:2016-10-22 Online:2017-06-23 Published:2017-06-23

Abstract: The exponential growth of image data means that traditional single-machine image retrieval suffers from slow retrieval speed, poor concurrency and low retrieval accuracy when dealing with large-scale image collections. Because image feature files are small files, this paper proposes merging these small files appropriately and then storing them on HDFS, the distributed file system of Hadoop, enabling massive image data to be stored and read rapidly. To adapt to large-scale image retrieval, this paper proposes binarizing the Fisher vectors of images and using the MapReduce parallel programming model to implement parallel image retrieval based on binary Fisher vectors and SIFT. Experiments on the INRIA Holidays, Kentucky and Flickr1M datasets show that the method is scalable, achieves good retrieval accuracy, effectively reduces retrieval time and improves retrieval speed. It is a highly efficient large-scale image storage and retrieval method.
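The core retrieval idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a simple sign-based binarization of the Fisher vector (the abstract does not specify the exact binarization rule) and ranks database images by Hamming distance between binary codes; the function names `binarize_fisher_vector`, `hamming_distance` and `retrieve` are hypothetical.

```python
import numpy as np

def binarize_fisher_vector(fv):
    # Hypothetical sign binarization: non-negative components map to 1,
    # negative components to 0. The paper's exact rule is not given here.
    return (np.asarray(fv, dtype=np.float64) >= 0).astype(np.uint8)

def hamming_distance(a, b):
    # Number of differing bits between two binary codes.
    return int(np.count_nonzero(a != b))

def retrieve(query_fv, database_fvs, top_k=3):
    # Rank database images by Hamming distance to the query's binary code.
    # In the paper this ranking step is parallelized with MapReduce;
    # here it runs sequentially for illustration only.
    q = binarize_fisher_vector(query_fv)
    dists = [(i, hamming_distance(q, binarize_fisher_vector(fv)))
             for i, fv in enumerate(database_fvs)]
    dists.sort(key=lambda pair: pair[1])
    return [i for i, _ in dists[:top_k]]
```

In a MapReduce setting, each mapper would compute Hamming distances for its shard of binary codes, and a reducer would merge the per-shard candidate lists into a global top-k, which is what makes the binary representation attractive: distance computation reduces to cheap bitwise operations that parallelize trivially.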

Key words: large-scale images, Hadoop, parallel retrieval, binary Fisher vector, SIFT

CLC Number: