Large-scale Image Storage and Retrieval Based on Hadoop

doi:10.3969/j.issn.1006-2475.2017.06.012

Computer and Modernization, 2017, Vol. 0, Issue (6): 61-66+83.

(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)

Received: 2016-10-22 Online: 2017-06-23 Published: 2017-06-23
About the authors: ZHU Shan (born 1992), female, from Tongling, Anhui, master's student at the School of Computer and Information Technology, Beijing Jiaotong University; research interests: distributed and parallel computing. AI Li-hua (born 1964), female, associate professor and master's supervisor; research interests: distributed and parallel computing, cloud computing.
Funding:
Beijing Natural Science Foundation (4152042); Fundamental Research Funds for the Central Universities (2015JBC005)

Abstract

Abstract: The exponential growth of image data leaves traditional single-machine image retrieval with slow retrieval, poor concurrency, and low retrieval accuracy when handling large-scale image collections. Because image feature files are all small files, this paper proposes merging them appropriately before storing them in HDFS, the distributed file system of Hadoop, so that large-scale image data can be stored and read quickly. To suit large-scale image retrieval, the Fisher vectors of the images are binarized, and the MapReduce parallel programming model is used to implement parallel retrieval based on binary Fisher vectors and SIFT (Scale Invariant Feature Transform) features. Experiments on the INRIA Holidays, Kentucky, and Flicker1M datasets show that the method scales well, achieves good retrieval accuracy, and effectively reduces retrieval time. It is an efficient method for large-scale image storage and retrieval.

Key words: large-scale images, Hadoop, parallel retrieval, binary Fisher vector, SIFT
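The retrieval idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the binarization rule, so sign thresholding is assumed here, and the MapReduce job is imitated with two plain functions (`map_partition` and `reduce_topk`, both hypothetical names) rather than the Hadoop API. The SIFT-based stage is omitted, and the image IDs and vectors are made-up toy data.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def binarize(fisher_vector: Iterable[float]) -> int:
    """Pack the signs of a real-valued vector into an integer bitmask.

    Assumption: bit i is 1 when component i is positive. The rule used
    in the paper is not given in the abstract.
    """
    code = 0
    for i, value in enumerate(fisher_vector):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def map_partition(query: int, partition: Dict[str, int], k: int) -> List[Tuple[int, str]]:
    """Map step: rank one partition locally and emit its k nearest images."""
    scored = ((hamming(query, code), image_id) for image_id, code in partition.items())
    return heapq.nsmallest(k, scored)


def reduce_topk(partials: Iterable[List[Tuple[int, str]]], k: int) -> List[Tuple[int, str]]:
    """Reduce step: merge the local candidate lists into a global top-k."""
    merged = (pair for partial in partials for pair in partial)
    return heapq.nsmallest(k, merged)


# Hypothetical toy data: two partitions standing in for two input splits.
partitions = [
    {"img_a": binarize([0.3, -0.1, 0.7, -0.2]), "img_b": binarize([-0.5, 0.4, -0.6, 0.1])},
    {"img_c": binarize([0.2, -0.3, 0.1, 0.9]), "img_d": binarize([0.1, 0.2, 0.5, -0.4])},
]
query_code = binarize([0.4, -0.2, 0.6, -0.1])
results = reduce_topk((map_partition(query_code, p, 2) for p in partitions), 2)
print(results)
```

Each map call only has to return its local top-k, so the reduce step merges a handful of candidates per partition instead of the whole database. This is one plausible reason binary codes with Hamming distance suit a MapReduce layout.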

CLC number: TP391

ZHU Shan, AI Li-hua. Large-scale Image Storage and Retrieval Based on Hadoop[J]. Computer and Modernization, 2017, 0(6): 61-66+83.
