Computer and Modernization

• Image Processing •

Large-scale Image Storage and Retrieval Based on Hadoop

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2016-10-22 Online: 2017-06-23 Published: 2017-06-23
  • About the authors: ZHU Shan (1992-), female, born in Tongling, Anhui; master's student at the School of Computer and Information Technology, Beijing Jiaotong University; research interests: distributed and parallel computing. AI Lihua (1964-), female, associate professor and master's supervisor; research interests: distributed and parallel computing, cloud computing.
  • Funding:
    Beijing Natural Science Foundation (4152042); Fundamental Research Funds for the Central Universities (2015JBC005)

Abstract: The exponential growth of image data means that traditional single-machine image retrieval suffers from slow retrieval, poor concurrency, and low retrieval accuracy when handling large-scale image collections. Because image feature files are small files, this paper proposes merging them appropriately and then storing the merged files in HDFS, the Hadoop Distributed File System, so that large volumes of image data can be stored and read quickly. To scale retrieval to large collections, the Fisher vectors of the images are binarized, and the MapReduce parallel programming model is used to implement parallel retrieval based on binary Fisher vectors and SIFT (Scale Invariant Feature Transform) features. Experiments on the INRIA Holidays, Kentucky, and Flickr1M datasets show that the method scales well, achieves good retrieval accuracy, and effectively reduces retrieval time. It is an efficient method for large-scale image storage and retrieval.

Key words: large-scale images, Hadoop, parallel retrieval, binary Fisher vector, SIFT
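The abstract says the small feature files are merged before being written to HDFS but does not name the container format. The motivation is that HDFS keeps metadata for every file in NameNode memory, so millions of tiny files are expensive, while a few large files with an index are cheap. The sketch below illustrates the idea under a hypothetical record layout (a length-prefixed name followed by a length-prefixed payload) with an in-memory offset index. It is not Hadoop's SequenceFile format and not necessarily the authors' format, and the function names are illustrative only.

```python
import io
import struct
from typing import Dict, Tuple

# Hypothetical layout (illustration only): each record is
# [4-byte name length][name bytes][4-byte payload length][payload bytes], big-endian.


def pack_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate many small feature files into one blob.

    Returns the blob plus an index mapping file name -> (payload offset, payload length),
    so any single file can be read back without scanning the whole blob.
    """
    buf = io.BytesIO()
    index: Dict[str, Tuple[int, int]] = {}
    for name, payload in files.items():
        name_bytes = name.encode("utf-8")
        buf.write(struct.pack(">I", len(name_bytes)))
        buf.write(name_bytes)
        buf.write(struct.pack(">I", len(payload)))
        index[name] = (buf.tell(), len(payload))  # payload begins at the current position
        buf.write(payload)
    return buf.getvalue(), index


def read_one(blob: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Random access to one packed file through the offset index."""
    offset, length = index[name]
    return blob[offset:offset + length]


def unpack_all(blob: bytes) -> Dict[str, bytes]:
    """Sequential scan that recovers every (name, payload) record from the blob."""
    files: Dict[str, bytes] = {}
    pos = 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        (payload_len,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        files[name] = blob[pos:pos + payload_len]
        pos += payload_len
    return files


if __name__ == "__main__":
    # 1000 toy "feature files" of 128 bytes each, packed into a single blob.
    features = {f"img_{i:04d}.sift": bytes([i % 256]) * 128 for i in range(1000)}
    blob, index = pack_small_files(features)
    assert read_one(blob, index, "img_0042.sift") == features["img_0042.sift"]
    assert unpack_all(blob) == features
    print(len(features), "files packed into one blob of", len(blob), "bytes")
```

In an actual Hadoop deployment the same role is usually played by an existing container such as SequenceFile or Hadoop Archives (HAR). The index is what keeps single-image reads fast once the files are merged.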
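The abstract does not spell out how the Fisher vectors are binarized or how the MapReduce job is organized. The sketch below assumes sign binarization (bit i is 1 when component i is positive), which is a common choice for compressing Fisher vectors, and compares codes by Hamming distance. It simulates the map and reduce phases in one Python process, with each shard dictionary standing in for one HDFS input split. All function names are hypothetical, and the SIFT-based stage mentioned in the abstract is not modeled.

```python
import heapq
import random  # only used to build the toy example below
from typing import Dict, List, Sequence, Tuple


def binarize(fisher_vector: Sequence[float]) -> int:
    """Assumed sign binarization: bit i is 1 iff component i > 0.

    Bits are packed into a Python int so that Hamming distance is one XOR plus a popcount.
    """
    code = 0
    for i, value in enumerate(fisher_vector):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def map_shard(query_code: int, shard: Dict[str, int], k: int) -> List[Tuple[int, str]]:
    """'Map' step: score one shard and emit its local top-k as (distance, image_id)."""
    scored = ((hamming(query_code, code), image_id) for image_id, code in shard.items())
    return heapq.nsmallest(k, scored)


def reduce_topk(partials: List[List[Tuple[int, str]]], k: int) -> List[Tuple[int, str]]:
    """'Reduce' step: merge the per-shard top-k lists into the global top-k."""
    return heapq.nsmallest(k, (item for part in partials for item in part))


def parallel_search(query_fv: Sequence[float], shards: List[Dict[str, int]],
                    k: int = 5) -> List[Tuple[int, str]]:
    """Binarize the query, run one simulated map task per shard, then reduce."""
    query_code = binarize(query_fv)
    partials = [map_shard(query_code, shard, k) for shard in shards]
    return reduce_topk(partials, k)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 256
    database = {f"img_{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(300)}
    codes = {image_id: binarize(fv) for image_id, fv in database.items()}
    ids = sorted(codes)
    shards = [{i: codes[i] for i in ids[s::3]} for s in range(3)]  # 3 toy splits
    # Querying with a database vector should return that image first, at distance 0.
    print(parallel_search(database["img_7"], shards, k=3))
```

Merging only the local top-k lists is exact: any image in the global top-k must also rank in the top-k of its own shard, so each map task ships k candidates to the reducer instead of every distance. Ties are broken by image ID because the (distance, id) tuples are compared as a whole.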
