计算机与现代化 (Computer and Modernization) ›› 2023, Vol. 0 ›› Issue (08): 44-53. doi: 10.3969/j.issn.1006-2475.2023.08.008

• Artificial Intelligence •

Cross-Modal Hash Retrieval Based on Attention Mechanism and Semantic Similarity

  WANG Hong, GE Hong

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, Guangdong, China)
  • Online: 2023-08-30  Published: 2023-09-13
  • About the authors: WANG Hong (b. 1996), male, from Ji'an, Jiangxi; master's student; research interest: cross-modal retrieval; E-mail: 1090463612@qq.com. GE Hong (b. 1969), female, from Xiangyang, Hubei; associate professor, Ph.D.; research interests: intelligent information processing, machine learning, deep learning; E-mail: gehong@scnu.edu.cn.
  • Funding: National Natural Science Foundation of China (62177015)

Cross-Modal Hash Retrieval Based on Attention Mechanism and Semantic Similarity

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2023-08-30  Published: 2023-09-13

Abstract: Nowadays, cross-modal hash retrieval is widely and successfully applied in multimedia similarity search. To further improve retrieval performance, this paper addresses two main problems in existing deep hash retrieval methods: 1) how to measure the similarity between different modalities so that inter-modal similarity is represented more precisely; 2) how to fuse the features of multiple modalities to obtain richer feature representations, avoiding the information loss caused by processing each modality separately and ignoring the connections between them. We therefore propose a cross-modal hash retrieval method based on an attention mechanism and semantic similarity (ASSH). The model defines a new multi-label similarity measure that distinguishes the importance of different labels, so that the similarity between modalities is expressed more faithfully. An attention-based fusion module is designed to fuse the features of different modalities during feature learning, strengthening cross-modal interaction and capturing locally important information in each modality. Experiments on the widely used image-text datasets MIR-Flickr25K, IAPR TC-12, and NUS-WIDE show that the proposed method outperforms previous methods in every retrieval mode; with a hash code length of 16 bits, the mean average precision (mAP) improves by 1.1% and 0.63% over the best existing methods. Ablation experiments further confirm the effectiveness of the proposed method.
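The abstract describes a multi-label similarity measure that weights labels by importance, but does not give its formula. Purely as an illustrative sketch (the function name, the IDF-style rarity weighting, and the weighted-cosine form below are assumptions, not the authors' definition), one common way to make rare shared labels count for more than common ones is:

```python
import numpy as np

def weighted_label_similarity(labels):
    """Pairwise semantic similarity from a binary multi-label matrix.

    labels: (n, c) array, labels[i, k] = 1 if sample i carries label k.
    Rarer labels receive larger weights (IDF-style), so sharing a rare
    label contributes more similarity than sharing a ubiquitous one.
    """
    n = labels.shape[0]
    freq = labels.sum(axis=0)                     # per-label frequency
    weights = np.log((n + 1) / (freq + 1)) + 1.0  # rare label -> large weight
    w = labels * weights                          # importance-weighted labels
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # guard unlabeled samples
    wn = w / norms
    return wn @ wn.T                              # weighted cosine, in [0, 1]
```

Under such a measure, two samples sharing only a rare label can be scored as more similar than two samples sharing only a very common one, unlike the binary "any shared label" similarity used by many earlier hashing methods.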

Key words: cross-modal retrieval, attention mechanism, semantic similarity, hash retrieval, feature fusion

Abstract: Nowadays, cross-modal hash retrieval has been widely and successfully used in multimedia similarity search applications. Deep hash retrieval methods face two challenging questions: 1) how to measure the similarity between modalities more accurately; 2) how to fuse the features of multiple modalities to obtain richer feature representations and avoid the loss of key information. To solve these two problems, we propose a novel cross-modal hashing method, a cross-modal hash retrieval model based on attention mechanism and semantic similarity (ASSH), which defines a new multi-label similarity measure to distinguish the importance of different labels, and designs an attention fusion module to fuse features and enhance the interaction between modalities. Experimental results demonstrate that the proposed method outperforms previous methods in all retrieval modes on the three common datasets MIRFLICKR-25K, NUS-WIDE, and IAPR TC-12. Compared with the state-of-the-art methods, when the hash code length is 16 bits, the mean average precision (mAP) improves by 1.1% and 0.63%. The ablation experiments also fully prove the effectiveness of the method.
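The abstract does not detail the internals of the attention fusion module. As a hedged sketch only (the function names, the scaled dot-product form, and the residual design are illustrative assumptions, not the paper's exact architecture), cross-modal feature fusion is often realized by letting each modality attend to the other and adding the attended features back residually:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(img_feat, txt_feat):
    """Fuse paired image/text features with cross-modal attention.

    img_feat, txt_feat: (n, d) feature matrices for n image-text pairs.
    Each modality attends over the other; the residual connection keeps
    the original modality-specific signal, so fusion adds cross-modal
    information rather than replacing it.
    """
    d = img_feat.shape[1]
    # (n, n) affinities, scaled as in dot-product attention
    attn_i2t = softmax(img_feat @ txt_feat.T / np.sqrt(d), axis=1)
    attn_t2i = softmax(txt_feat @ img_feat.T / np.sqrt(d), axis=1)
    fused_img = img_feat + attn_i2t @ txt_feat  # image enriched by text
    fused_txt = txt_feat + attn_t2i @ img_feat  # text enriched by image
    return fused_img, fused_txt
```

The fused representations would then feed the hashing layers of each branch, so the learned binary codes reflect both modalities rather than each modality in isolation.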

Key words: cross-modal retrieval, attention mechanism, semantic similarity, hash retrieval, feature fusion

CLC number: