Computer and Modernization (计算机与现代化) ›› 2022, Vol. 0 ›› Issue (08): 50-56.

• Artificial Intelligence •

Cross-modal Retrieval Based on Semantic Fusion and Multi-similarity Learning (基于语义融合和多重相似性学习的跨模态检索)

  

  1. (School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631)
  • Publication date: 2022-08-22 Release date: 2022-08-22
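  • About the authors: ZENG Yibin (曾奕斌, born 1996), male, from Shantou, Guangdong, master's student; research interest: cross-modal retrieval; E-mail: ybzeng_scnu@163.com. GE Hong (葛红, born 1969), female, from Xiangyang, Hubei, professor, Ph.D.; research interests: intelligent information processing, machine learning, deep learning; E-mail: gehong@scnu.edu.cn.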
  • Funding:
    National Natural Science Foundation of China (11973022); Natural Science Foundation of Guangdong Province (2020A1515010710)

Cross-modal Retrieval Based on Context Fusion and Multi-similarity Learning

  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-08-22 Published: 2022-08-22

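摘要 (Abstract): To address the problem that existing cross-modal retrieval methods cannot fully exploit the similarity information between modalities, a method based on semantic fusion and multi-similarity learning (CFMSL) is proposed. First, semantic information from different modalities is fused during feature extraction, strengthening the interaction between the features of different modalities so that the model can fully mine inter-modal correlation information. Next, generators map the single-modal features and the fused-modal features into a common subspace, where discriminative features for modality alignment are obtained by maximizing the similarity between anchors and positive samples and minimizing the similarity between anchors and negative samples. Finally, the similarity lists are re-ranked by decision fusion, so that the final ranking takes both single-modal and fused-modal features into account and retrieval performance improves. Experiments on three widely used image-text datasets, Pascal Sentences, Wikipedia, and NUS-WIDE-10K, show that the CFMSL model effectively improves performance on cross-modal retrieval tasks.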

关键词 (Key words): cross-modal retrieval, feature fusion, similarity learning, re-ranking, heterogeneous gap
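
The similarity-learning step in the abstract above (raise anchor-positive similarity, lower anchor-negative similarity in the common subspace) can be illustrated with a minimal sketch. The abstract does not give the actual loss, so the margin-based form below, the cosine similarity measure, the `margin` value, and all function and variable names are assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_similarity_loss(anchors, anchor_labels, candidates, candidate_labels, margin=0.5):
    """Illustrative pair loss (an assumption, not the paper's formulation).

    Same-label pairs are penalized by 1 - s, pulling similarity toward 1.
    Different-label pairs are penalized by max(0, s - margin),
    pushing similarity below the margin.
    """
    total, pairs = 0.0, 0
    for anchor, anchor_label in zip(anchors, anchor_labels):
        for cand, cand_label in zip(candidates, candidate_labels):
            s = cosine(anchor, cand)
            if anchor_label == cand_label:
                total += 1.0 - s               # positive pair
            else:
                total += max(0.0, s - margin)  # negative pair
            pairs += 1
    return total / pairs

# Hypothetical 2-D embeddings already mapped into the common subspace.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
image_labels = [0, 1]
text_labels = [0, 1]
text_aligned = [[1.0, 0.0], [0.0, 1.0]]     # each text matches its class image
text_misaligned = [[0.0, 1.0], [1.0, 0.0]]  # each text matches the wrong class

aligned_loss = pair_similarity_loss(image_feats, image_labels, text_aligned, text_labels)
misaligned_loss = pair_similarity_loss(image_feats, image_labels, text_misaligned, text_labels)
print(aligned_loss, misaligned_loss)
```

In this toy setup the loss is lower when image and text embeddings of the same class coincide than when they are swapped, which is the behavior a loss of this kind is meant to reward.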

Abstract: Most cross-modal retrieval methods do not fully exploit the interactions between heterogeneous data. To address this problem, a novel method called Context Fusion and Multi-Similarity Learning (CFMSL) is proposed. To exploit the interactions between data of different modalities, context fusion is adopted to aggregate information across modalities. A generative module produces discriminative representations by optimizing a pair similarity loss in the common subspace, which maximizes intra-class similarity and minimizes inter-class similarity for cross-modal alignment. Moreover, a re-ranking strategy based on single-modality and fused multi-modality features is applied during the evaluation phase, adjusting the final retrieval results to improve performance. Experiments on three widely used image-text datasets, Pascal Sentences, Wikipedia, and NUS-WIDE-10K, demonstrate that the proposed method achieves competitive results in cross-modal retrieval tasks.

Key words: cross-modal retrieval, feature fusion, similarity learning, re-ranking, heterogeneous gap
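
The re-ranking step, which combines results from single-modality and fused-modality features, might look roughly like the sketch below. The abstract does not state the fusion rule, so the weighted sum, the weight `alpha`, the function name, and the scores are all hypothetical.

```python
def rerank_by_decision_fusion(single_scores, fused_scores, alpha=0.5):
    """Return candidate indices ordered by a weighted sum of two similarity lists.

    single_scores[i]: query-to-candidate similarity from single-modality features.
    fused_scores[i]:  query-to-candidate similarity from fused-modality features.
    alpha weights the single-modality list (assumed rule, not the paper's).
    """
    if len(single_scores) != len(fused_scores):
        raise ValueError("score lists must have the same length")
    combined = [alpha * s + (1.0 - alpha) * f
                for s, f in zip(single_scores, fused_scores)]
    # Highest combined score first; sorted() is stable, so ties keep input order.
    return sorted(range(len(combined)), key=lambda i: -combined[i])

# Hypothetical similarity scores for three candidates.
single_scores = [0.9, 0.6, 0.1]
fused_scores = [0.2, 0.8, 0.3]

ranking = rerank_by_decision_fusion(single_scores, fused_scores)
single_only = rerank_by_decision_fusion(single_scores, fused_scores, alpha=1.0)
fused_only = rerank_by_decision_fusion(single_scores, fused_scores, alpha=0.0)
print(ranking, single_only, fused_only)
```

Setting `alpha` to 1.0 or 0.0 recovers the ranking from either list alone, so the intermediate values show how much each list shifts the final order.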