Computer and Modernization ›› 2022, Vol. 0 ›› Issue (08): 50-56.


Cross-modal Retrieval Based on Context Fusion and Multi-similarity Learning


  1. (School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Online: 2022-08-22    Published: 2022-08-22

Abstract: Most cross-modal retrieval methods do not fully exploit the interactions between heterogeneous data. To address this problem, a novel method called Context Fusion and Multi-Similarity Learning (CFMSL) is proposed. To exploit the interactions between data of different modalities, context fusion is adopted to aggregate information across modalities. A generative module produces discriminative representations by optimizing a pair-similarity loss in the common subspace, maximizing intra-class similarity and minimizing inter-class similarity for cross-modal alignment. Moreover, a re-ranking strategy based on both single-modality and fused multi-modality results is applied during the evaluation phase, adjusting the final retrieval results to improve performance. Experiments demonstrate that the proposed method achieves competitive results in cross-modal retrieval tasks on several widely used image-text datasets, such as Pascal Sentences, Wikipedia, and NUS-WIDE-10K.
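The abstract does not give the exact form of CFMSL's pair-similarity loss, but the stated objective (maximize intra-class similarity, minimize inter-class similarity in the common subspace) matches the general multi-similarity family of pair losses. The sketch below is an illustrative, hypothetical implementation of such a loss; the hyperparameters `alpha`, `beta`, and `lam` are assumptions, not values from the paper.

```python
# Hypothetical sketch of a pair-similarity (multi-similarity style) loss.
# Not CFMSL's exact formulation: alpha, beta, and lam are illustrative
# hyperparameters chosen for the sketch.
import numpy as np

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Pull same-class (intra-class) pairs together and push
    different-class (inter-class) pairs apart in the common subspace."""
    # Cosine similarity matrix between L2-normalized embeddings
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = [sim[i, j] for j in range(n) if labels[j] == labels[i] and j != i]
        neg = [sim[i, j] for j in range(n) if labels[j] != labels[i]]
        if pos:  # penalize intra-class pairs whose similarity is low
            loss += np.log1p(np.sum(np.exp(-alpha * (np.array(pos) - lam)))) / alpha
        if neg:  # penalize inter-class pairs whose similarity is high
            loss += np.log1p(np.sum(np.exp(beta * (np.array(neg) - lam)))) / beta
    return loss / n
```

Under this formulation the loss shrinks as same-class embeddings (e.g. an image and its matching text) move closer and different-class embeddings move apart, which is the cross-modal alignment behavior the abstract describes.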

Key words: cross-modal retrieval, feature fusion, similarity learning, re-ranking, heterogeneous gap