Computer and Modernization, 2021, Vol. 0, Issue (11): 1-6.


Approach for Visual Question Answering Based on Equal Attention Graph Networks

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
  2. Information Department (Informatization Technology Center), Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2021-12-13 Published: 2021-12-13

Abstract: Visual question answering (VQA) is a task that combines computer vision with natural language processing. A model must understand the scene in an image, especially the interactions between the objects in it. VQA has made great progress in recent years, but traditional methods adopt holistic feature representations, which largely ignore the structure of the given image and cannot effectively locate objects in the scene. Graph networks rely on high-level image representations that can capture semantic and spatial relationships. However, previous graph-network-based VQA methods ignored the role that the correspondence between relations and the question plays in answering. To address this, a VQA model based on equal attention graph networks, named EAGN, is proposed. Through an equal attention mechanism, relationship edges are given the same importance as object nodes. Combining these two elements gives the model a more sufficient basis for answering the question. Experiments show that, compared with other related methods, the EAGN model performs competitively, and it also provides a basis for subsequent related research.

Key words: visual question answering, graph networks, computer vision, natural language processing
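
The abstract states only that relationship edges receive the same importance as object nodes; it does not give the formulation. The following is a minimal Python sketch of one possible reading, not the EAGN implementation: the same question-guided softmax attention is applied separately to node features and to edge features, and the two pooled summaries are concatenated with equal weight. The function names (`attend`, `equal_attention`), the dot-product scoring, and the concatenation step are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(features, question):
    """Question-guided attention (assumed form): score each feature by its
    dot product with the question vector, then return the weighted sum."""
    weights = softmax([dot(f, question) for f in features])
    dim = len(question)
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def equal_attention(node_feats, edge_feats, question):
    """Hypothetical 'equal attention': object nodes and relationship edges go
    through the same attention step and are fused without favoring either."""
    node_summary, node_w = attend(node_feats, question)
    edge_summary, edge_w = attend(edge_feats, question)
    fused = node_summary + edge_summary  # list concatenation, assumed fusion
    return fused, node_w, edge_w
```

Calling `equal_attention(nodes, edges, question)` with feature vectors of the same dimension as the question returns a fused vector of twice that dimension, together with the attention weights over nodes and over edges. In a real model the features would come from an object detector and a relation encoder, and the scoring would be learned rather than a raw dot product.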