Computer and Modernization ›› 2025, Vol. 0 ›› Issue (04): 1-5. doi: 10.3969/j.issn.1006-2475.2025.04.001

Gaze Estimation Model Based on Hybrid Transformer 

  

  1. (School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China)
  • Online: 2025-04-30 Published: 2025-04-30

Abstract: Combining a CNN with a Transformer lets a model exploit the Transformer's access to global feature information and strengthens its awareness of context, which can improve accuracy. This paper proposes RN-SA (ResNet-MHSA), a novel gaze estimation model based on a hybrid Transformer. In this model, some of the 3×3 spatial convolution layers in ResNet18 are replaced with a block composed of a 1×1 spatial convolution layer and an MHSA (Multi-Head Self-Attention) layer, and the DropBlock mechanism is added to the model structure to increase robustness. Experimental results show that RN-SA improves accuracy while reducing the number of parameters: compared with GazeTR-Hybrid, a strong existing model, RN-SA improves accuracy by 4.1% and 3.7% on the EyeDiap and Gaze360 datasets, respectively, while using 15.8% fewer parameters. The combination of CNN and Transformer can therefore be applied effectively to gaze estimation tasks.
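
To give a rough sense of why swapping a 3×3 convolution for a 1×1 convolution plus MHSA can lower the parameter count, the sketch below counts weights for a single layer. It is a back-of-the-envelope illustration, not the authors' implementation. Several things here are assumptions: the abstract does not say which ResNet18 layers are replaced, so the 512-channel width of the last ResNet18 stage is used as an example; the MHSA layer is assumed to consist of query, key and value projections plus an optional output projection; and biases, normalization layers and any positional embeddings are ignored. The function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights in a k x k convolution without bias."""
    return c_in * c_out * k * k


def mhsa_params(channels, output_projection=True):
    """Weights in a self-attention layer (assumed layout).

    Counts the query, key and value projections, plus an optional
    output projection. The number of heads does not change the total,
    because the heads split the channel dimension between them.
    """
    total = 3 * channels * channels
    if output_projection:
        total += channels * channels
    return total


def hybrid_block_params(channels, output_projection=True):
    """Weights in a 1x1 convolution followed by MHSA (assumed layout)."""
    return conv_params(channels, channels, 1) + mhsa_params(
        channels, output_projection
    )


channels = 512  # example width: the last stage of ResNet18
baseline = conv_params(channels, channels, 3)
hybrid = hybrid_block_params(channels)

print("3x3 convolution:  ", baseline)
print("1x1 conv + MHSA:  ", hybrid)
print("relative saving:  ", round(1 - hybrid / baseline, 3))
```

Under these assumptions the replaced layer has fewer weights than the 3×3 convolution it stands in for. This per-layer figure is not comparable to the 15.8% reported in the abstract, which is measured against the whole GazeTR-Hybrid model.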

Key words: gaze estimation, self-attention, MHSA, Transformer
