Patch-Level Gaze Distribution Prediction for Gaze Following

Link
Abstract

Gaze following aims to predict where a person in a scene is looking, either by localizing the gaze target or by indicating that the target lies outside the image. Recent works detect the gaze target by training heatmap regression with a pixel-wise mean-square error (MSE) loss, while formulating in/out prediction as a separate binary classification task. This formulation imposes a strict pixel-level constraint at high resolution on the single annotation available per training sample, and ignores both annotation variance and the correlation between the two subtasks. To address these issues, we introduce the patch distribution prediction (PDP) method: we replace the in/out prediction branch of previous models with a PDP branch that predicts a patch-level gaze distribution with an additional bin for out-of-frame cases. Experiments show that our model regularizes the MSE loss, predicting better heatmap distributions on images with larger annotation variance, while bridging the gap between the target prediction and in/out prediction subtasks, and yields significant improvements on both subtasks on public gaze-following datasets.
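A minimal sketch of the core idea, not the paper's implementation: a single point annotation (or an out-of-frame flag) is softened into a target distribution over coarse patches plus one "outside" bin, and the prediction is supervised with a distribution loss (KL here) instead of pixel-wise MSE. The function names, `grid` size, and `sigma` are illustrative assumptions.

```python
import numpy as np

def patch_target(gaze_xy, inside, grid=7, sigma=1.0):
    """Build a target distribution over grid*grid patches plus one
    'outside' bin from a single gaze annotation (illustrative sketch).

    gaze_xy: (x, y) in [0, 1] image coordinates (ignored if not inside).
    inside:  whether the gaze target lies inside the image.
    Returns a probability vector of length grid*grid + 1 (last = outside).
    """
    t = np.zeros(grid * grid + 1)
    if not inside:
        t[-1] = 1.0          # all probability mass on the outside bin
        return t
    # Gaussian bump (in patch units) centred on the annotated location,
    # softening the single point label instead of forcing a one-hot target.
    cx, cy = gaze_xy[0] * grid - 0.5, gaze_xy[1] * grid - 0.5
    ys, xs = np.mgrid[0:grid, 0:grid]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    t[:-1] = (g / g.sum()).ravel()
    return t

def kl_loss(logits, target, eps=1e-8):
    """KL(target || softmax(logits)) over the patch+outside distribution."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(np.sum(target * (np.log(target + eps) - np.log(p + eps))))
```

Because the "outside" case is just one more bin of the same distribution, the two subtasks share a single supervised output instead of two disjoint losses.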

Synth

Problem:: Existing gaze-following models force a single Gaussian onto the annotation, causing annotation-inconsistency issues, and treat heatmap prediction and in/out prediction as separate tasks, ignoring their correlation.

Solution:: Patch-level gaze distribution prediction, which relaxes the strict pixel-level constraint and accounts for the correlation between the two subtasks.

Novelty:: Analyzes the uncertainty of gaze-target labels; first approach to perform gaze-target prediction and in/out prediction jointly.

Note:: In/out evaluation is meaningless on the GazeFollow dataset (its test set contains only in-frame targets), and the authors find that training with an in/out loss on it actually hurts test performance.
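The unification described above can be sketched as follows: from one predicted patch+outside distribution, both subtask outputs fall out directly. This is an illustrative decomposition, not the paper's exact heads; `grid` is assumed.

```python
import numpy as np

def split_prediction(dist, grid=7):
    """Recover both subtask outputs from a predicted patch+outside
    distribution of length grid*grid + 1 (illustrative sketch).

    Returns (P(target in frame), in-frame patch heatmap)."""
    p_out = dist[-1]
    p_in = 1.0 - p_out
    heat = dist[:-1].reshape(grid, grid)
    if p_in > 0:
        heat = heat / p_in   # renormalize, conditioned on being in-frame
    return p_in, heat
```

On a dataset like GazeFollow, where every test target is in-frame, only the conditional heatmap matters, which is consistent with the note that an extra in/out loss there degrades test performance.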

Summary

Motivation

Method

![[file-20250318231609761.png]]

Attention Module

![[file-20250318231700449.png|500]]

Gaze Distribution Prediction

Method Validation

Comparison with Other Works

Ablation Study