Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

Link
Abstract

DETR-like methods have significantly increased detection performance in an end-to-end manner. The mainstream two-stage frameworks of them perform dense self-attention and select a fraction of queries for sparse cross-attention, which is proven effective for improving performance but also introduces a heavy computational burden and high dependence on stable query selection. This paper demonstrates that suboptimal two-stage selection strategies result in scale bias and redundancy due to the mismatch between selected queries and objects in two-stage initialization. To address these issues, we propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries, for a better trade-off between computational efficiency and precision. The filtering process overcomes scale bias through a novel scale-independent salience supervision. To compensate for the semantic misalignment among queries, we introduce elaborate query refinement modules for stable two-stage initialization. Based on above improvements, the proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets, as well as 49.2% AP on COCO 2017 with less FLOPs. The code is available at https://github.com/xiuqhou/Salience-DETR.

Synth

Problem:: Query selection favors large objects, which makes small objects hard to detect (Scale Bias) / Far more queries are processed than there are objects, which is inefficient (Redundancy)

Solution:: Salience Supervision removes the Scale Bias / Hierarchical Filtering reduces the number of queries & Refinement preserves accuracy

Novelty:: Introduces scale-independent Salience Supervision in place of the conventional Binary Supervision

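Note:: It may be because I read this without having read Focus DETR first, but the paper honestly reads as very badly written. Someone on GitHub is also asking what the content means / The Decoder Queries get sampled anyway, so does the Encoder really need to compute over everything? + Skipping that computation wrecks performance, so an extra step is needed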

Summary

Problem

Red boxes: GT Boxes, sky-blue dots: Queries → large boxes get needlessly many Queries & small boxes get none. Symmetrical Queries are shown in the figure, but the paper does not mention them again

Method

file-20250314234954711.png

Salience-Guided Supervision

file-20250314235344302.png|700
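
The contrast between Binary Supervision and scale-independent Salience Supervision can be sketched as below. This is a rough illustration only: the formula in `salience_target` is an assumption (offset to the GT box center, normalized by the box's half-width and half-height), not the paper's exact definition, and both function names are made up for this note.

```python
import math

def binary_target(x, y, box):
    """Binary supervision: 1 inside the GT box, 0 outside."""
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0

def salience_target(x, y, box):
    """Illustrative scale-independent target (assumed form, not the paper's).

    Offsets to the box center are divided by the half-width and half-height,
    so the value depends only on the relative position inside the box.
    Assumes the box has positive width and height.
    """
    x0, y0, x1, y1 = box
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return 0.0
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    dx = (x - cx) / ((x1 - x0) / 2)
    dy = (y - cy) / ((y1 - y0) / 2)
    # 1 at the center, decaying toward the edges; clipped at 0 near corners.
    return max(0.0, 1.0 - math.sqrt(dx * dx + dy * dy))
```

With this form, a point a quarter of the way into a 10x10 box and the corresponding point in a 100x100 box get the same target, whereas a binary target marks every interior point as 1 and lets large boxes contribute far more positives.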

Hierarchical Filtering
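
The abstract describes encoding "only on filtered discriminative queries". A minimal sketch of that idea, assuming a simple top-k selection by salience score: `keep_ratio`, `select_top_k`, `encode_selected`, and `layer_fn` are hypothetical names, and the paper's actual hierarchical scheme is not reproduced here.

```python
def select_top_k(scores, keep_ratio):
    """Return indices of the highest-scoring tokens, in ascending index order."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def encode_selected(tokens, scores, keep_ratio, layer_fn):
    """Apply layer_fn only to the selected tokens; the rest pass through unchanged."""
    keep = select_top_k(scores, keep_ratio)
    updated = layer_fn([tokens[i] for i in keep])
    out = list(tokens)
    for idx, new_tok in zip(keep, updated):
        out[idx] = new_tok
    return out, keep

# Usage: a stand-in "encoder layer" that doubles each feature value.
tokens = [[1.0], [2.0], [3.0], [4.0]]
scores = [0.1, 0.9, 0.3, 0.8]
double = lambda batch: [[2 * v for v in tok] for tok in batch]
encoded, kept = encode_selected(tokens, scores, 0.5, double)
```

Only the kept tokens pay the cost of `layer_fn`, which is where the compute saving would come from; the untouched tokens are what the Query Refinement step then has to compensate for.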

Query Refinement

Background Embedding

$q_{\text{unselected}}(i,j)$ + Absolute Background Embedding
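
Read literally, this step adds one shared vector to every query that the filter did not select. A minimal sketch under that reading: `add_background_embedding` is a hypothetical helper, and in a real model the `background` vector would presumably be a learned parameter rather than a fixed list.

```python
def add_background_embedding(tokens, selected, background):
    """Add a shared background vector to every token NOT in `selected`."""
    selected_set = set(selected)
    out = []
    for i, tok in enumerate(tokens):
        if i in selected_set:
            out.append(list(tok))  # selected queries are left as-is
        else:
            out.append([t + b for t, b in zip(tok, background)])
    return out

# Usage: token 1 was selected by the filter, tokens 0 and 2 were not.
tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
result = add_background_embedding(tokens, selected=[1], background=[0.5, -0.5])
```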

Cross-Level Token Fusion

file-20250314235444514.png|525

Redundancy Removal for Two-Stage Queries

Method Validation