D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Link
Abstract

We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively refining probability distributions, providing a fine-grained intermediate representation that significantly enhances localization accuracy. GO-LSD is a bidirectional optimization strategy that transfers localization knowledge from refined distributions to shallower layers through self-distillation, while also simplifying the residual prediction tasks for deeper layers. Additionally, D-FINE incorporates lightweight optimizations in computationally intensive modules and operations, achieving a better balance between speed and accuracy. Specifically, D-FINE-L / X achieves 54.0% / 55.8% AP on the COCO dataset at 124 / 78 FPS on an NVIDIA T4 GPU. When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors. Furthermore, our method significantly enhances the performance of a wide range of DETR models by up to 5.3% AP with negligible extra parameters and training costs. Our code and pretrained models: https://github.com/Peterande/D-FINE.

Synth

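Problem:: Predicting fixed coordinate values makes uncertainty hard to model / existing distribution-based methods require anchors / existing distribution-based methods use a fixed maximum distance, which makes them weak on small objects / bin size in distribution-based methods is fixed / Localization Distillation is effective but requires anchors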

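Solution:: An initial fixed-coordinate head predicts the maximum distance + the D-FINE head predicts the distribution / each bin's size is adjusted: smaller where the change is small, larger where it is large / self-distillation is applied at every decoder layer without anchors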

Novelty:: Applies distribution-based prediction and Localization Distillation, both of which previously required anchors, to the anchor-free DETR family

Note:: ICLR 2025 Spotlight; a study that borrows ideas from prior work and adapts them well

Summary

Motivation

Method

Fine-grained Distribution Refinement (FDR)

file-20250322174926189.png

Fine-Grained Localization (FGL)

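$$
\mathcal{L}_{\text{FGL}} = \sum_{l=1}^{L} \left( \sum_{k=1}^{K} \text{IoU}_k \left( \omega_{\leftarrow} \cdot \text{CE}\left(\text{Pr}^{l}(n)_k,\, n_{\leftarrow}\right) + \omega_{\rightarrow} \cdot \text{CE}\left(\text{Pr}^{l}(n)_k,\, n_{\rightarrow}\right) \right) \right)
$$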
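A minimal sketch of the FGL loss in plain Python, under assumptions that this note does not state: each prediction is a list of bin probabilities, the target offset `phi` is given in fractional bin-index units, the two cross-entropy targets are the bins immediately left and right of `phi`, and the two weights are linear-interpolation weights (the closer bin gets the larger weight). All function and parameter names are hypothetical, and the same `ious` list is reused for every layer because the formula carries no layer index on the IoU term.

```python
import math


def fgl_loss_single_layer(probs, targets, ious, eps=1e-12):
    """FGL loss for one decoder layer.

    probs:   K lists, each a probability distribution over bins (assumed input format)
    targets: K floats, ground-truth offset in fractional bin-index units (assumption)
    ious:    K floats, IoU weight for each prediction
    """
    total = 0.0
    for pr, phi, iou in zip(probs, targets, ious):
        # Bins adjacent to the target; clamp so the right bin stays in range.
        n_left = min(int(math.floor(phi)), len(pr) - 2)
        n_right = n_left + 1
        # Linear-interpolation weights (assumed definition of the two omegas).
        w_left = n_right - phi
        w_right = phi - n_left
        # Cross-entropy against a single bin index is -log of that bin's probability.
        ce_left = -math.log(max(pr[n_left], eps))
        ce_right = -math.log(max(pr[n_right], eps))
        total += iou * (w_left * ce_left + w_right * ce_right)
    return total


def fgl_loss(per_layer_probs, targets, ious):
    """Sum the single-layer loss over all L decoder layers."""
    return sum(fgl_loss_single_layer(p, targets, ious) for p in per_layer_probs)
```

With a target of 1.25 the left bin (index 1) receives weight 0.75 and the right bin (index 2) receives weight 0.25, so the loss pulls probability mass toward the two bins that bracket the true offset.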

Global Optimal Localization Self-Distillation (GO-LSD)

file-20250323220651134.png

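$$
\mathcal{L}_{\text{DDF}} = T^{2} \sum_{l=1}^{L-1} \left( \sum_{k=1}^{K_m} \alpha_k \cdot \text{KL}\left(\text{Pr}^{l}(n)_k,\, \text{Pr}^{L}(n)_k\right) + \sum_{k=1}^{K_u} \beta_k \cdot \text{KL}\left(\text{Pr}^{l}(n)_k,\, \text{Pr}^{L}(n)_k\right) \right)
$$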
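A minimal sketch of the DDF loss in plain Python, under stated assumptions: the inputs are raw bin logits that are softened by a temperature `T` before comparison, the final decoder layer acts as the teacher for all shallower layers, and the KL term is taken in the usual distillation direction, KL(teacher || student). The note does not specify the argument order of KL, so that direction is an assumption, as are all function and parameter names. The per-prediction weights `alphas` (matched) and `betas` (unmatched) are passed in as given rather than derived here.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kl_div(teacher, student, eps=1e-12):
    """KL(teacher || student) for two discrete distributions (assumed direction)."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher, student)
    )


def ddf_loss(layer_logits, matched, alphas, betas, temperature):
    """DDF loss over decoder layers.

    layer_logits[l][k]: bin logits of prediction k at layer l; the last layer is the teacher
    matched[k]:         True if prediction k is matched to a ground-truth box
    alphas / betas:     per-prediction weights for matched / unmatched predictions
    """
    teacher_layer = layer_logits[-1]
    total = 0.0
    # Layers 1 .. L-1 are students; the final layer L is excluded from the sum.
    for student_layer in layer_logits[:-1]:
        for k, (s_logits, t_logits) in enumerate(zip(student_layer, teacher_layer)):
            p_student = softmax(s_logits, temperature)
            p_teacher = softmax(t_logits, temperature)
            weight = alphas[k] if matched[k] else betas[k]
            total += weight * kl_div(p_teacher, p_student)
    # The T^2 factor is the prefactor that appears in the formula.
    return temperature ** 2 * total
```

Using a single loop with a `matched` flag is equivalent to the two separate sums over the matched and unmatched predictions in the formula; it only avoids splitting the predictions into two lists first.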

Bag of Freebies

Method Validation

Main Performance Results

Effect on Various DETR Models

Additional Performance Results