Relation-enhanced DETR for Component Detection in Graphic Design Reverse Engineering

#detection-transformer #object-detection #relation-modeling #graphic-design-reverse-engineering

Mar 18, 2025 06:02 AM

Apr 14, 2025 02:32 AM

Link

https://www.ijcai.org/proceedings/2023/532

Abstract

It is a common practice for designers to create digital prototypes from a mock-up/screenshot. Reverse engineering graphic design by detecting its components (e.g., text, icon, button) helps expedite this process. This paper first conducts statistical analysis to emphasize the importance of relations in graphic layouts, which further motivates us to incorporate relation modeling into component detection. Built on the current state-of-the-art DETR (DEtection TRansformer), we introduce a learnable relation matrix to model class correlations. Specifically, the matrix will be added to the DETR decoder to update the query-to-query self-attention. Experiment results on three public datasets show that our approach achieves better performance than several strong baselines. We further visualize the learned relation matrix and observe some reasonable patterns. Moreover, we show an application of component detection where we leverage the detection outputs as augmented training data for layout generation, which achieves promising results.

Synth

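Problem:: Relation modeling between bounding boxes matters for layout detection, yet existing detectors do not apply it

Solution:: Add a component that can model these relations

Novelty:: Analyzes the importance of relations in design layouts / first to apply relation information to DETR's self-attention

Note:: The experimental results look questionable. The DETR baseline detects far worse than expected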

Summary

Motivation

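Detecting components (text, icons, buttons, etc.) is important for graphic design reverse engineering, but complex layouts make it difficult
- Objects assigned a free-form role, called Freeform, are connected in complex ways
Relations between objects matter in graphic design, but prior work models them insufficiently
- The top of a site shows the site's title together with an image
- Toolbar and Icon usually appear together
- In practice, layout datasets contain more strongly correlated objects than COCO
  - Pointwise Mutual Information (PMI): 0 means the two classes are independent; higher values mean stronger correlation
  - RICO has many more class pairs with high PMI than COCO
The prior work MagicLayout considers only co-occurrence relations and uses precomputed fixed weights, which limits learning flexibility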
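
To make the PMI statistic from the dataset comparison above concrete, here is a minimal sketch. It assumes layout-level co-occurrence (a class counts once per layout however many instances it has) and the natural logarithm; the paper's exact counting scheme may differ. The function name `class_pmi` and the toy layouts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def class_pmi(layouts):
    """Return PMI for every unordered class pair that co-occurs at least once.

    layouts: list of layouts, each a list of class labels.
    Assumption: probabilities are estimated from layout-level presence.
    """
    n = len(layouts)
    single = Counter()  # number of layouts containing each class
    pair = Counter()    # number of layouts containing both classes
    for labels in layouts:
        present = sorted(set(labels))
        single.update(present)
        pair.update(combinations(present, 2))

    pmi = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        p_a, p_b = single[a] / n, single[b] / n
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ); 0 means independence
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


# Toy data (made up for illustration, not taken from RICO or COCO)
layouts = [
    ["toolbar", "icon", "text"],
    ["toolbar", "icon"],
    ["text", "image"],
    ["text"],
]
pmi = class_pmi(layouts)
for classes, value in sorted(pmi.items(), key=lambda kv: -kv[1]):
    print(classes, round(value, 3))
```

In this toy data, toolbar and icon always appear together, so their PMI is positive, while icon and text co-occur less often than independence would predict and get a negative value.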

Method

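A learnable Relation Matrix that models correlations between classes, built on DETR (DEtection TRansformer)
During decoder self-attention, the classes predicted by the previous layer are used to look up weights in the relation matrix, and those weights are added to the self-attention weights
- The lookup index comes from Argmax, which is non-differentiable
  - The forward pass uses Argmax as-is via the straight-through trick
  - The backward pass keeps a differentiable path that approximates Argmax with Gumbel-Softmax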
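
The straight-through lookup described above can be written out as follows. This is the standard straight-through Gumbel-Softmax formulation, not necessarily the paper's exact one; all symbols are assumptions introduced here: $p_i$ is the class distribution predicted for query $i$ by the previous decoder layer, $\tau$ is a temperature, $\mathrm{sg}(\cdot)$ is stop-gradient, and $R \in \mathbb{R}^{C \times C}$ is the relation matrix.

$$
\begin{aligned}
\tilde{y}_{i,k} &= \frac{\exp\big((\log p_{i,k} + g_{i,k})/\tau\big)}{\sum_{c=1}^{C} \exp\big((\log p_{i,c} + g_{i,c})/\tau\big)}, \qquad g_{i,k} \sim \mathrm{Gumbel}(0, 1) \\
\hat{y}_i &= \mathrm{onehot}\big(\arg\max_k \tilde{y}_{i,k}\big) \\
y_i &= \hat{y}_i + \tilde{y}_i - \mathrm{sg}(\tilde{y}_i) \\
r_{ij} &= y_i^{\top} R \, y_j
\end{aligned}
$$

In the forward pass $y_i$ equals the one-hot $\hat{y}_i$, so $r_{ij}$ is simply the entry of $R$ at the two predicted classes. In the backward pass the gradient flows through $\tilde{y}_i$, which keeps the class prediction trainable through the lookup.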
The relation matrix is optimized automatically during training and captures class correlations effectively
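
A forward-pass-only sketch of the relation lookup in NumPy, to show where the relation weights enter the attention. Several simplifications are assumptions rather than the paper's design: a single head, no Q/K/V projections, and the relation weight added to the pre-softmax logits (these notes do not say whether it is added before or after the softmax). The function name `relation_self_attention` is hypothetical, and gradient handling is omitted since NumPy has no autodiff.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def relation_self_attention(queries, class_logits, relation):
    """Query-to-query self-attention with a class-relation bias.

    queries:      (N, d) decoder queries
    class_logits: (N, C) class predictions from the previous decoder layer
    relation:     (C, C) relation matrix (learnable in training; fixed here)
    """
    n, d = queries.shape
    classes = class_logits.argmax(axis=-1)          # hard class index per query
    one_hot = np.eye(relation.shape[0])[classes]    # (N, C)
    # bias[i, j] = relation[class_i, class_j]
    bias = one_hot @ relation @ one_hot.T           # (N, N)
    # Assumption: bias is added to the scaled dot-product logits
    logits = queries @ queries.T / np.sqrt(d) + bias
    weights = softmax(logits, axis=-1)
    return weights @ queries, weights, bias


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))       # 4 queries, hidden size 8
class_logits = rng.normal(size=(4, 3))  # 3 classes
relation = rng.normal(size=(3, 3))
out, weights, bias = relation_self_attention(queries, class_logits, relation)
print(out.shape, weights.shape, bias.shape)
```

Writing the lookup as `one_hot @ relation @ one_hot.T` instead of direct indexing is deliberate: in an autodiff framework the same expression lets a soft or straight-through class vector replace the hard one-hot without changing the rest of the computation.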

Validation

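Performance is evaluated on three graphic design datasets: RICO (mobile UI), Crello (posters), and InfoPPT (slides)
Shows consistent improvements over Faster R-CNN and DETR-based models
Visualizes the relation matrix to analyze the class-correlation patterns the model learned
- Pairs that naturally appear together, such as the marked Input and Text Button, show high relation values
Presents an application that uses the detection results as augmented training data for layout generation
Shows that neither the fixed Co-Occurrence Matrix from prior work (computed from how often two elements appear together) nor a relation based on the distance between BBox center coordinates is as effective as the Learnable Relation Matrix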