AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors

Link
Abstract

Security concerns surrounding text-to-image diffusion models have driven researchers to unlearn inappropriate concepts through fine-tuning. Recent fine-tuning methods typically align the prediction distributions of unsafe prompts with those of predefined text anchors. However, these techniques exhibit a considerable performance trade-off between eliminating undesirable concepts and preserving other concepts. In this paper, we systematically analyze the impact of diverse text anchors on unlearning performance. Guided by this analysis, we propose AdvAnchor, a novel approach that generates adversarial anchors to alleviate the trade-off issue. These adversarial anchors are crafted to closely resemble the embeddings of undesirable concepts to maintain overall model performance, while selectively excluding defining attributes of these concepts for effective erasure. Extensive experiments demonstrate that AdvAnchor outperforms state-of-the-art methods. Our code is publicly available at https://anonymous.4open.science/r/AdvAnchor.

Synth

Problem:: Performance trade-off in existing diffusion model unlearning methods / limitations of methods based on predefined text anchors

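Solution:: Generate an adversarial anchor $e_{anchor}^{adv}$ by adding an optimized universal adversarial perturbation $e_{adv}$ to the embedding of the undesirable concept / optimizing $e_{adv}$ (maximizing the loss $L_{adv}$) disrupts generation of the concept's defining attributes and improves erasure performance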

Novelty:: Systematically analyzes the impact of text anchors in diffusion model unlearning and derives insights / first to propose the adversarial anchor concept for improving unlearning performance

Note:: It is interesting how the fairly obvious observation that unlearning with the anchor most similar to the concept gives strong preservation is tied to the use of AdvAnchor

Summary

Motivation

Impact of Various Anchors on DM Unlearning Analysis

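Analysis method: The undesirable concept $c_u$ is removed from the DM by minimizing the prediction difference between a prompt $p_u$ that contains $c_u$ and a target prompt $p_{anchor}$. This is done by minimizing the following loss function $L_{op}$: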

$$
\min_{\theta_{op}} L_{op} = \left\lVert f_{de}(x_t, e_{p_u}; \theta_{op}) - f_{de}(x_t, e_{anchor}; \theta_{ori}) \right\rVert_2
$$
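To make the shape of this objective concrete, here is a minimal sketch in plain Python. It assumes $f_{de}$ is the denoising network, $\theta_{op}$ the parameters being fine-tuned, and $\theta_{ori}$ the frozen original parameters; the note does not spell these out. `toy_denoiser` is a hypothetical stand-in, not the real model, and the "2" in the flattened equation is read here as the L2 norm.

```python
import math

def toy_denoiser(x_t, e, theta):
    """Hypothetical stand-in for f_de: a scaled sum of latent and embedding.

    A real diffusion model would use a U-Net here; this is only for illustration.
    """
    return [theta * (x + c) for x, c in zip(x_t, e)]

def l_op(x_t, e_pu, e_anchor, theta_op, theta_ori, f_de=toy_denoiser):
    """L2 distance between the fine-tuned prediction for the unsafe prompt
    and the original model's prediction for the anchor prompt."""
    pred_unsafe = f_de(x_t, e_pu, theta_op)        # model being optimized
    pred_anchor = f_de(x_t, e_anchor, theta_ori)   # frozen original model
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred_unsafe, pred_anchor)))

# Toy usage: the loss vanishes when both branches produce the same prediction.
x_t = [0.0, 0.0]
print(l_op(x_t, e_pu=[3.0, 4.0], e_anchor=[0.0, 0.0], theta_op=1.0, theta_ori=1.0))
print(l_op(x_t, e_pu=[0.5, 0.5], e_anchor=[0.5, 0.5], theta_op=2.0, theta_ori=2.0))
```

In an actual fine-tuning loop only the first branch would receive gradients; the anchor branch serves as a fixed target.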

Method

AdvAnchor
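The idea summarized in the Solution note (add a perturbation $e_{adv}$ to the undesirable concept's embedding and optimize it by maximizing $L_{adv}$) can be sketched roughly as below. This is an illustration under strong assumptions, not the paper's implementation: the note does not define $L_{adv}$, so it is passed in as a hypothetical callable; gradients are approximated by finite differences; the step count and learning rate are arbitrary; and any constraint that keeps the anchor close to the original embedding is omitted. It also perturbs a single embedding, whereas the paper's perturbation is described as universal.

```python
def make_adversarial_anchor(e_u, l_adv, steps=50, lr=0.1, eps=1e-4):
    """Gradient-ascent sketch: find e_adv that increases l_adv(e_u + e_adv).

    e_u   : embedding of the undesirable concept (list of floats)
    l_adv : hypothetical loss callable taking an anchor embedding
    Returns (adversarial anchor, perturbation).
    """
    e_adv = [0.0] * len(e_u)
    for _ in range(steps):
        base = l_adv([u + d for u, d in zip(e_u, e_adv)])
        grad = []
        for i in range(len(e_adv)):
            bumped = list(e_adv)
            bumped[i] += eps  # forward finite difference on coordinate i
            val = l_adv([u + d for u, d in zip(e_u, bumped)])
            grad.append((val - base) / eps)
        # Ascent step, because L_adv is maximized rather than minimized.
        e_adv = [d + lr * g for d, g in zip(e_adv, grad)]
    anchor = [u + d for u, d in zip(e_u, e_adv)]
    return anchor, e_adv

# Toy usage with a made-up concave loss whose maximum sits at `target`.
target = [1.0, -1.0]
toy_l_adv = lambda a: -sum((x - t) ** 2 for x, t in zip(a, target))
anchor, e_adv = make_adversarial_anchor([0.0, 0.0], toy_l_adv)
print(anchor, e_adv)
```

With a real model, `l_adv` would depend on the diffusion model's predictions and the gradient would come from automatic differentiation rather than finite differences.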

file-20250421002232701.png|725

Method Validation

Style Unlearning Experiments

Object Unlearning Experiments

Explicit Content Removal Experiments

Impact of Optimization Strategy and Hyperparameters

Limitation