Ablating Concepts in Text-to-Image Diffusion Models

Link
Abstract

Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability. However, these models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos. Furthermore, they have been found to replicate the style of various living artists or memorize exact training samples. How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. Our algorithm learns to match the image distribution for a target style, instance, or text prompt we wish to ablate to the distribution corresponding to an anchor concept. This prevents the model from generating target concepts given its text condition. Extensive experiments show that our method can successfully prevent the generation of the ablated concept while preserving closely related concepts in the model.

Synth

Problem:: Large-scale text-to-image models can generate copyrighted works / they mimic specific artists' styles and can reproduce personal photos / retraining the model from scratch is computationally very expensive / inference-time filtering is easily bypassed

Solution:: Proposes matching the image distribution of the target concept to that of an anchor concept by minimizing a KL divergence / develops two variants: model-based and noise-based
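The model-based variant above reduces to a simple MSE objective: the fine-tuned network's noise prediction on the *target* prompt is pulled toward the frozen pretrained network's prediction on the *anchor* prompt, which minimizes a bound on the KL between the two denoising distributions. A minimal sketch with toy linear stand-ins for the networks (the function names and shapes are illustrative assumptions, not the authors' code; the real models are conditional U-Nets):

```python
import numpy as np

def ablation_loss(eps_student, eps_teacher, x_t, t, c_target, c_anchor):
    """Model-based concept-ablation objective (sketch).

    eps_student: trainable copy of the diffusion model
    eps_teacher: frozen pretrained model (treated as a constant target)
    The target-prompt prediction is regressed onto the anchor-prompt
    prediction, so "target prompt" starts behaving like "anchor prompt".
    """
    target_pred = eps_student(x_t, t, c_target)   # gradients flow here
    anchor_pred = eps_teacher(x_t, t, c_anchor)   # no gradient in practice
    return float(np.mean((target_pred - anchor_pred) ** 2))

# Toy stand-ins (hypothetical): linear "denoisers" over 4-dim features.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
def eps_teacher(x, t, c): return x @ W + c
def eps_student(x, t, c): return x @ (W * 0.9) + c  # slightly perturbed copy

x_t = rng.normal(size=(2, 4))          # batch of noised latents
c_target, c_anchor = np.ones(4), np.zeros(4)  # stand-in text conditions
loss = ablation_loss(eps_student, eps_teacher, x_t, 0, c_target, c_anchor)
print(loss)
```

In the noise-based variant, `anchor_pred` is replaced by the actual sampled noise for an image generated from the anchor prompt; the loss shape is otherwise the same.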

Novelty:: Selectively removes only the target concept while preserving closely related concepts / compares parameter-update choices: cross-attention only, text embedding only, and full weights
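The three parameter-update choices differ only in which named weights are unfrozen during fine-tuning. A sketch of that selection, assuming diffusers-style parameter names (`attn2` for cross-attention, `to_k`/`to_v` for its key/value projections); the name list is hypothetical:

```python
def select_trainable(param_names, mode):
    """Return the subset of parameter names to fine-tune for ablation.

    "cross_attn": only key/value projections of cross-attention layers
                  (these read the text condition, so they are the cheapest
                  place to rewire "target prompt" -> "anchor concept")
    "embedding":  only the text encoder's token-embedding table
    "full":       every weight in the model
    """
    if mode == "cross_attn":
        return [n for n in param_names
                if "attn2" in n and ("to_k" in n or "to_v" in n)]
    if mode == "embedding":
        return [n for n in param_names if "token_embedding" in n]
    if mode == "full":
        return list(param_names)
    raise ValueError(f"unknown mode: {mode}")

# Illustrative parameter names (not an actual model's full list).
names = [
    "unet.down.attn2.to_k.weight",
    "unet.down.attn2.to_v.weight",
    "unet.down.attn1.to_q.weight",          # self-attention: left frozen
    "text_encoder.token_embedding.weight",
]
print(select_trainable(names, "cross_attn"))
```

Updating only cross-attention keeps the edit small and, per the paper's observation, makes the ablation robust to misspellings of the target prompt, since the text-reading pathway itself is retrained.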

Note:: Takes about 5 minutes per concept / cross-attention fine-tuning is robust to misspellings of the target prompt

Summary

Motivation

Method

![[file-20250420221703443.png|875]]

Method Validation

Removing Object Instances

Removing Artistic Styles

Removing Memorized Images

Additional Analysis Experiments

Limitation