Adversarial Purification with Score-based Generative Models

Link
Abstract

While adversarial training is considered a standard defense method against adversarial attacks for image classifiers, adversarial purification, which purifies attacked images into clean images with a standalone purification model, has shown promise as an alternative defense method. Recently, an energy-based model (EBM) trained with Markov chain Monte Carlo (MCMC) has been highlighted as a purification model, where an attacked image is purified by running a long Markov chain using the gradients of the EBM. Yet, the practicality of adversarial purification using an EBM remains questionable because the number of MCMC steps required for such purification is too large. In this paper, we propose a novel adversarial purification method based on an EBM trained with denoising score matching (DSM). We show that an EBM trained with DSM can quickly purify attacked images within a few steps. We further introduce a simple yet effective randomized purification scheme that injects random noise into images before purification. This process screens out the adversarial perturbations imposed on the images and brings them into the regime where the EBM can denoise well. We show that our purification method is robust against various attacks and demonstrate its state-of-the-art performance.

Synth

Problem:: Existing adversarial purification (EBM + MCMC) requires a long Markov chain of 1,000+ steps, making it impractical / if the EBM is not trained well, its gradients are inaccurate and purification performance degrades

Solution:: Replace the stochastic Markov chain with a deterministic update using the score function to speed up purification / use an adaptive step size to improve initial speed and accuracy / learn the gradient directly via denoising score matching / inject random noise larger than the adversarial perturbation before purification to screen out the attack
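The solution above can be sketched as a short purification loop: inject random noise to screen out the adversarial perturbation, then follow the score field deterministically with an adaptive step size. This is a minimal sketch, not the paper's exact algorithm; `score_fn`, `sigma`, and `alpha` are assumed placeholders.

```python
import torch

def purify(x, score_fn, sigma=0.25, n_steps=10, alpha=1.0):
    """Sketch of randomized score-based purification (hypothetical API).

    x        : batch of (possibly attacked) images, shape (B, C, H, W)
    score_fn : network s_theta(x) approximating grad_x log p(x),
               trained with denoising score matching
    sigma    : std of injected Gaussian noise, chosen larger than the
               expected adversarial perturbation budget
    """
    # 1) Randomized screening: Gaussian noise drowns out the attack
    x = x + sigma * torch.randn_like(x)

    # 2) Deterministic purification: move along the score direction,
    #    without the stochastic term of an MCMC/Langevin chain
    for _ in range(n_steps):
        s = score_fn(x)
        # Adaptive step size: larger steps when far from the data
        # manifold (large score norm), smaller steps near it
        norm = s.flatten(1).norm(dim=1).clamp(min=1e-8)
        step = alpha / norm
        x = x + step.view(-1, 1, 1, 1) * s
    return x.clamp(0.0, 1.0)
```

With a trained DSM model plugged in as `score_fn`, a few such steps replace the 1,000+ stochastic MCMC steps of prior EBM purification.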

Novelty:: Improved defense performance through an efficient deterministic purification process and random noise injection

Note:: Proposes detecting whether an input is adversarially attacked using the norm of the score
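The detection idea in the note can be sketched as a simple threshold test: adversarial examples sit off the data manifold, so the score norm tends to be larger than on clean images. This is an illustrative helper, not the paper's code; the `threshold` would be calibrated on held-out natural images (e.g., a high percentile of their score norms).

```python
import torch

def is_adversarial(x, score_fn, threshold):
    """Flag inputs whose score norm ||s_theta(x)||_2 exceeds a threshold
    calibrated on clean data (hypothetical helper)."""
    with torch.no_grad():
        norms = score_fn(x).flatten(1).norm(dim=1)
    return norms > threshold
```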

Summary

Motivation

Method

![[file-20250401005956783.png|600]]

Method Validation

$\|s_\theta(x)\|_2$ for natural images, adversarial images, and the purified versions of each.