Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score

Abstract

Adversarial detection aims to determine whether a given sample is adversarial based on the discrepancy between natural and adversarial distributions. Unfortunately, estimating or comparing two data distributions is extremely difficult, especially in high-dimensional spaces. Recently, the gradient of the log probability density (a.k.a. the score) w.r.t. the sample has been used as an alternative statistic. However, we find that the score alone is unreliable for identifying adversarial samples, because a single sample provides insufficient information. In this paper, we propose a new statistic called the expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. Specifically, to obtain adequate information about a single sample, we perturb it with various noises to capture multi-view observations of it. We theoretically prove that EPS is a proper statistic for computing the discrepancy between two samples under mild conditions. In practice, a pre-trained diffusion model can be used to estimate EPS for each sample. Last, we propose an EPS-based adversarial detection (EPS-AD) method, in which we develop an EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples. We also prove that the EPS-based MMD between natural and adversarial samples is larger than that among natural samples. Extensive experiments show the superior adversarial detection performance of EPS-AD.
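The two ingredients described in the abstract can be written down roughly as follows. This is a sketch under stated assumptions, not the paper's exact definitions: it assumes a DDPM/VP-style forward process with cumulative schedule $\bar\alpha_t$, a pre-trained score network $\mathbf{s}_\theta(\cdot, t)$, uniform weighting over a chosen set of timesteps $\mathcal{T}$, and a biased (V-statistic) MMD estimate with a kernel $k$ between the EPS of $n$ natural samples $\mathbf{x}_0^{(i)}$ and one test sample $\tilde{\mathbf{x}}_0$.

$$
\mathbf{s}(\mathbf{x}) = \nabla_{\mathbf{x}} \log p(\mathbf{x})
$$

$$
\mathrm{EPS}(\mathbf{x}) \approx \mathbb{E}_{t \sim \mathcal{U}(\mathcal{T})}\, \mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})} \left[ \mathbf{s}_\theta\!\left(\sqrt{\bar\alpha_t}\,\mathbf{x} + \sqrt{1 - \bar\alpha_t}\,\boldsymbol{\epsilon},\; t\right) \right]
$$

$$
\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k(\mathbf{e}_i, \mathbf{e}_j) + k(\tilde{\mathbf{e}}, \tilde{\mathbf{e}}) - \frac{2}{n} \sum_{i=1}^{n} k(\tilde{\mathbf{e}}, \mathbf{e}_i),
\qquad \mathbf{e}_i = \mathrm{EPS}(\mathbf{x}_0^{(i)}),\; \tilde{\mathbf{e}} = \mathrm{EPS}(\tilde{\mathbf{x}}_0)
$$

A large $\widehat{\mathrm{MMD}}^2$ means the test sample's EPS lies far from the EPS of natural samples, which is the signal the abstract's final theorem relies on.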

Synth

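Problem:: The score of a single sample does not carry enough information to detect adversarial data/Existing methods are sensitive to the choice of timestep and ignore the directional information of the score vector

Solution:: Propose the Expected Perturbation Score (EPS), which aggregates various perturbations across multiple timesteps/Use Maximum Mean Discrepancy (MMD) to measure the difference between EPS distributions, which also exploits directional information

Novelty:: A new statistic, EPS, computed from a single sample over multiple timesteps/Theoretical analysis of the EPS difference between natural and adversarial samples

Note:: I probably need to read the paper that first proposed the score-based approach before this one fully makes sense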

Summary

Motivation

The score function and its limitations

Method

Expected Perturbation Score (EPS)
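As a toy illustration of the EPS idea from the abstract (average the score of a sample over many noisy copies at many noise levels), here is a minimal Monte Carlo sketch. It rests on loud assumptions: instead of a pre-trained diffusion model it uses a hypothetical setting where data follow N(0, SIGMA0² I) under a linear-beta DDPM schedule, so the diffused marginal's score is known exactly and `toy_score` stands in for the score network. The names `toy_score`, `expected_perturbation_score`, `SIGMA0`, and the timestep range are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): data ~ N(0, SIGMA0^2 I) and a
# DDPM/VP forward process x_t = sqrt(abar_t) * x + sqrt(1 - abar_t) * eps.
# The diffused marginal is then N(0, (abar_t * SIGMA0^2 + 1 - abar_t) I), so its
# score is known exactly and can stand in for a pre-trained score network.
SIGMA0 = 2.0
T = 1000
BETAS = np.linspace(1e-4, 0.02, T)  # linear beta schedule (assumption)
ALPHA_BARS = np.cumprod(1.0 - BETAS)


def toy_score(x_t, t):
    """Exact score of the toy diffused marginal at timestep t."""
    var_t = ALPHA_BARS[t] * SIGMA0**2 + (1.0 - ALPHA_BARS[t])
    return -x_t / var_t


def expected_perturbation_score(x, score_fn, timesteps, n_noise=8, seed=0):
    """Monte Carlo EPS estimate: mean score of noisy copies of x over timesteps and noise draws."""
    rng = np.random.default_rng(seed)
    total = np.zeros(x.shape, dtype=float)
    for t in timesteps:
        a = ALPHA_BARS[t]
        for _ in range(n_noise):
            eps = rng.standard_normal(x.shape)
            x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # perturb x to noise level t
            total += score_fn(x_t, t)
    return total / (len(timesteps) * n_noise)


x = np.ones(4)
eps_vec = expected_perturbation_score(x, toy_score, timesteps=range(50))
print(eps_vec)
# Single-timestep scores of the same x depend strongly on the chosen t.
print(toy_score(x, 0), toy_score(x, 999))
```

Averaging over many (t, eps) pairs makes the statistic less dependent on any single timestep, which is the sensitivity noted in the Problem line; the two single-timestep scores printed at the end differ considerably for the same x.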

EPS-based adversarial detection (EPS-AD)

file-20250331233345698.png

$\{x_0^{(i)}\}$: natural images, $\tilde{x}_0$: test image
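A sketch of the detection step, comparing the EPS of one test image $\tilde{x}_0$ against the EPS of the natural images $\{x_0^{(i)}\}$ with a biased MMD² estimate. It assumes a fixed Gaussian (RBF) kernel with a hand-picked bandwidth; the paper's kernel and estimator may differ. The EPS vectors are synthetic stand-ins (`natural_eps`, a shifted `adv_eps`), not real model outputs, and `epsad_score` is a hypothetical name.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth):
    """Biased (V-statistic) estimate of MMD^2 between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def epsad_score(test_eps, natural_eps, bandwidth):
    """Hypothetical detection statistic: MMD^2 between one test EPS vector and natural EPS vectors."""
    return mmd_squared(test_eps[None, :], natural_eps, bandwidth)


# Synthetic stand-ins for EPS vectors (not real model outputs).
rng = np.random.default_rng(0)
dim = 8
natural_eps = rng.standard_normal((200, dim))  # EPS of natural images x_0^(i)
clean_test_eps = rng.standard_normal(dim)      # test EPS from the same distribution
adv_eps = rng.standard_normal(dim) + 3.0       # test EPS shifted away from natural EPS

bandwidth = np.sqrt(dim)                       # simple fixed bandwidth (assumption)
print(epsad_score(clean_test_eps, natural_eps, bandwidth))
print(epsad_score(adv_eps, natural_eps, bandwidth))
```

In a detector, this score would be compared against a threshold calibrated on held-out natural images (also an assumption here); the shifted `adv_eps` yields a clearly larger value than `clean_test_eps`.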

Method validation