Improving Adversarial Attacks on Latent Diffusion Model
Authors: Boyang Zheng, Chumeng Liang, Xiaoyu Wu, Yan Liu
What
This paper investigates adversarial attacks on Latent Diffusion Models (LDMs) and proposes a new method, Attacking with Consistent score-function Errors (ACE), to improve their effectiveness in disrupting LDM finetuning for few-shot generation.
Why
This paper is important because it explains how adversarial attacks on LDMs actually disrupt finetuning, a dynamic that was previously unclear, and uses that insight to propose a more effective attack (ACE) for protecting images from unauthorized copying or malicious use in LDM-based few-shot generation.
How
The authors analyze the score-function errors that adversarial examples induce and identify a “reverse bias” in LDMs finetuned on such examples. They then propose ACE, which crafts adversarial perturbations so that the extra error in the predicted score function follows a single, consistent pattern, producing a predictable and optimizable sampling bias in the finetuned LDM (a sketch of this idea is given below). Experiments on the SDEdit and LoRA pipelines, using the CelebA-HQ and WikiArt datasets, demonstrate ACE’s superior performance over existing methods.
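A minimal sketch of the consistent-error idea, assuming a diffusers-style Stable Diffusion stack (`vae`, `unet`, `scheduler`, precomputed `text_emb`); the loss, target pattern, and PGD hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of ACE-style "consistent error" perturbation, not the paper's exact code.
import torch
import torch.nn.functional as F


def consistent_error_attack(x, vae, unet, scheduler, text_emb, target_pattern,
                            eps=8 / 255, step=1 / 255, iters=100):
    """x: clean image in [-1, 1], shape (1, 3, H, W).
    target_pattern: attacker-chosen tensor with the latent shape (1, 4, H/8, W/8)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        x_adv = (x + delta).clamp(-1, 1)
        latents = vae.encode(x_adv).latent_dist.sample() * vae.config.scaling_factor

        # Sample a random training timestep and noise, exactly as LDM finetuning does.
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,), device=x.device)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)

        pred = unet(noisy, t, encoder_hidden_states=text_emb).sample

        # Key difference from "maximize the denoising loss" attacks: drive the
        # prediction error (pred - noise) toward ONE fixed pattern for every t,
        # so finetuning on x_adv learns that pattern as a bias.
        loss = F.mse_loss(pred - noise, target_pattern)
        loss.backward()

        with torch.no_grad():
            delta -= step * delta.grad.sign()  # descend: pull the error toward the target
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (x + delta).clamp(-1, 1).detach()
```

Earlier attacks typically maximize the denoising loss itself, which scatters the induced errors; pinning every error to one target pattern is what makes the finetuned model's learned bias consistent and hence exploitable.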
Result
The proposed ACE method outperforms existing adversarial attacks on LDMs in disrupting both SDEdit and LoRA, two leading few-shot generation pipelines. ACE achieves this by inducing a consistent, optimizable pattern of errors in the finetuned LDM, leading to significant degradation in the quality of generated images. The paper also provides insights into the dynamics of adversarial attacks on LDMs, particularly the role of “reverse bias” in amplifying the impact of adversarial examples during finetuning.
Limitations and Future Work
The authors acknowledge that the optimal target for maximizing the impact of ACE is still an open question and suggest exploring different target options in future work. Additionally, they plan to investigate the generalization of ACE to other LDM-based generative models and explore its robustness against potential defense mechanisms.
Abstract
Adversarial attacks on the Latent Diffusion Model (LDM), the state-of-the-art image generative model, have been adopted as an effective protection against malicious finetuning of LDMs on unauthorized images. We show that these attacks add an extra error to the score function that the LDM predicts for adversarial examples. An LDM finetuned on these adversarial examples learns to offset the error with a bias, and it is through this learned bias that the model is attacked: it predicts the score function with biases. Based on this dynamic, we propose to improve adversarial attacks on LDMs by Attacking with Consistent score-function Errors (ACE). ACE unifies the pattern of the extra error added to the predicted score function, which induces the finetuned LDM to learn the same pattern as a bias in predicting the score function. We then introduce a well-crafted pattern to improve the attack. Our method outperforms state-of-the-art methods in adversarial attacks on LDM.
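The error-and-bias dynamic described in the abstract can be written compactly. The notation below is a hedged reconstruction from this summary (with epsilon_theta as the LDM's noise/score predictor and e as the induced error), not the paper's exact formulation.

```latex
% Hedged reconstruction of the attack/finetuning dynamic; notation assumed.
% \epsilon_\theta: pretrained LDM's noise (score) predictor, x^{adv}: adversarial example,
% e_t: extra error the attack induces at timestep t, e^\star: the single pattern ACE enforces.
\begin{align*}
\epsilon_\theta(x^{adv}_t, t) &\approx \epsilon + e_t
    && \text{the attack adds an extra error to the prediction} \\
e_t &\approx e^\star \quad \forall t
    && \text{ACE: one consistent, well-crafted pattern} \\
\epsilon_{\theta'}(\cdot, t) &\approx \epsilon_\theta(\cdot, t) - e^\star
    && \text{the finetuned LDM } \theta' \text{ learns } e^\star \text{ as a bias}
\end{align*}
```

At sampling time, the finetuned model's score predictions are therefore shifted by the attacker-chosen pattern, which is what degrades the generated images in a predictable, optimizable way.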