MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Authors: Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun

What

This paper presents MetaCloak, a novel method for protecting user images from unauthorized DreamBooth-based personalized image generation by crafting robust perturbations that survive common data transformations.

Why

The paper addresses the growing privacy concern that personal photos can be used without consent to fine-tune personalized diffusion models such as DreamBooth, which can then fabricate misleading or harmful content about the subject.

How

The authors propose a meta-learning framework that crafts transferable, model-agnostic perturbations by training over a pool of surrogate diffusion models. To gain robustness against data transformations, they sample transformations during perturbation crafting and use a denoising-error-maximization loss that induces semantic distortion in the resulting generations.
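
To make the crafting loop concrete, below is a minimal, self-contained PyTorch sketch of this structure. It is an illustration under stated assumptions, not the paper's implementation: `ToyDenoiser` stands in for the Stable Diffusion surrogates, the fixed noise level replaces a proper DDPM schedule, the one-step inner update is a crude proxy for unrolled DreamBooth fine-tuning, and all names and hyperparameters (`craft_metacloak`, `n_steps`, `eps`, `lr`, `pool_size`) are invented for the example.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T


class ToyDenoiser(nn.Module):
    """Stand-in for a surrogate diffusion denoiser eps_theta(x_t, t)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x_t, t):
        # A real denoiser conditions on the timestep t; the toy one ignores it.
        return self.net(x_t)


def denoising_error(model, x, alpha_bar=0.9):
    """DDPM-style denoising loss at one fixed noise level (real schedules vary t)."""
    noise = torch.randn_like(x)
    x_t = alpha_bar ** 0.5 * x + (1 - alpha_bar) ** 0.5 * noise
    t = torch.zeros(x.shape[0])  # placeholder timestep
    return ((model(x_t, t) - noise) ** 2).mean()


def craft_metacloak(images, n_steps=50, eps=8 / 255, lr=1 / 255, pool_size=3):
    """Craft transformation-robust poisoning perturbations over a surrogate pool."""
    pool = [ToyDenoiser() for _ in range(pool_size)]
    transforms = T.Compose([T.RandomHorizontalFlip(), T.GaussianBlur(3)])
    delta = torch.zeros_like(images, requires_grad=True)

    for step in range(n_steps):
        model = pool[step % pool_size]  # cycle through the surrogate pool

        # (i) Inner step: briefly train the surrogate on the poisoned data,
        #     a crude stand-in for unrolled DreamBooth fine-tuning.
        inner_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
        inner_opt.zero_grad()
        denoising_error(model, (images + delta).detach()).backward()
        inner_opt.step()

        # (ii) Outer step: ascend the denoising error w.r.t. the perturbation,
        #      sampling random transformations so the poison survives them.
        loss = denoising_error(model, transforms(images + delta))
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()  # PGD-style signed ascent
            delta.clamp_(-eps, eps)          # stay inside the L_inf ball
            delta.grad = None

    return delta.detach()


if __name__ == "__main__":
    imgs = torch.rand(2, 3, 64, 64)  # dummy "user photos"
    delta = craft_metacloak(imgs, n_steps=10)
    print("perturbation range:", delta.min().item(), delta.max().item())
```

The structure mirrors the description above: alternate between briefly training a surrogate from the pool on the poisoned data and taking a signed-gradient ascent step on the perturbation through randomly sampled transformations.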

Result

MetaCloak outperforms existing methods in protecting images under both standard training and training with data transformations, as evidenced by quantitative metrics and qualitative visualizations. It effectively degrades subject detection scores, semantic similarity, and generated image quality. Notably, MetaCloak demonstrates effectiveness in real-world scenarios by successfully fooling online training services like Replicate.

Limitations & Future Work

The paper acknowledges two limitations: potential vulnerability to advanced adversarial-purification techniques and reduced effectiveness at low poisoning ratios. Suggested future work includes improving stealthiness, particularly under large perturbation radii, and developing protection that remains effective at low poisoning rates.

Abstract

Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos. Yet, in the wrong hands, these tools can fabricate misleading or harmful content, endangering individuals. To address this problem, existing poisoning-based approaches perturb user images imperceptibly to render them “unlearnable” for malicious uses. We identify two limitations of these defenses: i) they are sub-optimal, owing to hand-crafted heuristics for solving the intractable bilevel optimization, and ii) they lack robustness against simple data transformations like Gaussian filtering. To overcome these challenges, we propose MetaCloak, which solves the bilevel poisoning problem with a meta-learning framework augmented by a transformation-sampling process to craft transferable and robust perturbations. Specifically, we employ a pool of surrogate diffusion models to craft transferable, model-agnostic perturbations. Furthermore, by incorporating the transformation process, we design a simple denoising-error-maximization loss that suffices to cause transformation-robust semantic distortion and degradation in personalized generation. Extensive experiments on the VGGFace2 and CelebA-HQ datasets show that MetaCloak outperforms existing approaches. Notably, MetaCloak successfully fools online training services like Replicate in a black-box manner, demonstrating its effectiveness in real-world scenarios. Our code is available at https://github.com/liuyixin-louis/MetaCloak.
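
For reference, the bilevel poisoning problem mentioned in the abstract can be written out in standard DDPM notation. The formulation below is a sketch under that assumed notation, not an equation copied from the paper.

```latex
% Sketch of the bilevel poisoning objective in standard DDPM notation
% (assumed notation, not verbatim from the paper). Outer level: the defender
% picks bounded perturbations \delta that maximize the denoising error of a
% model fine-tuned on the (transformed) poisoned images x + \delta; inner
% level: the attacker fine-tunes \theta on the poisoned data.
\max_{\|\delta\|_\infty \le \epsilon}\;
  \mathbb{E}_{\tau \sim \mathcal{T}}\,
  \mathcal{L}\!\left(\theta^{*}(\delta);\, \tau(x+\delta)\right)
\quad \text{s.t.} \quad
\theta^{*}(\delta) \in \arg\min_{\theta}\,
  \mathcal{L}\!\left(\theta;\, x+\delta\right),
% with the denoising-error loss
\mathcal{L}(\theta; x) \;=\;
  \mathbb{E}_{t,\,\varepsilon \sim \mathcal{N}(0,I)}
  \left\| \varepsilon_{\theta}\!\left(\sqrt{\bar{\alpha}_t}\,x
  + \sqrt{1-\bar{\alpha}_t}\,\varepsilon,\; t\right)
  - \varepsilon \right\|_2^2 .
```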