Benchmarking the Robustness of Image Watermarks

Authors: Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang

What

The paper introduces WAVES, a benchmark for evaluating the robustness of image watermarking techniques, with a focus on how well they resist attacks that aim to remove or obscure watermarks.

Why

The paper addresses the lack of standardized evaluation methods for image watermarking techniques, a gap that matters more as threats such as diffusion purification and adversarial attacks emerge. It proposes a comprehensive benchmark with diverse attacks, standardized metrics, and a focus on real-world scenarios, which supports the development of more robust watermarking systems.

How

The authors develop a standardized evaluation protocol called WAVES. WAVES evaluates watermarking algorithms on three datasets (DiffusionDB, MS-COCO, and DALL·E3) using 26 attacks grouped into distortions, regenerations, and adversarial attacks. It measures watermark detection performance as TPR@0.1%FPR, the true positive rate at a false positive rate of 0.1%, and assesses image quality degradation with a single normalized, aggregated score that combines 8 individual image quality metrics.
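The two measurements described above can be illustrated with a minimal Python sketch. It is not the WAVES implementation: the function names `tpr_at_fpr` and `aggregate_quality`, the min-max normalization, and the plain averaging are assumptions made for illustration, and the paper's own normalization and aggregation may differ.

```python
import numpy as np


def tpr_at_fpr(clean_scores, watermarked_scores, target_fpr=0.001):
    """Return (TPR, threshold) at a fixed false positive rate.

    Assumes higher scores mean "more likely watermarked". An image is
    flagged when its score is strictly greater than the threshold.
    """
    clean = np.sort(np.asarray(clean_scores, dtype=float))
    wm = np.asarray(watermarked_scores, dtype=float)
    # Number of non-watermarked images we may wrongly flag.
    n_allowed = int(np.floor(target_fpr * clean.size))
    # Pick the (n_allowed + 1)-th largest clean score as the threshold,
    # so at most n_allowed clean scores lie strictly above it.
    threshold = clean[clean.size - 1 - n_allowed]
    tpr = float(np.mean(wm > threshold))
    return tpr, float(threshold)


def aggregate_quality(metric_values, higher_is_worse):
    """Combine several quality metrics into one degradation score per attack.

    metric_values: array of shape (n_attacks, n_metrics).
    higher_is_worse: one bool per metric (e.g. True for LPIPS, False for PSNR).
    Assumption: min-max normalization per metric, then an unweighted mean.
    """
    x = np.asarray(metric_values, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    norm = (x - lo) / span
    # Flip metrics where a higher value means better quality,
    # so that a larger normalized value always means more degradation.
    flip = ~np.asarray(higher_is_worse, dtype=bool)
    norm[:, flip] = 1.0 - norm[:, flip]
    return norm.mean(axis=1)
```

With a 0.1% target, a reliable estimate needs several thousand non-watermarked images, since the threshold is set by only the top few clean scores.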

Result

The evaluation reveals that vulnerabilities vary across watermarking methods. Tree-Ring is particularly vulnerable to adversarial attacks, especially grey-box embedding attacks and surrogate detector attacks, which can significantly reduce detection performance while preserving image quality. Stable Signature is susceptible to various regeneration attacks, while StegaStamp is more robust overall. The paper also highlights that watermarking systems built on publicly available VAEs are at risk, because the public VAE gives attackers a way in.

Limitations and Future Work

The authors acknowledge limitations in testing only three watermarking algorithms, albeit carefully chosen representatives. They also point out that the attack ranking methodology depends on selected performance thresholds and image quality metrics, suggesting further exploration with alternative metrics and thresholds as future work. Additionally, the paper encourages the development of watermark-specific defensive strategies and highlights the need for in-processing watermarks to adopt augmentation or adversarial training for enhanced robustness.

Abstract

This paper investigates the weaknesses of image watermarking techniques. We present WAVES (Watermark Analysis Via Enhanced Stress-testing), a novel benchmark for assessing watermark robustness, overcoming the limitations of current evaluation methods. WAVES integrates detection and identification tasks, and establishes a standardized evaluation protocol comprised of a diverse range of stress tests. The attacks in WAVES range from traditional image distortions to advanced and novel variations of diffusive and adversarial attacks. Our evaluation examines two pivotal dimensions: the degree of image quality degradation and the efficacy of watermark detection after attacks. We develop a series of Performance vs. Quality 2D plots, varying over several prominent image similarity metrics, which are then aggregated in a heuristically novel manner to paint an overall picture of watermark robustness and attack potency. Our comprehensive evaluation reveals previously undetected vulnerabilities of several modern watermarking algorithms. We envision WAVES as a toolkit for the future development of robust watermarking systems. The project is available at https://wavesbench.github.io/