Model Inversion Attack via Dynamic Memory Learning

Authors: Gege Qi, YueFeng Chen, Xiaofeng Mao, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue

What

This paper introduces DMMIA, a model inversion attack that uses dynamic memory to recover private training data from trained deep neural networks. The memory counters a catastrophic-forgetting issue in existing GAN-based attacks: during the iterative optimization, previously discovered target-related features are overwritten, reducing the diversity of the recovered samples.

Why

This paper is important because it exposes a concrete privacy vulnerability of trained DNN models: sensitive information about the training data can be extracted from the model alone, without any direct access to that data.

How

The authors propose DMMIA, which uses two types of memory prototypes: Intra-class Multicentric Representation (IMR) for capturing diverse target-related concepts and Inter-class Discriminative Representation (IDR) for distinguishing between classes. These prototypes are progressively updated during training, enabling the attack to retain previously learned features and enhance the diversity and realism of generated samples.
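To make the two memory types concrete, here is a minimal PyTorch sketch of what such prototypes could look like. All names (`IntraClassMultiPrototypes`, `InterClassMemory`), the temperature-scaled losses, and the momentum update are illustrative assumptions based on the paper's description, not its exact objectives.

```python
# Minimal sketch (not the authors' code): memory prototypes for a
# model-inversion attack. Shapes and losses are illustrative assumptions.
import torch
import torch.nn.functional as F


class IntraClassMultiPrototypes(torch.nn.Module):
    """IMR sketch: K learnable prototypes per class, so one class can be
    represented by several target-related concepts instead of one center."""

    def __init__(self, num_classes: int, k: int, dim: int):
        super().__init__()
        self.prototypes = torch.nn.Parameter(torch.randn(num_classes, k, dim))

    def loss(self, feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
        # feats: (B, dim) features of generated samples; labels: (B,) target classes.
        protos = F.normalize(self.prototypes[labels], dim=-1)                     # (B, K, dim)
        sims = (F.normalize(feats, dim=-1).unsqueeze(1) * protos).sum(-1) / tau   # (B, K)
        # Softly pull each sample toward its most similar intra-class prototypes.
        return -torch.logsumexp(sims, dim=1).mean()


class InterClassMemory:
    """IDR sketch: a non-parametric, progressively updated bank of per-class
    sample features, so earlier discoveries are retained across iterations."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.bank = F.normalize(torch.randn(num_classes, dim), dim=-1)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Momentum-refresh the memorized feature of each sample's class.
        self.bank = self.bank.to(feats.device)
        for f, y in zip(F.normalize(feats, dim=-1), labels):
            self.bank[y] = F.normalize(self.momentum * self.bank[y] + (1 - self.momentum) * f, dim=0)

    def loss(self, feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
        # Contrast each sample's feature against all class memories so the
        # target class stays discriminable from the other classes.
        logits = F.normalize(feats, dim=-1) @ self.bank.to(feats.device).T / tau  # (B, C)
        return F.cross_entropy(logits, labels)
```

The design point in both cases is persistence: the prototypes outlive any single optimization step, which is what counteracts forgetting.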

Result

DMMIA achieves state-of-the-art attack performance on multiple benchmarks, including CelebA, FaceScrub, and Stanford Dogs, outperforming existing methods in attack success rate, sample realism (FID), and sample diversity. Notably, its gains are largest when the image prior used to pre-train the generator is limited, i.e., when the attacker knows less about the target data distribution.

Limitations & Future Work

The authors acknowledge that attack success depends on the diversity of the image prior used to pre-train the StyleGAN2 generator. Future work could improve the attack's effectiveness when prior knowledge about the target data is limited. Extending DMMIA to black-box settings, where the attacker has access only to the model's predictions, is also mentioned as a potential research direction.

Abstract

Model Inversion (MI) attacks aim to recover private training data from a target model, which has raised security concerns about the deployment of DNNs in practice. Recent advances in generative adversarial models have made them particularly effective in MI attacks, primarily due to their ability to generate high-fidelity, perceptually realistic images that closely resemble the target data. In this work, we propose a novel Dynamic Memory Model Inversion Attack (DMMIA) that leverages historically learned knowledge, which interacts with samples during training to induce diverse generations. DMMIA constructs two types of prototypes to inject historically learned knowledge: an Intra-class Multicentric Representation (IMR), which represents target-related concepts with multiple learnable prototypes, and an Inter-class Discriminative Representation (IDR), which characterizes memorized samples as learned prototypes to capture more privacy-related information. As a result, DMMIA has a more informative representation, which yields more diverse and discriminative generated results. Experiments on multiple benchmarks show that DMMIA outperforms state-of-the-art MI attack methods.
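To connect the pieces, below is a hypothetical single attack step wiring the two memory losses from the How section into a standard GAN-based MI loop. The frozen generator `G`, the feature extractor, the cross-entropy identity loss, and the loss weights `lambda_imr`/`lambda_idr` are assumptions reflecting common GAN-based MI pipelines, not DMMIA's published training recipe.

```python
# Hypothetical attack step (assumed wiring, not the paper's published recipe):
# optimize latent codes w of a frozen, pre-trained generator G so the target
# model classifies the generations as the target identity, regularized by the
# IMR/IDR memory losses sketched in the How section.
def attack_step(G, target_model, feat_extractor, imr, idr,
                w, labels, opt, lambda_imr=0.1, lambda_idr=0.1):
    opt.zero_grad()                              # opt covers w and imr.prototypes
    imgs = G(w)                                  # frozen generator, learnable latents
    identity = F.cross_entropy(target_model(imgs), labels)  # match the target class
    feats = feat_extractor(imgs)                 # features seen by the memories
    loss = (identity
            + lambda_imr * imr.loss(feats, labels)
            + lambda_idr * idr.loss(feats, labels))
    loss.backward()
    opt.step()
    idr.update(feats.detach(), labels)           # progressively refresh the bank
    return loss.item()
```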