Extracting Training Data from Diffusion Models

In this work, we demonstrate that state-of-the-art diffusion models do memorize and regenerate individual training examples. To begin, we propose and implement new definitions for “memorization” in image models. We then devise a two-stage data extraction attack that generates images using standard approaches, and flags those that exceed certain membership inference scoring criteria.
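The two-stage attack above can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: `generate` stands in for sampling a diffusion model (here mocked so that one "memorized" output recurs across samples), and the flagging stage uses a simple near-duplicate-clique heuristic as an illustrative membership-style score. All names, thresholds, and distances are illustrative assumptions.

```python
import random

def generate(prompt, n, seed=0):
    """Stage 1 (mocked): sample n 'images' from a diffusion model.
    A memorized training example reappears nearly verbatim across many
    samples; we simulate that by repeating one fixed output often."""
    rng = random.Random(seed)
    memorized = (10, 20, 30)  # stand-in for a regurgitated training image
    return [memorized if rng.random() < 0.6
            else tuple(rng.randrange(256) for _ in range(3))
            for _ in range(n)]

def l2_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def flag_memorized(images, radius=5.0, min_clique=10):
    """Stage 2: flag generations belonging to a large near-duplicate
    'clique'. Many mutually similar samples suggest the model is
    regurgitating a single training example rather than generalizing."""
    flagged = []
    for img in images:
        clique = sum(1 for other in images
                     if l2_distance(img, other) <= radius)
        if clique >= min_clique:
            flagged.append(img)
    return flagged

samples = generate("some caption", n=50)
extracted = flag_memorized(samples)
```

Only the repeated "memorized" output forms a clique large enough to be flagged; the independently random samples each sit alone and pass the filter unflagged.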

Unfortunately, we also find that existing privacy-enhancing techniques do not provide an acceptable privacy-utility tradeoff.

1. Background

Training data privacy attacks.

Membership inference attacks [62, 80, 8] answer the question “was this example in the training set?” and constitute a comparatively mild privacy breach. Neural networks are also vulnerable to more powerful attacks, such as inversion attacks [27, 81] that extract representative examples from a target class, attribute inference attacks [28] that reconstruct subsets of the attributes of training examples, and extraction attacks [10, 11, 5] that completely recover individual training examples. In this paper, we focus on each of these three attacks when applied to diffusion models.
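A minimal sketch of the classic loss-threshold membership inference test may clarify the taxonomy above: because models fit their training members better, a low per-example loss (for a diffusion model, e.g. the denoising loss at a fixed timestep) is evidence of membership. The function name, threshold, and loss values below are illustrative assumptions, not taken from the cited attacks.

```python
def predict_member(loss, threshold=0.5):
    """Loss-threshold membership inference: predict 'member' when the
    model's loss on the candidate example is below a fixed threshold."""
    return loss < threshold

# Hypothetical per-example losses: members are fit well (low loss),
# held-out non-members are not.
train_losses = [0.12, 0.08, 0.31]
test_losses = [0.74, 0.66, 0.58]

preds_train = [predict_member(l) for l in train_losses]
preds_test = [predict_member(l) for l in test_losses]
```

The threshold would in practice be calibrated on held-out data; a perfect separation as in this toy is the best case, not the typical one.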

2. Motivation and Threat Model

Understanding privacy risks.

Diffusion models that regenerate data scraped from the Internet can pose privacy and copyright risks similar to those posed by language models [11, 7, 31]. Similarly, copying images from professional artists has been called “digital forgery” [65] and has spurred debate in the art community.

Understanding generalization.

Beyond data privacy, understanding how and why diffusion models memorize training data may help us understand their generalization capabilities.

It may thus be necessary to broaden our definitions of overfitting to include memorization and related privacy metrics. Our results also suggest that Feldman’s theory that memorization is necessary for generalization in classifiers [24] may extend to generative models, raising the question of whether the improved performance of diffusion models compared to prior approaches is precisely because diffusion models memorize more.

Threat Model

Our threat model considers an adversary A that interacts with a diffusion model Gen (backed by a neural network fθ ) to extract images from the model’s training set D.
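The black-box interaction in this threat model can be sketched as a minimal query interface, under the assumption that the adversary A sees only the generation API Gen and never the weights of fθ. The class and function names here are illustrative, not part of the paper.

```python
from typing import Callable, List, Tuple

class DiffusionAPI:
    """Black-box wrapper around Gen: prompts in, images out.
    The underlying network (fθ) is private to the system."""

    def __init__(self, model: Callable[[str], Tuple[int, ...]]):
        self._model = model  # hidden from the adversary

    def gen(self, prompt: str) -> Tuple[int, ...]:
        return self._model(prompt)

def adversary(api: DiffusionAPI, prompts: List[str]) -> List[Tuple[int, ...]]:
    """A interacts only through the public generation endpoint,
    collecting outputs to later test against extraction criteria."""
    return [api.gen(p) for p in prompts]

# Stand-in 'model' that deterministically maps a prompt to an 'image'.
api = DiffusionAPI(lambda p: (len(p), 0, 0))
outputs = adversary(api, ["cat", "house"])
```

This separation (query access only, no gradients or weights) is what makes the extraction setting realistic for deployed image-generation systems.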

Image-generation systems.

Adversary capabilities