Summary

Some concepts learned by the model are undesirable, including copyrighted content and pornography, which we aim to avoid in the model's output [24, 15, 26]. In this paper, we propose an approach for selectively removing a single concept from a text-conditional model's weights after pretraining.

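Problem statement: the large amount of data scraped from the internet includes content we do not want, and the model can learn it. Examples include copyrighted material and sexual images. This paper proposes a method for removing such concepts from the model.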

Method


Our approach involves editing the pre-trained diffusion U-Net model weights θ to remove a specific style or concept.

Specifically, we apply principles of classifier-free guidance to train the diffusion model, steering the model's score away from a specific concept c that we aim to erase, such as the phrase "Van Gogh."

The method erases a specific concept from the model. The authors use classifier-free guidance to fine-tune the model so that it is steered away from concept c.

From the score-based formulation of diffusion models, the objective is to learn the score of the conditional model ∇ log pθ(xt|c) [16]. Using Bayes' rule and ∇ log pθ*(c) = 0, we arrive at:

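$$\nabla \log p_{\theta^*}(x_t \mid c) = \nabla \log p_{\theta^*}(x_t) + \nabla \log p_{\theta^*}(c \mid x_t) \tag{4}$$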
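As a quick sanity check of the Bayes decomposition (Eq. 4), the sketch below uses an assumed 1-D toy that is not from the paper: two concepts with equal priors and unit-variance Gaussian likelihoods. It compares finite-difference gradients of both sides. The constant log-prior term drops out of the gradient, which is exactly the ∇ log pθ*(c) = 0 step.

```python
import math

# Assumed toy (illustration only): concepts c in {0, 1} with equal priors,
# p(x|c) = N(MU[c], 1), so p(x) is a two-component Gaussian mixture.
MU = (-2.0, 2.0)
PRIOR = (0.5, 0.5)

def normal_pdf(x, mu, var=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_p_x_given_c(x, c):
    return math.log(normal_pdf(x, MU[c]))

def log_p_x(x):
    return math.log(sum(PRIOR[k] * normal_pdf(x, MU[k]) for k in range(2)))

def log_p_c_given_x(x, c):
    # Bayes' rule: p(c|x) = p(x|c) p(c) / p(x)
    return log_p_x_given_c(x, c) + math.log(PRIOR[c]) - log_p_x(x)

def grad(f, x, h=1e-5):
    # Central finite difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

def check_eq4(x, c):
    lhs = grad(lambda z: log_p_x_given_c(z, c), x)
    rhs = grad(log_p_x, x) + grad(lambda z: log_p_c_given_x(z, c), x)
    return lhs, rhs

for x in (-1.0, 0.3, 2.5):
    lhs, rhs = check_eq4(x, c=1)
    print(f"x={x:+.1f}  lhs={lhs:.5f}  rhs={rhs:.5f}")
```

Both columns should agree up to floating-point error. For c = 1 they equal the analytic conditional score 2 − x.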

This can be interpreted as the unconditional score plus the gradient of a classifier pθ(c|xt). To control the effect of conditionality, a guidance factor η is introduced for the classifier gradient [42]:

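$$\nabla \log p_{\theta^*}(x_t \mid c) = \nabla \log p_{\theta^*}(x_t) + \eta\, \nabla \log p_{\theta^*}(c \mid x_t) \tag{5}$$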
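A minimal sketch of the η-scaled guidance in Equation 5, on an assumed 1-D Gaussian-mixture toy (hypothetical, not the paper's code). The mixture score is computed in closed form as the posterior-weighted conditional scores. The classifier gradient follows from the Bayes decomposition as the conditional score minus the unconditional score. With η = 1 the guided score equals the conditional score, and larger η pushes harder toward the concept.

```python
import math

# Assumed toy: p(x|c) = N(MU[c], 1) with equal priors for c in {0, 1}.
MU = (-2.0, 2.0)

def posterior(x):
    # p(c|x) via Bayes' rule with equal priors (softmax of Gaussian log-likelihoods)
    logits = [-(x - m) ** 2 / 2 for m in MU]
    mx = max(logits)
    w = [math.exp(l - mx) for l in logits]
    total = sum(w)
    return [v / total for v in w]

def score_cond(x, c):
    # d/dx log N(x; MU[c], 1)
    return -(x - MU[c])

def score_uncond(x):
    # d/dx log of the mixture = posterior-weighted conditional scores
    return sum(p * score_cond(x, k) for k, p in enumerate(posterior(x)))

def classifier_grad(x, c):
    # d/dx log p(c|x) = conditional score - unconditional score
    return score_cond(x, c) - score_uncond(x)

def guided_score(x, c, eta):
    # Equation 5: unconditional score plus eta-scaled classifier gradient
    return score_uncond(x) + eta * classifier_grad(x, c)

for eta in (0.0, 1.0, 3.0):
    print(eta, guided_score(0.0, 1, eta))
```

At x = 0 the two components balance, so the unconditional score is 0 and the guided score grows linearly with η in the direction of concept 1.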

We wish to negate concept c by inverting the behavior of θ*, and therefore we use the negative version of guidance to train θ. Additionally, taking inspiration from classifier-free guidance [17], we transform the RHS of Equation 5 from a classifier term into a conditional diffusion term.

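$$\nabla \log p_{\theta}(x_t \mid c) = \nabla \log p_{\theta^*}(x_t) - \eta\big(\nabla \log p_{\theta^*}(x_t \mid c) - \nabla \log p_{\theta^*}(x_t)\big) \tag{6}$$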
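The sign flip of the negative, classifier-free guidance can be illustrated with an assumed toy frozen model. Its unconditional density is N(0, 1) and its concept-conditioned density is N(2, 1). The erased score then works out to −x − 2η, so following it drives samples to −2η, away from the concept mean at 2. All functions below are hypothetical stand-ins for θ*, not the paper's implementation.

```python
# Assumed toy frozen model theta*: p(x) = N(0, 1), p(x|c) = N(MU_C, 1).
MU_C = 2.0

def frozen_score_uncond(x):
    return -x

def frozen_score_cond(x):
    return -(x - MU_C)

def erased_score(x, eta):
    # Negative guidance built only from the frozen model's two scores:
    # uncond - eta * (cond - uncond)
    s_u = frozen_score_uncond(x)
    s_c = frozen_score_cond(x)
    return s_u - eta * (s_c - s_u)

def score_ascent(x, eta, step=0.1, n_steps=200):
    # Deterministic gradient ascent on the erased log-density; converges to its mode
    for _ in range(n_steps):
        x = x + step * erased_score(x, eta)
    return x

print(score_ascent(MU_C, eta=1.0))  # ≈ -2.0: starting at the concept mean, it moves away
```

With η = 0 the rule reduces to the unconditional score, and increasing η moves the mode further from the erased concept.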

Based on Tweedie's formula [12] and the reparametrization trick proposed in [16], the gradient of the log probability score can be expressed as a function of the noise scaled by time-varying parameters. This modified score function moves the data distribution to maximize the log probability score.

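$$\epsilon_{\theta}(x_t, c, t) \leftarrow \epsilon_{\theta^*}(x_t, t) - \eta\big[\epsilon_{\theta^*}(x_t, c, t) - \epsilon_{\theta^*}(x_t, t)\big] \tag{7}$$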
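To see why the score-space rule and its noise-space form (Equation 7) agree, assume a DDPM-style variance-preserving schedule in which the score equals −ε / √(1 − ᾱ_t). The `alpha_bar_t` parameter below is that assumed quantity, and the prediction values are made up. The negative-guidance rule is linear and every term shares the same time-varying scale. So applying it to scores and converting back gives the same target as applying it directly to noise predictions.

```python
import math

# Assumed variance-preserving schedule: score(x_t) = -eps(x_t) / sqrt(1 - alpha_bar_t)
def eps_to_score(eps, alpha_bar_t):
    return -eps / math.sqrt(1.0 - alpha_bar_t)

def score_to_eps(score, alpha_bar_t):
    return -math.sqrt(1.0 - alpha_bar_t) * score

def negative_guidance(uncond, cond, eta):
    # Same linear rule for scores and for noise predictions
    return uncond - eta * (cond - uncond)

# Hypothetical frozen-model noise predictions at one x_t and timestep t
eps_u, eps_c, alpha_bar_t, eta = 0.3, 0.9, 0.64, 1.0

# Route 1: apply the rule directly to noise predictions (Equation 7)
target_eps = negative_guidance(eps_u, eps_c, eta)

# Route 2: apply the rule to scores, then convert the result back to noise
target_score = negative_guidance(eps_to_score(eps_u, alpha_bar_t),
                                 eps_to_score(eps_c, alpha_bar_t), eta)
print(target_eps, score_to_eps(target_score, alpha_bar_t))
```

The two printed values match up to floating-point rounding. This is why the fine-tuning target can be written purely in terms of the U-Net's noise predictions.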

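The score of the model pθ can be decomposed with Bayes' rule (Eq. 4), and a guidance factor η is introduced to control the strength of the conditioning (Eq. 5). To erase concept c, the authors invert the behavior of θ* and train θ with the negative version of the guidance. In addition, the RHS of Eq. 5 is converted from a classifier term into a conditional diffusion term, as in classifier-free guidance (Eq. 6). Based on Tweedie's formula and the reparametrization trick from [16], the gradient of the log probability can be expressed as the noise scaled by time-varying parameters (Eq. 7). This modified score function moves the data distribution to maximize the log probability score.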

The objective function in Equation 7 fine-tunes the parameters θ such that εθ(xt, c, t) mimics the negatively guided noise. That way, after fine-tuning, the edited model's conditional prediction is guided away from the erased concept.

The objective function in Equation 7 fine-tunes the parameters θ so that εθ(xt, c, t) imitates the negatively guided noise. As a result, the edited model's conditional prediction is guided away from the concept to be erased.
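A minimal sketch of the fine-tuning objective is shown below. It assumes a mean-squared error between the trainable model's conditional noise prediction and the Equation 7 target computed from the frozen model. Plain Python lists stand in for noise tensors, and the function names are hypothetical. The target is treated as a constant, so no gradient would flow into θ*.

```python
def esd_target(eps_uncond, eps_cond, eta):
    """Negatively guided noise (Equation 7), computed from frozen-model predictions."""
    return [u - eta * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def esd_loss(eps_theta_cond, eps_uncond, eps_cond, eta):
    """Assumed MSE between the trainable model's conditional prediction and the target."""
    target = esd_target(eps_uncond, eps_cond, eta)
    return sum((p - t) ** 2 for p, t in zip(eps_theta_cond, target)) / len(target)

# Hypothetical predictions for a 3-element "noise tensor"
eps_u = [0.1, -0.2, 0.4]   # frozen model, unconditioned
eps_c = [0.5, 0.0, 0.4]    # frozen model, conditioned on c
eps_theta = eps_c[:]       # before fine-tuning, theta is a copy of theta*
print(esd_target(eps_u, eps_c, eta=1.0))
print(esd_loss(eps_theta, eps_u, eps_c, eta=1.0))
```

The loss starts positive because the copied model still follows the concept. It reaches zero only when the conditional prediction equals the negatively guided noise. Where the two frozen predictions agree (the third element), the target equals the original prediction.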

Training uses several instances of the diffusion model, with one set of parameters frozen (θ*) while the other set of parameters (θ) is trained to erase the concept. We sample partially denoised images xt conditioned on c using θ, then perform inference on the frozen model θ* twice to predict the noise, once conditioned on c and once unconditioned.

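While the parameters θ are trained to erase the concept, the parameters θ* are kept frozen. The authors sample xt conditioned on c using θ. They then run the frozen model twice to predict the noise, once conditioned on c and once unconditioned. The two predictions are combined into a noise prediction negated with respect to the concept, and the model is steered in that direction. Fig. 2 makes this very easy to understand.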
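The training procedure can be sketched end to end with hypothetical scalar linear noise predictors in place of the U-Net. Two simplifications are assumed. First, xt is drawn from a standard normal instead of being partially denoised by θ. Second, only θ's conditional branch is trained, with plain SGD. θ starts as a copy of θ*. Each step queries the frozen model twice and regresses the conditional prediction onto the negatively guided noise.

```python
import random

# Hypothetical frozen model theta*: toy linear noise predictors.
def frozen_eps_uncond(x):
    return 0.5 * x

def frozen_eps_cond(x):
    return 0.5 * x + 1.0  # conditioning on concept c shifts the predicted noise

class ToyModel:
    """Trainable copy theta (hypothetical): eps_theta(x, c) = a * x + b."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def eps_cond(self, x):
        return self.a * x + self.b

def train_esd(model, eta=1.0, lr=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        # Simplification: draw x_t directly instead of partially denoising with theta.
        x_t = rng.gauss(0.0, 1.0)
        # Two passes of the frozen model: conditioned on c and unconditioned.
        e_u, e_c = frozen_eps_uncond(x_t), frozen_eps_cond(x_t)
        # Negatively guided noise (Equation 7); held constant, no gradient into theta*.
        target = e_u - eta * (e_c - e_u)
        err = model.eps_cond(x_t) - target
        # SGD step on 0.5 * err**2 with respect to (a, b)
        model.a -= lr * err * x_t
        model.b -= lr * err
    return model

# Initialize theta as a copy of theta*'s conditional branch, then fine-tune.
model = train_esd(ToyModel(a=0.5, b=1.0), eta=1.0)
print(round(model.a, 3), round(model.b, 3))
```

With η = 1 the target is 0.5·x − 1, so the fitted parameters should move from (0.5, 1.0) to about (0.5, −1.0). The conditional prediction for c then mimics the erased behavior instead of the original concept.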