Introduction

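DiffuseVAE combines a VAE and a DDPM into a single generative model.

Let us briefly go over the strengths and weaknesses of VAEs and DDPMs.

VAEs are flexible and are often used for downstream tasks. However, they fail to capture high-frequency information, so their samples are usually blurry.
DDPMs achieve strong results, but their latent space is hard to control and they require a lot of computation.

The idea is that combining these strengths and weaknesses in the right way gives good results. DiffuseVAE offers the following advantages.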

**A novel conditioning framework** (the VAE generates blurry samples, which can then be used in the conditional DDPM formulation)
Controllable synthesis from a low-dimensional latent
Better speed vs quality tradeoff
State of the art comparisons
Generalization to different noises in the conditioning signal
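The conditioning idea in the first item can be illustrated with a minimal two-stage sampling sketch: draw a low-dimensional latent, decode it into a coarse sample, then run standard DDPM ancestral sampling with that coarse sample passed to the noise predictor at every step. Everything named here (`toy_vae_decode`, `toy_eps_model`, `two_stage_sample`, the linear schedule, the dimensions) is a hypothetical stand-in rather than the authors' implementation, and how the conditioning signal actually enters the network is left abstract.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed, commonly used choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def toy_vae_decode(z, weights):
    """Hypothetical stand-in for a trained VAE decoder (z -> coarse sample)."""
    return [math.tanh(sum(zi * w for zi, w in zip(z, row))) for row in weights]

def toy_eps_model(x_t, t, x_hat):
    """Hypothetical stand-in for a trained noise predictor conditioned on x_hat."""
    return [xi - ci for xi, ci in zip(x_t, x_hat)]

def two_stage_sample(seed=0, latent_dim=4, data_dim=8, num_steps=50):
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    # Stage 1: draw z from a standard normal prior and decode a coarse sample.
    weights = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(data_dim)]
    z = [rng.gauss(0, 1) for _ in range(latent_dim)]
    x_hat = toy_vae_decode(z, weights)

    # Stage 2: DDPM ancestral sampling, conditioned on x_hat at every step.
    x_t = [rng.gauss(0, 1) for _ in range(data_dim)]
    for t in reversed(range(num_steps)):
        eps = toy_eps_model(x_t, t, x_hat)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x_t = [
            (xi - coef * ei) / math.sqrt(1.0 - betas[t]) + sigma * rng.gauss(0, 1)
            for xi, ei in zip(x_t, eps)
        ]
    return z, x_hat, x_t
```

Calling `two_stage_sample(seed=0)` returns the latent, the coarse decoder output, and the refined sample. Because the latent has only `latent_dim` entries, editing `z` before decoding is the natural place to steer the output in a setup like this.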

DiffuseVAE: VAEs meet Diffusion Models

1. Training Objective

Fig. 2 above carries a lot of information. The high-resolution image $x_0$ and the auxiliary conditioning signal $y$ are used by the VAE, and $z$ is the latent representation. The model can be written as follows.

[Equation image not preserved: the model written in terms of $z$, $\theta$, and $\phi$]
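Since the equation itself is not shown, one reading consistent with the symbols in these notes is the factorization sketched below. It assumes a prior $p(z)$, a VAE decoder $p_\theta(y \mid z)$ that models the conditioning signal, and a conditional reverse process $p_\phi(x_{0:T} \mid y, z)$. The exact form of each factor is an assumption, not something stated in the text.

$$
p(x_{0:T}, y, z) = p(z)\, p_\theta(y \mid z)\, p_\phi(x_{0:T} \mid y, z)
$$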

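Here $z$ is the latent variable, and $\theta$ and $\phi$ are the parameters of the VAE decoder and of the reverse process of the conditional diffusion model, respectively. One difficulty is that the posterior $p(x_{1:T}, z|y, x_0)$ is intractable to compute, so it is approximated by $q(x_{1:T}, z|y, x_0)$.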

[Equation image not preserved: the approximate posterior $q(x_{1:T}, z|y, x_0)$]
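The notes mention a recognition network with parameters $\psi$ and a forward process over $x_{1:T}$, which suggests that the approximate posterior splits into those two parts. A sketch of that factorization, assuming the recognition network is written $q_\psi(z \mid y, x_0)$:

$$
q(x_{1:T}, z \mid y, x_0) = q_\psi(z \mid y, x_0)\, q(x_{1:T} \mid y, z, x_0)
$$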

$\psi$ denotes the parameters of the VAE recognition network. The DDPM forward process $q(x_{1:T} | y, z, x_0)$ is not trained and stays fixed throughout training. The log-likelihood can then be obtained as follows.

[Equation image not preserved: the log-likelihood]
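As a worked sketch of how such a bound can arise, assume the joint factorizes as $p(z)\, p_\theta(y \mid z)\, p_\phi(x_{0:T} \mid y, z)$ and the approximate posterior as $q_\psi(z \mid y, x_0)\, q(x_{1:T} \mid y, z, x_0)$. Both factorizations are assumptions about the missing equations. Applying Jensen's inequality and separating the terms that do not depend on $x_{1:T}$ gives:

$$
\begin{aligned}
\log p(x_0, y) &= \log \int p(x_{0:T}, y, z)\, dx_{1:T}\, dz \\
&\geq \mathbb{E}_{q(x_{1:T}, z \mid y, x_0)}\left[\log \frac{p(z)\, p_\theta(y \mid z)\, p_\phi(x_{0:T} \mid y, z)}{q_\psi(z \mid y, x_0)\, q(x_{1:T} \mid y, z, x_0)}\right] \\
&= \mathbb{E}_{q_\psi(z \mid y, x_0)}\left[\log p_\theta(y \mid z)\right] - D_{KL}\left(q_\psi(z \mid y, x_0)\,\|\,p(z)\right) \\
&\quad + \mathbb{E}_{q_\psi(z \mid y, x_0)}\left[\mathbb{E}_{q(x_{1:T} \mid y, z, x_0)}\left[\log \frac{p_\phi(x_{0:T} \mid y, z)}{q(x_{1:T} \mid y, z, x_0)}\right]\right]
\end{aligned}
$$

Under these assumptions, the first two terms form the usual VAE objective on $y$, and the last term is a conditional DDPM bound averaged over the latent $z$.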