makes use of two Markov chains: a forward chain that perturbs data to noise, and a reverse chain that converts noise back to data.
The latter Markov chain reverses the former by learning transition kernels parameterized by deep neural networks.

The forward chain generates a sequence of random variables x1, x2, …, xT
with transition kernel q(xt | xt−1).
One typical design for the transition kernel is Gaussian perturbation, and the most common choice for the transition kernel is
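For reference, and assuming the standard DDPM parameterization, this kernel is usually written as below. The symbol β_t is an assumption here: a variance schedule with values in (0, 1), fixed before training, that these notes do not define elsewhere.

```latex
q(\mathbf{x}_t \mid \mathbf{x}_{t-1})
  = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right)
```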

→ The Gaussian kernel is one possible choice, not the only valid one.
Given x0, a sample xt can be computed directly as follows:
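Assuming the standard DDPM closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, with ε ~ N(0, I), α_t = 1 − β_t and ᾱ_t the product of α_s for s ≤ t, drawing x_t straight from x_0 can be sketched in plain Python. The linear β schedule and the names `linear_betas`, `alpha_bars` and `sample_xt` are illustrative assumptions, not something taken from these notes.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule beta_1..beta_T (a common but not unique choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def sample_xt(x0, t, abar, rng=random):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed, x0 is a list of floats."""
    a = abar[t - 1]
    mean_scale, noise_scale = math.sqrt(a), math.sqrt(1.0 - a)
    return [mean_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas(1000)
abar = alpha_bars(betas)
x0 = [1.0, -2.0, 0.5]
x_mid = sample_xt(x0, 500, abar)   # partly noised version of x0
x_end = sample_xt(x0, 1000, abar)  # close to pure N(0, I) noise, since abar[-1] is tiny
```

Because ᾱ_t shrinks toward 0 as t grows, the contribution of x_0 fades and x_T is approximately standard Gaussian noise, which is what makes N(0, I) a reasonable prior for the reverse chain.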

Intuitively speaking, this forward process slowly injects noise to data until all structures are lost. For generating new data samples, DDPMs start by first generating an unstructured noise vector from the prior distribution (which is typically trivial to obtain), then gradually remove noise therein by running a learnable Markov chain in the reverse time direction.
Specifically, the reverse Markov chain is parameterized by a prior distribution 𝑝(x𝑇) = N(x𝑇; 0, I) and a learnable transition kernel 𝑝𝜃(x𝑡−1 | x𝑡). We choose the prior distribution 𝑝(x𝑇) = N(x𝑇; 0, I) because the forward process is constructed such that 𝑞(x𝑇) ≈ N(x𝑇; 0, I). The learnable transition kernel 𝑝𝜃(x𝑡−1 | x𝑡) takes the form of
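Assuming the usual Gaussian parameterization, the reverse kernel is typically written as below. The symbols μ_θ and Σ_θ are assumptions: the mean and covariance produced by a neural network with parameters θ that takes x_t and the step t as input.

```latex
p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)
  = \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \mu_\theta(\mathbf{x}_t, t),\ \Sigma_\theta(\mathbf{x}_t, t)\right)
```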

→ This can be viewed as the step that fits the reverse process above to the forward process.
The two processes are matched by minimizing the Kullback-Leibler (KL) divergence between them.
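As a worked sketch of this objective, assume the standard factorizations q(x_{0:T}) = q(x_0) ∏ q(x_t | x_{t−1}) and p_θ(x_{0:T}) = p(x_T) ∏ p_θ(x_{t−1} | x_t). The KL divergence between the two chains can then be bounded as follows. The step labels (i) to (iii) and the symbol L_VLB are assumed conventions, and "const" collects terms that do not depend on θ.

```latex
\begin{aligned}
\mathrm{KL}\big(q(\mathbf{x}_{0:T}) \,\|\, p_\theta(\mathbf{x}_{0:T})\big)
&\overset{(i)}{=} -\mathbb{E}_{q(\mathbf{x}_{0:T})}\big[\log p_\theta(\mathbf{x}_{0:T})\big] + \text{const} \\
&\overset{(ii)}{=} \underbrace{\mathbb{E}_{q(\mathbf{x}_{0:T})}\Big[-\log p(\mathbf{x}_T)
   - \sum_{t=1}^{T} \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_t \mid \mathbf{x}_{t-1})}\Big]}_{-L_{\mathrm{VLB}}} + \text{const} \\
&\overset{(iii)}{\geq} \mathbb{E}_{q(\mathbf{x}_0)}\big[-\log p_\theta(\mathbf{x}_0)\big] + \text{const}
\end{aligned}
```

Here (i) follows from the definition of the KL divergence and (ii) from both chains being products of transition kernels, so minimizing the KL divergence amounts to maximizing L_VLB.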

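Step (iii) is Jensen's inequality. Equation 8 is the VLB (variational lower bound), and the model is trained to maximize it.
The VLB is easy to optimize because it is a sum of independent terms, so it can be estimated with Monte Carlo sampling and optimized effectively with stochastic optimization.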
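The Monte Carlo point can be illustrated with a toy sketch: when an objective is a sum of T terms, evaluating one uniformly sampled term and scaling it by T gives an unbiased estimate of the full sum. The function `toy_term` below is a hypothetical stand-in for a single term of the sum, not the actual DDPM loss, and all names are illustrative.

```python
import random


def toy_term(t):
    """Hypothetical stand-in for the t-th term of a sum like the VLB (t = 1..T)."""
    return 1.0 / t


def full_sum(num_terms):
    """Exact value: evaluate every term."""
    return sum(toy_term(t) for t in range(1, num_terms + 1))


def one_sample_estimate(num_terms, rng):
    """Unbiased estimate: pick t uniformly, then rescale the single term by T."""
    t = rng.randint(1, num_terms)  # inclusive on both ends
    return num_terms * toy_term(t)


rng = random.Random(0)
T = 100
exact = full_sum(T)
estimates = [one_sample_estimate(T, rng) for _ in range(20000)]
average = sum(estimates) / len(estimates)  # approaches `exact` as more samples are drawn
```

In DDPM training the same idea is typically applied per minibatch: a random timestep is drawn for each example instead of summing over all T steps.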