With sufficiently small time steps, the forward process is a diffusion (Song et al., 2020b) and the spatial-temporal evolution of the data density is thus governed by the classic Fokker-Planck partial differential equation (PDE) (Øksendal, 2003). In principle, this implies that with knowledge of the density for a single noise level, we could recover all the densities by solving the Fokker-Planck equation (FPE) without any additional learning.

A forward process with sufficiently small time steps is a diffusion, and the spatio-temporal evolution of the data density is governed by the Fokker-Planck partial differential equation (PDE). This implies that, given knowledge of the density at a single noise level, all the densities can be recovered via the FPE without any additional learning.
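For reference, the Fokker-Planck equation mentioned above can be written out explicitly in terms of the drift $\mathbf{f}(\mathbf{x}, t)$ and diffusion coefficient $g(t)$ of the forward SDE (this is the standard textbook form, stated here for concreteness rather than quoted from the excerpt):

$$
\frac{\partial q_t(\mathbf{x})}{\partial t} = -\nabla_{\mathbf{x}} \cdot \big( \mathbf{f}(\mathbf{x}, t)\, q_t(\mathbf{x}) \big) + \frac{1}{2} g(t)^2 \Delta_{\mathbf{x}} q_t(\mathbf{x})
$$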

Our contributions

Building on the above notions, we derive an associated system of PDEs that characterizes the evolution of the scores (i.e., gradients) of the perturbed data densities; we term it the score Fokker-Planck equation (score FPE). In theory, the ground truth scores of the perturbed data densities must satisfy the score FPE. Hence, we mathematically study the implications of satisfying the score FPE.

The authors derive a system of PDEs that characterizes the evolution of the scores, which they call the score FPE. In theory, the ground truth scores of the perturbed data densities must satisfy the score FPE. Hence, the authors mathematically study the implications of satisfying the score FPE.
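As a sketch of where such a score PDE comes from (a derivation from the standard Fokker-Planck equation, not quoted from the paper): dividing the FPE by $q_t$ gives an evolution equation for $\log q_t$, and taking the spatial gradient then yields a PDE satisfied by the score $\nabla_{\mathbf{x}} \log q_t$:

$$
\frac{\partial}{\partial t} \nabla_{\mathbf{x}} \log q_t(\mathbf{x}) = \nabla_{\mathbf{x}} \left[ \frac{1}{2} g(t)^2 \left( \Delta_{\mathbf{x}} \log q_t(\mathbf{x}) + \left\| \nabla_{\mathbf{x}} \log q_t(\mathbf{x}) \right\|_2^2 \right) - \nabla_{\mathbf{x}} \cdot \mathbf{f}(\mathbf{x}, t) - \mathbf{f}(\mathbf{x}, t) \cdot \nabla_{\mathbf{x}} \log q_t(\mathbf{x}) \right]
$$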

1. Background

The process is driven by the following forward SDE:

$$
d\mathbf{x}_t = \mathbf{f}(\mathbf{x}_t, t)\, dt + g(t)\, d\mathbf{w}_t, \qquad t \in [0, T] \tag{1}
$$

where $\mathbf{f}(\cdot, t): \mathbb{R}^D \to \mathbb{R}^D$, $g(\cdot): \mathbb{R} \to \mathbb{R}$, and $\mathbf{w}_t$ is a standard Wiener process. Under moderate conditions (Anderson, 1982), a reverse-time SDE from $T$ to $0$ can be obtained as follows:
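A forward SDE of this form can be simulated directly with Euler-Maruyama. The sketch below assumes a concrete VP-style instance, $\mathbf{f}(\mathbf{x}, t) = -\tfrac{1}{2}\beta \mathbf{x}$ and $g(t) = \sqrt{\beta}$ with constant $\beta$ (an illustrative choice, not prescribed by the excerpt), for which the perturbation kernel has a closed-form Gaussian mean and variance that the simulation should reproduce:

```python
import numpy as np

# Euler-Maruyama simulation of the forward SDE dx = f(x, t) dt + g(t) dw.
# Assumed toy instance: VP-style SDE with constant beta,
#   f(x, t) = -0.5 * beta * x,   g(t) = sqrt(beta),
# whose perturbation kernel q_{0T}(x | x0) is Gaussian with known moments.

rng = np.random.default_rng(0)
beta = 2.0
T = 1.0
n_steps = 1000
dt = T / n_steps
n_paths = 100_000

x = np.full(n_paths, 1.5)  # all paths start at x0 = 1.5
for _ in range(n_steps):
    drift = -0.5 * beta * x
    diffusion = np.sqrt(beta)
    x = x + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal(n_paths)

# Closed-form moments of the VP perturbation kernel for constant beta:
#   mean = x0 * exp(-beta * T / 2),  var = 1 - exp(-beta * T)
mean_true = 1.5 * np.exp(-0.5 * beta * T)
var_true = 1.0 - np.exp(-beta * T)
print(x.mean(), mean_true)  # empirical vs analytic mean
print(x.var(), var_true)    # empirical vs analytic variance
```

With enough paths, the empirical mean and variance match the analytic perturbation-kernel moments to within Monte Carlo error.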

$$
d\mathbf{x}_t = \left[ \mathbf{f}(\mathbf{x}_t, t) - g(t)^2 \nabla_{\mathbf{x}} \log q_t(\mathbf{x}_t) \right] dt + g(t)\, d\bar{\mathbf{w}}_t \tag{2}
$$

where $\bar{\mathbf{w}}_t$ is a standard Wiener process when time flows backward from $T$ to $0$.

We can train a time-conditional neural network $\mathbf{s}_\theta = \mathbf{s}_\theta(\mathbf{x}, t)$ to approximate $\nabla_{\mathbf{x}} \log q_t(\mathbf{x})$ by minimizing a score matching objective (Hyvärinen & Dayan, 2005):

$$
J_{\mathrm{SM}}(\theta; \lambda(\cdot)) := \frac{1}{2} \int_0^T \mathbb{E}_{q_t(\mathbf{x})} \left[ \lambda(t) \left\| \mathbf{s}_\theta(\mathbf{x}, t) - \nabla_{\mathbf{x}} \log q_t(\mathbf{x}) \right\|_2^2 \right] dt
$$

As $q_t(\mathbf{x})$ is generally inaccessible, the denoising score matching (DSM) loss (Vincent, 2011; Song et al., 2020b) $J_{\mathrm{DSM}}(\theta; \lambda(\cdot))$ is exploited in practice instead:

$$
J_{\mathrm{DSM}}(\theta; \lambda(\cdot)) := \frac{1}{2} \int_0^T \mathbb{E}_{q_0(\mathbf{x})} \mathbb{E}_{q_{0t}(\mathbf{x}' \mid \mathbf{x})} \left[ \lambda(t) \left\| \mathbf{s}_\theta(\mathbf{x}', t) - \nabla_{\mathbf{x}'} \log q_{0t}(\mathbf{x}' \mid \mathbf{x}) \right\|_2^2 \right] dt \tag{3}
$$
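The key property of DSM is that its minimizer matches the score of the *marginal* perturbed density, even though the loss only uses the tractable conditional score. A minimal numpy sketch at a single fixed noise level, with 1-D Gaussian data and a one-parameter linear score model (an assumed toy setup, not the paper's network), makes this concrete:

```python
import numpy as np

# Sketch of denoising score matching (DSM) at one fixed noise level,
# with 1-D data and a linear score model s_theta(x') = theta * x'.
# Assumed setup for illustration: x0 ~ N(0, 1), perturbation kernel
# q_{0t}(x' | x0) = N(x'; alpha * x0, sigma^2), so the DSM target is
#   grad_{x'} log q_{0t}(x' | x0) = -(x' - alpha * x0) / sigma^2.

rng = np.random.default_rng(0)
alpha, sigma = 0.8, 0.6
n = 200_000

x0 = rng.standard_normal(n)                       # clean samples
xp = alpha * x0 + sigma * rng.standard_normal(n)  # perturbed samples
target = -(xp - alpha * x0) / sigma**2            # conditional score target

# For a linear model the DSM loss is a least-squares problem in theta,
# with closed-form minimizer E[x' * target] / E[x'^2].
theta = np.mean(xp * target) / np.mean(xp**2)

# The marginal of x' is N(0, alpha^2 + sigma^2), whose score is
# -x' / (alpha^2 + sigma^2); DSM should recover exactly that slope.
theta_true = -1.0 / (alpha**2 + sigma**2)
print(theta, theta_true)
```

Even though `target` is built only from the conditional kernel, the fitted slope converges to the marginal score's slope, which is what Eq. (3) exploits.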

After $\mathbf{s}_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}} \log q_t(\mathbf{x})$ is learned, we replace $\nabla_{\mathbf{x}} \log q_t(\mathbf{x})$ in Eq. (2) with $\mathbf{s}_\theta$ and obtain a parametrized reverse-time SDE for a stochastic process $\hat{\mathbf{x}}_\theta(t)$:

$$
d\hat{\mathbf{x}}_\theta(t) = \left[ \mathbf{f}(\hat{\mathbf{x}}_\theta(t), t) - g(t)^2 \mathbf{s}_\theta(\hat{\mathbf{x}}_\theta(t), t) \right] dt + g(t)\, d\bar{\mathbf{w}}_t \tag{4}
$$

Let $p^{\mathrm{SDE}}_{t,\theta}$ denote the marginal distribution of $\hat{\mathbf{x}}_\theta(t)$ with an initial distribution defined as the prior $\pi$, where we suppress the dependence on $\pi$ for compactness. We can design $\mathbf{f}$ and $g$ in Eq. (1) so that $q_T(\mathbf{x})$ approximates a simple prior $\pi$; then, we can generate samples $\hat{\mathbf{x}}_\theta(0) \sim p^{\mathrm{SDE}}_{0,\theta}$ by numerically solving Eq. (4) backward with an initial sample from the prior $\hat{\mathbf{x}}_\theta(T) \sim \pi$.
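This backward solve can be sketched with Euler-Maruyama. The toy setting below is assumed for illustration: a VP-style SDE with constant $\beta$ and unit-variance Gaussian data, for which $q_t = \mathcal{N}(0, 1)$ at every $t$, so the exact score $-x$ can stand in for a learned $\mathbf{s}_\theta$:

```python
import numpy as np

# Sampling by solving the reverse-time SDE backward with Euler-Maruyama.
# Assumed toy setting: VP-style SDE f(x, t) = -0.5 * beta * x, g = sqrt(beta),
# with data ~ N(0, 1). This data distribution is preserved by the forward
# process (q_t = N(0, 1) for all t), so the exact score is s(x, t) = -x,
# used here in place of a learned score model s_theta.

rng = np.random.default_rng(0)
beta, T, n_steps, n_paths = 2.0, 1.0, 1000, 100_000
dt = T / n_steps

x = rng.standard_normal(n_paths)  # x_hat(T) ~ prior pi = N(0, 1)
for _ in range(n_steps):
    score = -x                              # exact score of q_t = N(0, 1)
    drift = -0.5 * beta * x - beta * score  # f - g^2 * score
    # step backward in time: x_{t - dt} = x_t - drift * dt + g * sqrt(dt) * z
    x = x - drift * dt + np.sqrt(beta) * np.sqrt(dt) * rng.standard_normal(n_paths)

# Samples x_hat(0) should again follow the data distribution N(0, 1).
print(x.mean(), x.var())
```

The final samples have mean near 0 and variance near 1, i.e. the backward solve maps the prior back to the (known) data distribution.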

Song et al. (2020b) introduced a deterministic process (with a zero diffusion term) describing the evolution of samples whose trajectories share the same marginal probability densities as the forward SDE (Eq. (1)). Specifically, the process evolves through time according to the following probability flow ODE:

$$
\frac{d\mathbf{x}_t}{dt} = \mathbf{f}(\mathbf{x}_t, t) - \frac{1}{2} g(t)^2 \nabla_{\mathbf{x}} \log q_t(\mathbf{x}_t) \tag{5}
$$

As in the SDE case, the ground truth score in Eq. (5) is approximated with the learned score model $\mathbf{s}_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}} \log q_t(\mathbf{x})$. This yields the following parametrized probability flow ODE:

$$
\frac{d\hat{\mathbf{x}}_\theta(t)}{dt} = \mathbf{f}(\hat{\mathbf{x}}_\theta(t), t) - \frac{1}{2} g(t)^2 \mathbf{s}_\theta(\hat{\mathbf{x}}_\theta(t), t) \tag{6}
$$
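The marginal-preservation claim behind the probability flow ODE can be checked numerically. The sketch below assumes a toy VP-style SDE with Gaussian data of variance $s^2$, where $q_t = \mathcal{N}(0, v(t))$ has a closed form and the exact score $-x / v(t)$ substitutes for $\mathbf{s}_\theta$; integrating the ODE backward from $q_T$ should recover the data variance:

```python
import numpy as np

# Deterministic sampling with the probability flow ODE, integrated backward
# with Euler steps. Assumed toy setting: VP-style SDE f(x, t) = -0.5*beta*x,
# g(t) = sqrt(beta), data ~ N(0, s2), so q_t = N(0, v(t)) with
#   v(t) = s2 * exp(-beta * t) + 1 - exp(-beta * t),
# and the exact score -x / v(t) is used in place of a learned s_theta.

rng = np.random.default_rng(0)
beta, T, s2 = 5.0, 1.0, 4.0
n_steps, n_paths = 2000, 100_000
dt = T / n_steps

def v(t):
    return s2 * np.exp(-beta * t) + 1.0 - np.exp(-beta * t)

# Start from q_T = N(0, v(T)) (close to the prior N(0, 1) for large beta*T).
x = rng.standard_normal(n_paths) * np.sqrt(v(T))
t = T
for _ in range(n_steps):
    score = -x / v(t)
    dxdt = -0.5 * beta * x - 0.5 * beta * score  # f - (1/2) g^2 * score
    x = x - dxdt * dt                            # Euler step backward in time
    t -= dt

# x(0) should be distributed (approximately) as the data N(0, s2).
print(x.var())
```

Because the ODE has zero diffusion, each trajectory is deterministic; the randomness of the generated samples comes entirely from the initial draw at time $T$.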