SRFlow: Learning the Super-Resolution Space with Normalizing Flow

Super-resolution (SR) aims to enhance the resolution of a given image by adding missing high-frequency information.

In fact, for a given low-resolution (LR) image, there exist infinitely many compatible high-resolution (HR) predictions. This poses severe challenges when designing deep learning based super-resolution approaches.

Problem 1: The goal is to restore the high-frequency information lost in the LR image, but this is difficult because infinitely many HR images are compatible with a single LR input.

While achieving sharper images with better perceptual quality, such methods only predict a single SR output, which does not fully account for the ill-posed nature of the SR problem.

Problem 2: Improved losses have been proposed to obtain sharper images with better perceptual quality, but such methods still predict only a single SR output.

We address the limitations of the aforementioned approaches by learning the conditional distribution of plausible HR images given the input LR image. To this end, we design a conditional normalizing flow [11,38] architecture for image super-resolution.

Problem 3 and solution: The approaches above do not model the full space of plausible HR images. The authors address this by learning the conditional distribution of HR images given the LR input, realized as a conditional normalizing flow.

First, our method naturally learns to generate diverse SR samples without suffering from mode-collapse, which is particularly problematic in the conditional GAN setting [18,30]. Second, while GAN-based SR networks require multiple losses with careful parameter tuning, our network is stably trained with a single loss: the negative log-likelihood. Third, the flow network employs a fully invertible encoder, capable of mapping any input HR image to the latent flow-space and ensuring exact reconstruction.

Summary

  1. The method naturally generates diverse SR samples without the mode-collapse that is problematic for conditional GANs.
  2. GAN-based SR networks need multiple losses with careful parameter tuning; the authors' network trains stably with a single loss, the negative log-likelihood.
  3. The flow network uses a fully invertible encoder: any HR image can be mapped to the latent flow-space and exactly reconstructed.

1. Proposed Method: SRFlow

Conditional Normalizing Flows for Super-Resolution

This constitutes a more challenging task, since the model must span a variety of possible HR images, instead of just predicting a single SR output. Our intention is to train the parameters θ of the distribution in a purely data-driven manner, given a large set of LR-HR training pairs $\{(x_i, y_i)\}_{i=1}^{M}$.


Instead of predicting a single image, the model spans many possible HR images. The parameters θ of the distribution are learned in a purely data-driven manner from the LR-HR training pairs.
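Written out in the usual maximum-likelihood form (a sketch in standard normalizing-flow notation, not quoted from the paper), the data-driven learning of θ over the training pairs amounts to:

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{M} -\log p_{y|x}(y_i \,|\, x_i, \theta)$$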

The core idea of normalizing flow [10,38] is to parametrize the distribution py|x using an invertible neural network fθ. In the conditional setting, fθ maps an HR-LR image pair to a latent variable z = fθ(y; x). We require this function to be invertible w.r.t. the first argument y for any LR image x.

The distribution $p_{y|x}$ is parametrized with an invertible neural network $f_\theta$, which maps an HR-LR image pair to a latent variable ($z = f_\theta(y; x)$). The function must be invertible with respect to the first argument y for any LR image x.
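The invertibility requirement can be sketched with a toy elementwise affine flow. This is a minimal illustration, not the paper's architecture: `mu` and `log_sigma` are hypothetical stand-ins for small conditioning networks that compute a shift and log-scale from the LR image x.

```python
import numpy as np

def mu(x):
    return 0.5 * x          # stand-in for a learned conditioning network

def log_sigma(x):
    return 0.1 * x          # stand-in for a learned conditioning network

def f(y, x):
    """Forward flow: HR image y -> latent z, conditioned on LR image x."""
    return (y - mu(x)) * np.exp(-log_sigma(x))

def f_inv(z, x):
    """Inverse flow: latent z -> HR image y, exact by construction."""
    return z * np.exp(log_sigma(x)) + mu(x)

y = np.array([1.0, 2.0, 3.0])   # toy "HR image"
x = np.array([0.4, 0.8, 1.2])   # toy "LR conditioning"

z = f(y, x)
y_rec = f_inv(z, x)
print(np.allclose(y, y_rec))    # -> True: exact reconstruction
```

Because the map is invertible in y for any fixed x, no information about the HR image is lost in the latent encoding, which is what makes exact reconstruction possible.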

That is, the HR image y can always be exactly reconstructed from the latent encoding z as $y = f_\theta^{-1}(z; x)$. By postulating a simple distribution $p_z(z)$ (e.g. a Gaussian) in the latent space z, the conditional distribution $p_{y|x}(y|x, \theta)$ is implicitly defined by the mapping $y = f_\theta^{-1}(z; x)$ of samples $z \sim p_z$. The key aspect of normalizing flows is that the probability density $p_{y|x}$ can be explicitly computed as,

$$p_{y|x}(y \,|\, x, \theta) = p_z\big(f_\theta(y; x)\big) \left| \det \frac{\partial f_\theta}{\partial y}(y; x) \right| \qquad (1)$$

The HR image y can be exactly reconstructed from the latent encoding z as $y = f_\theta^{-1}(z; x)$. By assuming a simple latent distribution $p_z(z)$, the conditional distribution $p_{y|x}(y|x, \theta)$ is implicitly defined through $y = f_\theta^{-1}(z; x)$. The key point is that the density can be computed explicitly, as in Eq. (1).

That is, y is mapped to the latent variable z by f, its probability is evaluated under the simple distribution $p_z$, and the result is multiplied by the Jacobian determinant.
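Under the same toy affine-flow assumptions as before (hypothetical `mu`/`log_sigma` standing in for conditioning networks), the explicit density of Eq. (1) can be sketched in log form: evaluate z under a standard Gaussian prior and add the log-determinant of the Jacobian, which for an elementwise map is just a sum of per-element log-scales. The negative of this quantity is the single training loss the paper uses.

```python
import numpy as np

def mu(x):
    return 0.5 * x          # stand-in for a learned conditioning network

def log_sigma(x):
    return 0.1 * x          # stand-in for a learned conditioning network

def log_p_y_given_x(y, x):
    """log p_{y|x}(y|x) = log p_z(f(y;x)) + log|det df/dy| (Eq. 1, in log form)."""
    z = (y - mu(x)) * np.exp(-log_sigma(x))
    # Standard Gaussian latent prior p_z.
    log_pz = -0.5 * np.sum(z**2) - 0.5 * z.size * np.log(2 * np.pi)
    # The Jacobian of an elementwise affine map is diagonal with entries
    # exp(-log_sigma(x)), so log|det| = -sum(log_sigma(x)).
    log_det = -np.sum(log_sigma(x))
    return log_pz + log_det

y = np.array([1.0, 2.0, 3.0])
x = np.array([0.4, 0.8, 1.2])

nll = -log_p_y_given_x(y, x)    # negative log-likelihood: the single loss
print(nll)
```

Because the log-density is available in closed form, training reduces to minimizing this NLL by gradient descent on θ, with no adversarial losses to balance.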