However, the per-pixel losses used by these methods do not capture perceptual differences between output and ground-truth images.

Problem statement: per-pixel losses fail to capture perceptual differences between the output and the ground truth.

These approaches produce high-quality images, but are slow since inference requires solving an optimization problem.

Using a perceptual loss solves this, but it is slow: inference requires solving an optimization problem.

In this paper we combine the benefits of these two approaches. We train feedforward transformation networks for image transformation tasks, but rather than using per-pixel loss functions depending only on low-level pixel information, we train our networks using perceptual loss functions that depend on high-level features from a pretrained loss network.

This paper combines the two approaches: a feedforward transformation network is trained (for speed), but instead of per-pixel losses that depend only on low-level pixel information, it is trained with perceptual losses based on high-level features from a pretrained network.

1. Method

The image transformation network is trained using stochastic gradient descent to minimize a weighted combination of loss functions:

$$W^* = \arg\min_W \, \mathbf{E}_{x,\{y_i\}}\!\left[\sum_i \lambda_i \, \ell_i\big(f_W(x),\, y_i\big)\right]$$

Here f_W(x) is the transformed output for input image x, and each y_i is a target image; the network weights W are trained to minimize this weighted sum of losses.
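The weighted combination above can be sketched in a few lines. This is not the paper's code; it is a minimal numpy illustration where `mse` stands in for one of the loss terms ℓ_i:

```python
import numpy as np

def mse(a, b):
    # per-element mean squared error, a stand-in for one loss term l_i
    return float(np.mean((a - b) ** 2))

def total_loss(output, targets, losses, weights):
    # Weighted combination: sum_i lambda_i * l_i(f_W(x), y_i)
    return sum(w * l(output, y) for w, l, y in zip(weights, losses, targets))
```

In training, `output` would be the transformation network's result f_W(x), and gradient descent would update W to minimize `total_loss`.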

The key insight of these methods is that convolutional neural networks pretrained for image classification have already learned to encode the perceptual and semantic information we would like to measure in our loss functions. We therefore make use of a network φ which has been pretrained for image classification as a fixed loss network in order to define our loss functions.

The loss network φ is used to define a feature reconstruction loss ℓ_feat^φ and a style reconstruction loss ℓ_style^φ that measure differences in content and style between images.

A network pretrained for image classification is used as the loss function: φ serves as a fixed loss network, used for both the feature and style reconstruction losses.
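The two losses are simple to write down given the feature maps. Below is a hedged numpy sketch (not the paper's implementation): `phi_y` stands for the activations φ_j(y) of one layer of the fixed loss network, with shape (C, H, W). The feature loss is a normalized squared distance between feature maps; the style loss is the squared Frobenius distance between Gram matrices, as in the paper:

```python
import numpy as np

def feature_loss(phi_yhat, phi_y):
    # feature reconstruction loss: ||phi(yhat) - phi(y)||^2 / (C*H*W)
    c, h, w = phi_y.shape
    return float(np.sum((phi_yhat - phi_y) ** 2) / (c * h * w))

def gram(phi):
    # Gram matrix G of shape (C, C), normalized by C*H*W
    c, h, w = phi.shape
    f = phi.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(phi_yhat, phi_y):
    # squared Frobenius norm of the Gram-matrix difference
    d = gram(phi_yhat) - gram(phi_y)
    return float(np.sum(d ** 2))
```

Because the Gram matrix discards spatial layout, the style loss is insensitive to where features appear, only to which features co-occur.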

Image Transformation Networks

This review focuses on the downsampling/upsampling design.

Downsampling and Upsampling. For super-resolution with an upsampling factor of f, we use several residual blocks followed by log2 f convolutional layers with stride 1/2.

Downsampling uses strided convolutions rather than pooling.

Rather than relying on a fixed upsampling function, fractionally-strided convolution allows the upsampling function to be learned jointly with the rest of the network.

Compared to a fixed upsampling function (probably meaning something like fixed interpolation, e.g. a pyramid), fractionally-strided convolutions are preferable because the upsampling is learned jointly with the rest of the model.
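The stride-2 / stride-1/2 pairing can be illustrated in 1-D. This is a toy numpy sketch, not the paper's 2-D networks: a stride-2 convolution roughly halves the length (downsampling), while a fractionally-strided (transposed) convolution with the same stride roughly doubles it, and its kernel `k` is learned rather than fixed:

```python
import numpy as np

def conv1d_stride2(x, k):
    # downsampling: valid convolution with stride 2 roughly halves the length
    n = (len(x) - len(k)) // 2 + 1
    return np.array([np.dot(x[2 * i:2 * i + len(k)], k) for i in range(n)])

def tconv1d_stride2(x, k):
    # "stride 1/2": transposed convolution with stride 2 roughly doubles the
    # length; each input value scatters a scaled copy of the learned kernel
    out = np.zeros(2 * len(x) + len(k) - 2)
    for i, v in enumerate(x):
        out[2 * i:2 * i + len(k)] += v * k
    return out
```

For super-resolution with factor f, stacking log2(f) such stride-1/2 layers yields the full upsampling, with all kernels trained end-to-end.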

Perceptual Loss Functions