Reconstructing visual images from brain activity, such as that measured by functional Magnetic Resonance Imaging (fMRI), is an intriguing but challenging problem,
Problem 1: Reconstructing images from human brain activity (as measured by fMRI) is difficult.
we still need to discover how they represent latent signals within each layer of DMs, how the latent representation changes throughout the denoising process, and how adding noise affects conditional image generation.
Problem 2: Much about diffusion models is still unexplained or poorly understood, for example how they represent latent signals.
Here, we attempt to tackle the above challenges by reconstructing visual images from fMRI signals using an LDM named Stable Diffusion.
The authors address these problems by combining fMRI signals with an LDM (Stable Diffusion).
For the training dataset, we used the three separate trials without averaging.
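They use NSD, an fMRI dataset in which each image was presented in three repeated trials. The three trials are used individually as training samples rather than averaged.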
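Using trials "without averaging" means each repeat becomes its own training row, with the image's target repeated once per trial. A minimal NumPy sketch, assuming responses are stored as an `(n_images, n_trials, n_voxels)` array; `stack_trials` is a hypothetical helper, not from the paper.

```python
import numpy as np

def stack_trials(fmri, targets):
    """Treat each repeated trial as its own sample instead of averaging (hypothetical helper).

    fmri:    (n_images, n_trials, n_voxels) responses, one slice per repeat
    targets: (n_images, n_features) regression target for each image
    Returns X of shape (n_images * n_trials, n_voxels) and the matching Y.
    """
    n_images, n_trials, n_voxels = fmri.shape
    # Row i * n_trials + t of X holds trial t of image i.
    X = fmri.reshape(n_images * n_trials, n_voxels)
    # Repeat each image's target once per trial so rows of X and Y stay aligned.
    Y = np.repeat(targets, n_trials, axis=0)
    return X, Y

rng = np.random.default_rng(0)
fmri = rng.standard_normal((5, 3, 100))   # 5 images, 3 trials, 100 voxels (synthetic)
targets = rng.standard_normal((5, 8))     # 8-dim target per image (synthetic)
X, Y = stack_trials(fmri, targets)
print(X.shape, Y.shape)  # (15, 100) (15, 8)
```

Averaging would instead give 5 rows with less noise per row; keeping the trials separate triples the number of rows at the cost of noisier individual samples.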
We define z as the latent representation of the original image compressed by the autoencoder, c as the latent representation of texts (average of five text annotations associated to each MS COCO image), and zc as the generated latent representation of z modified by the model with c. We used these representations in the decoding/encoding models described below.
These are standard definitions, so no further explanation here.
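The text latent c is defined as the average over the five captions of an MS COCO image. A minimal sketch of that averaging step, assuming each caption has already been encoded into a `(n_tokens, dim)` array; the 77 × 768 shape used below is an assumption for illustration, and the encoder itself is not shown.

```python
import numpy as np

def average_caption_latent(caption_embeddings):
    """Return c as the mean of per-caption text embeddings for one image.

    caption_embeddings: (n_captions, n_tokens, dim), e.g. five MS COCO captions
    already passed through a text encoder (assumed precomputed here).
    """
    caption_embeddings = np.asarray(caption_embeddings)
    # Element-wise mean over the caption axis keeps the (n_tokens, dim) shape.
    return caption_embeddings.mean(axis=0)

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 77, 768))  # synthetic stand-ins; 77 x 768 is an assumed shape
c = average_caption_latent(emb)
print(c.shape)  # (77, 768)
```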
We performed visual reconstruction from fMRI signals using LDM in three simple steps as follows (Figure 2, middle). The only training required in our method is to construct linear models that map fMRI signals to each LDM component, and no training or fine-tuning of deep-learning models is needed.
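Reconstruction from fMRI signals with the LDM takes three simple steps. The only training involved is fitting linear models that map fMRI signals to each LDM component; no deep-learning model is trained or fine-tuned.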
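The notes only say "linear models". A common choice for fMRI decoding is L2-regularized (ridge) regression, so the sketch below assumes ridge with a closed-form solution; the function names and `alpha` value are illustrative, and no intercept is fitted (centering X and Y beforehand is assumed).

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Fit W minimizing ||X W - Y||^2 + alpha * ||W||^2 (no intercept).

    X: (n_samples, n_voxels) fMRI responses
    Y: (n_samples, n_features) flattened LDM component (e.g. z or c)
    """
    n_voxels = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_voxels)
    # Solve A W = X^T Y rather than forming an explicit inverse.
    return np.linalg.solve(A, X.T @ Y)

def predict(X, W):
    """Map new fMRI responses to predicted latent features."""
    return X @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))       # synthetic voxels
W_true = rng.standard_normal((20, 5))    # synthetic ground-truth map
Y = X @ W_true                           # noise-free targets
W = fit_ridge(X, Y, alpha=1e-6)
Y_hat = predict(X, W)
```

One such model would be fitted per LDM component (z and c). With many more voxels than samples, the equivalent dual form `X.T @ solve(X @ X.T + alpha * I, Y)` is cheaper.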
(i) First, we predicted a latent representation z of the presented image X from fMRI signals within the early visual cortex. z was then processed by the decoder of the autoencoder to produce a coarse decoded image Xz with a size of 320 × 320, which was then resized to 512 × 512.
(ii) Xz was then processed by the encoder of the autoencoder, and noise was added to the result through the diffusion process.
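"Adding noise through the diffusion process" refers to the forward process, which in closed form samples z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε with ε ~ N(0, I). A minimal sketch, assuming a linear DDPM beta schedule and a 4 × 64 × 64 latent for a 512 × 512 image; Stable Diffusion's actual schedule and the noise level used in the paper may differ.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(z0, t, alpha_bar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a) * z0, (1 - a) * I), with a = alpha_bar[t]."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
z0 = rng.standard_normal((4, 64, 64))  # stand-in for the encoded Xz (assumed latent shape)
zt = add_noise(z0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Larger t keeps less of z_0, so the choice of t controls how much of the coarse image layout survives into the conditional generation that follows.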