
A retrieval-augmented diffusion model (RDM) is a combination of a conditional latent diffusion model θ [12,22], a database of images Dtrain, which is considered to be an explicit part of the model, and a (non-trainable) sampling strategy ξk to obtain a subset of Dtrain based on a query x, as introduced in [3]. The model is trained by implementing ξk as a nearest-neighbor algorithm, such that for each query (i.e., training example) its k nearest neighbors are returned as a set, where the distance is measured in CLIP [20] image embedding space. The CLIP embeddings of these nearest neighbors are then fed to the model via the cross-attention mechanism [28,22].

An RDM combines a conditional latent diffusion model with a sampling strategy ξk that retrieves data from the database given a query x. ξk is implemented as a nearest-neighbor algorithm for each query, with distances measured in CLIP image embedding space.
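As a minimal sketch, the sampling strategy ξk can be written as a cosine-similarity k-nearest-neighbor lookup over precomputed embeddings. Random vectors stand in here for the CLIP image embeddings of Dtrain; the function name `xi_k` is just an illustration, not the paper's code.

```python
import numpy as np

def xi_k(query_emb, db_embs, k):
    """Non-trainable sampling strategy: return the indices of the k
    database embeddings nearest to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity of each entry
    return np.argsort(-sims)[:k]     # top-k neighbor indices

# Toy stand-in for CLIP embeddings of D_train (5 images, 4-dim).
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 4))
query = db[2] + 0.01 * rng.normal(size=4)   # a query very close to entry 2
neighbors = xi_k(query, db, k=2)
```

In the actual model, the embeddings of the returned neighbors (not their indices) are passed to the diffusion model as cross-attention conditioning.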


After training, we replace Dtrain of the original RDM with alternate databases Dstyle, derived from art datasets [24,16], to obtain a post-hoc model modification and thereby zero-shot stylization. Furthermore, we can guide the synthesis process with text prompts by using the shared text-image feature space of CLIP [20], as proposed in [3]. Thus, we obtain a controllable synthesis model which is only trained on image data.

After training the RDM, the database is swapped to modify the model post hoc and obtain zero-shot stylization. The authors can also guide synthesis with text prompts via CLIP embeddings. The result is controllable image synthesis from a model trained only on image data.
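The database swap and the text guidance can be sketched together: because CLIP text and image embeddings share one feature space, a text-prompt embedding can serve directly as the retrieval query, and exchanging the database changes the style without touching the trained model. All embeddings below are random stand-ins for CLIP vectors; `conditioning` is an illustrative name, not the paper's API.

```python
import numpy as np

def knn(query, db, k):
    """Cosine-similarity retrieval; stands in for CLIP-space lookup."""
    q = query / np.linalg.norm(query)
    dbn = db / np.linalg.norm(db, axis=1, keepdims=True)
    return db[np.argsort(-(dbn @ q))[:k]]

def conditioning(query_emb, database, k=4):
    """Post-hoc modification: the trained diffusion model is untouched;
    only the database handed to retrieval changes at inference time."""
    return knn(query_emb, database, k)

rng = np.random.default_rng(1)
d_train = rng.normal(size=(50, 8))   # stand-in for D_train embeddings
d_style = rng.normal(size=(50, 8))   # stand-in for D_style embeddings
prompt = rng.normal(size=8)          # stand-in for a CLIP text embedding

ctx_default = conditioning(prompt, d_train)  # original behavior
ctx_styled = conditioning(prompt, d_style)   # zero-shot stylization
```

The retrieved sets would then be fed to the diffusion model's cross-attention layers in place of the training-time neighbors.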

1. Text-Guided Synthesis of Artistic Images with RDMs

General Setting

Dtrain is built from OpenImages; Dstyle is based on the WikiArt dataset.

Samples from this model are shown in Fig. 1. By exchanging this database with distinct, style-specific subsets of the ArtBench dataset [16] during inference, we show that RDM can further be used for fine-grained stylization, without being trained for this task.

By exchanging the database with style-specific subsets during inference, the model achieves finer-grained stylization.

Zero-Shot Text-Guided Stylization by Exchanging the Database


We show the zero-shot stylization capabilities of the ImageNet-RDM from Sec. 2.1 in Fig. 2.

The model generalizes to this new database and is capable of generating artwork-like images which depict the content defined by the text prompts.

As Fig. 2 shows, the model achieves zero-shot stylization: it renders the content described by the prompts in an artwork-like style.

Fine-Grained Stylization with ArtBench


By using style-specific databases obtained from the ArtBench dataset [16] during inference, we here present an alternative approach. Fig. 3 presents results for the prompt "Day and night fighting for the domination of time."
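Fine-grained stylization then amounts to keeping one sub-database per style and retrieving only from the chosen one at inference. The style names and `stylized_conditioning` helper below are hypothetical, and random vectors again stand in for CLIP embeddings of ArtBench images grouped by their style label.

```python
import numpy as np

rng = np.random.default_rng(2)
embed_dim = 8

# Hypothetical per-style sub-databases; in the paper these would hold
# CLIP embeddings of ArtBench images grouped by style.
style_dbs = {
    "impressionism": rng.normal(size=(30, embed_dim)),
    "ukiyo-e":       rng.normal(size=(30, embed_dim)),
    "surrealism":    rng.normal(size=(30, embed_dim)),
}

def stylized_conditioning(prompt_emb, style, k=4):
    """Fine-grained stylization: retrieve neighbors only from the
    chosen style's sub-database; the trained model is unchanged."""
    db = style_dbs[style]
    dbn = db / np.linalg.norm(db, axis=1, keepdims=True)
    q = prompt_emb / np.linalg.norm(prompt_emb)
    return db[np.argsort(-(dbn @ q))[:k]]

prompt = rng.normal(size=embed_dim)   # stand-in CLIP text embedding
ctx = stylized_conditioning(prompt, "ukiyo-e")
```

Selecting a different key of `style_dbs` changes only the retrieved conditioning set, which is what makes the stylization "fine-grained" without any retraining.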
