Untitled

There have been several attempts [9, 36, 25] to invert a pre-trained text-to-image model, obtaining a text embedding representation that captures the object in the reference images.

Capturing object relations is intrinsically a harder task, as it requires understanding the interactions between objects as well as the composition of an image; existing inversion methods are unable to handle this task due to entity leakage from the reference images.

There have been many attempts to invert a text-to-image model and obtain a text embedding that captures the object in the reference images.

Capturing object relations is harder: it requires understanding not only the composition of an image but also the interactions between objects, and existing inversion methods cannot handle the task because of entity leakage from the reference images.

In this paper, we study the Relation Inversion task, whose objective is to learn a relation that co-exists in the given exemplar images. Specifically, with objects in each exemplar image following a specific relation, we aim to obtain a relation prompt in the text embedding space of the pre-trained text-to-image diffusion model.

This paper proposes the Relation Inversion task: learning a relation that co-exists in the given exemplar images. With the objects in each exemplar image following a specific relation, the goal is to obtain a relation prompt in the text embedding space of a pre-trained text-to-image diffusion model.

To better represent high-level relation concepts with the learnable prompt, we introduce a simple yet effective preposition prior. The preposition prior is based on a premise and two observations in the text embedding space.

To better represent high-level relation concepts with the learnable prompt, a simple yet effective preposition prior is introduced. This preposition prior is based on one premise and two observations about the text embedding space.
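The intuition behind the prior, that real-world relation embeddings are well explained by preposition directions while other concepts are not, can be sketched as a least-squares projection onto a preposition "basis". All embeddings below are hypothetical toy vectors; the actual method operates in the text embedding space of the pre-trained diffusion model.

```python
import math

# Toy preposition basis (hypothetical 3-d embeddings standing in for the
# cluster of preposition embeddings in the real text embedding space).
P = [[1.0, 0.0, 0.0],   # e.g. "on"
     [0.0, 1.0, 0.0]]   # e.g. "under"

def projection_residual(y, basis):
    """Least-squares fit of y on two basis vectors; return the residual norm.

    A small residual means y lies mostly inside the span of the preposition
    directions. Solves the normal equations (B B^T) w = B y directly.
    """
    b1, b2 = basis
    g11 = sum(a * a for a in b1)
    g22 = sum(a * a for a in b2)
    g12 = sum(a * b for a, b in zip(b1, b2))
    r1 = sum(a * b for a, b in zip(b1, y))
    r2 = sum(a * b for a, b in zip(b2, y))
    det = g11 * g22 - g12 * g12
    w1 = (r1 * g22 - g12 * r2) / det
    w2 = (g11 * r2 - r1 * g12) / det
    res = [yi - w1 * a - w2 * b for yi, a, b in zip(y, b1, b2)]
    return math.sqrt(sum(r * r for r in res))

relation_like = [0.6, 0.4, 0.05]  # mostly inside the preposition span
noun_like     = [0.1, 0.1, 1.0]   # mostly outside it

# A relation-like embedding is reconstructed far better by the basis.
assert projection_residual(relation_like, P) < projection_residual(noun_like, P)
```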

Based on our preposition prior, we propose the ReVersion framework to tackle the Relation Inversion problem. Notably, we design a novel relation-steering contrastive learning scheme to steer the relation prompt towards a relation-dense region in the text embedding space.

Based on the preposition prior, the ReVersion framework is proposed to tackle the Relation Inversion problem. In particular, a relation-steering contrastive learning scheme is designed to steer the relation prompt towards a relation-dense region of the text embedding space.
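The steering idea can be sketched numerically with an InfoNCE-style contrastive loss: the learnable relation embedding is pulled toward preposition embeddings (positives) and pushed away from words of other parts of speech (negatives). The toy embeddings and function names below are hypothetical; the real scheme operates on the tokenizer's actual word embeddings during diffusion-model fine-tuning.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def steering_loss(relation, positives, negatives, temperature=0.07):
    """InfoNCE-style loss: low when `relation` sits near the positives."""
    pos = [math.exp(cosine(relation, p) / temperature) for p in positives]
    neg = [math.exp(cosine(relation, n) / temperature) for n in negatives]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))

# Toy 3-d embeddings: prepositions as positives, nouns as negatives.
prepositions = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]   # e.g. "on", "inside"
nouns        = [[0.0, 1.0, 0.3], [0.1, 0.9, 0.5]]   # e.g. "cat", "table"

near_prep = [1.0, 0.15, 0.05]   # candidate prompt near the preposition region
near_noun = [0.05, 1.0, 0.4]    # candidate prompt near the noun region

# Gradient descent on this loss would steer the prompt toward prepositions:
# an embedding already in the relation-dense region incurs a lower loss.
assert steering_loss(near_prep, prepositions, nouns) < \
       steering_loss(near_noun, prepositions, nouns)
```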

1. The Relation Inversion Task

Relation Inversion aims to extract the common relation <R> from several exemplar images. Let I = {I1, I2, ..., In} be a set of exemplar images, and let Ei,A and Ei,B be the two dominant entities in image Ii. In Relation Inversion, we assume that the entities in each exemplar image interact with each other through a common relation <R>. A set of coarse descriptions C = {c1, c2, ..., cn} is associated with the exemplar images, where ci = "Ei,A <R> Ei,B" denotes the caption corresponding to image Ii.

Relation Inversion aims to extract the common relation <R> from the exemplar images. In each image Ii, the two dominant entities Ei,A and Ei,B are assumed to interact through the common relation <R>. A set of coarse descriptions C associated with the exemplar images is built, where each caption connecting Ei,A and Ei,B has the form ci = Ei,A <R> Ei,B.
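The caption structure above can be illustrated with a tiny templating sketch; the entity pairs are made-up examples, and "<R>" is the placeholder token for the relation prompt to be inverted.

```python
# Hypothetical dominant-entity pairs (E_{i,A}, E_{i,B}) for n = 3 exemplars.
entities = [("cat", "box"), ("dog", "basket"), ("child", "chair")]

def coarse_description(e_a, e_b, token="<R>"):
    """Build the coarse caption c_i = "E_{i,A} <R> E_{i,B}" for image I_i."""
    return f"{e_a} {token} {e_b}"

captions = [coarse_description(a, b) for a, b in entities]
# captions[0] == "cat <R> box"
```

During inversion, only the embedding bound to the token "<R>" is optimized; the entity words stay fixed, which is what lets the learned prompt carry the relation rather than the objects.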

An immediate application of Relation Inversion is relation-specific text-to-image synthesis. Once the prompt is acquired, one can generate images with novel objects interacting with each other following the specified relation.

An immediate application of Relation Inversion is relation-specific text-to-image synthesis: once the prompt is obtained, images of novel objects interacting with each other according to the specified relation can be generated.

This could potentially inspire future research in representation learning, few-shot learning, visual relation detection, scene graph generation, and many more.

This could potentially inspire future research in few-shot learning, visual relation detection, representation learning, and more.

2. The ReVersion Framework

Preliminaries

Inversion on Text-to-Image Diffusion Models.