
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

This paper proposes InstantBooth, an approach to personalized text-to-image generation that requires no test-time finetuning. It is built on a pre-trained text-to-image model and adds two main components: a learnable image encoder and a few adapter layers. The image encoder converts the input images into a textual token that captures the general concept, while the adapter layers learn rich visual feature representations that preserve the fine details of the identity. The new components are trained only on text-image pairs, without paired images of the same concept. Experiments show that InstantBooth produces competitive results on unseen concepts in terms of language-image alignment, image fidelity, and identity preservation, while being 100 times faster than existing test-time finetuning-based methods.
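To make the two components concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the function names, the `<concept>` placeholder, the averaging of image features, and the tanh-gated residual are all assumptions chosen for illustration, and plain lists of floats stand in for real tensors. It only shows the data flow: image features are pooled into one concept embedding that takes the place of a placeholder token in the prompt, and an adapter adds visual features on top of the frozen model's hidden state.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Pool several image feature vectors into one concept embedding (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def inject_concept_token(
    prompt_tokens: List[str],
    token_embeddings: List[Vector],
    concept_embedding: Vector,
    placeholder: str = "<concept>",  # hypothetical placeholder token name
) -> List[Vector]:
    """Swap the placeholder token's embedding for the concept embedding."""
    return [
        concept_embedding if token == placeholder else embedding
        for token, embedding in zip(prompt_tokens, token_embeddings)
    ]


def gated_adapter(hidden: Vector, visual_features: Vector, gate: float) -> Vector:
    """Add visual features to a hidden state through a tanh gate (assumed design).

    With gate == 0 the output equals the input, so the pre-trained
    model's behavior is unchanged until the gate is trained.
    """
    g = math.tanh(gate)
    return [h + g * v for h, v in zip(hidden, visual_features)]


# Toy usage: two "images" with 2-dimensional features.
concept = mean_vector([[1.0, 3.0], [3.0, 5.0]])
prompt = ["a", "<concept>", "dog"]
embeddings = [[0.0, 0.0], [9.0, 9.0], [1.0, 1.0]]
conditioned = inject_concept_token(prompt, embeddings, concept)
unchanged = gated_adapter([1.0, 2.0], [5.0, 5.0], gate=0.0)
```

In this sketch only the pooling and the gate would carry learnable parameters; the token embeddings and hidden states stand in for outputs of the frozen pre-trained model, which matches the summary's point that nothing is finetuned per concept at test time.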

Key insights and lessons learned from the paper:

InstantBooth generates personalized images without any test-time finetuning.
The generated images stay faithful to the input text while preserving the identity of the concept shown in the input images.
Generation is reported to be about 100 times faster than test-time finetuning-based methods, which could make it a better fit for interactive applications.

Questions for the authors:

How does the proposed approach compare to other methods for personalized text-to-image generation?
What are the limitations of the proposed approach?
How can the proposed approach be improved?
What are the potential applications of the proposed approach?

Related topics or future research directions:

Other methods for personalized text-to-image generation, such as the test-time finetuning approaches DreamBooth and Textual Inversion
Limitations of the proposed approach
Ways to improve the proposed approach
Potential applications of the proposed approach
