- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (31 Aug 2023)
This is my reading note on DreamBooth. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images.
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing (21 Aug 2023)
This is my reading note for BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. The paper proposes a method for generating an image from a text prompt and a target visual concept. To do that, the paper trains a BLIP model to align visual features with the text prompt, then concatenates the visual embedding with the text prompt to generate the desired image. Code and models will be released at https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion. Project page at https://dxli94.github.io/BLIP-Diffusion-website/.
- Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning (26 Jul 2023)
This is my reading note for Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning. This paper proposes a diffusion method to generate images from given visual concepts and a text prompt. In particular, the method can handle multiple visual concepts jointly. To do so, the paper detects the visual concepts in the input images, then encodes the segmented images and bounding boxes and feeds them into a latent diffusion model. To enhance consistency, the visual embedding is inserted into the text encoding of the prompt.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (14 Jul 2023)
This is my reading note for DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. This paper proposes a method for personalizing diffusion-based text-to-image generation. To achieve this, it first learns to align the visual content to be personalized with a rarely used text embedding; this text embedding is then inserted into the text prompt to control image generation.
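
The DreamBooth notes above mention binding a subject to a rare identifier and a class-specific prior preservation loss. Below is a minimal sketch of how those two ideas fit together, not the paper's implementation. The function names, the placeholder identifier `sks`, and the flat lists standing in for image tensors are all assumptions made for illustration, and the timestep-dependent weighting used in the paper is omitted.

```python
def build_prompts(identifier, class_noun):
    """Return (instance_prompt, class_prompt).

    The instance prompt pairs a rare identifier with the class noun;
    the class prompt is used to generate prior-preservation samples.
    """
    return f"a {identifier} {class_noun}", f"a {class_noun}"


def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    prior_pred, prior_target, prior_weight=1.0):
    """Reconstruction loss on the subject images plus a weighted
    prior-preservation term on images generated from the class prompt.

    Simplified: real training operates on noised images or latents
    and applies per-timestep weights, which this sketch leaves out.
    """
    reconstruction = mse(instance_pred, instance_target)
    prior = mse(prior_pred, prior_target)
    return reconstruction + prior_weight * prior


instance_prompt, class_prompt = build_prompts("sks", "dog")
loss = dreambooth_loss([1.0, 2.0], [1.0, 4.0],
                       [0.0, 0.0], [1.0, 1.0], prior_weight=0.5)
```

The prior term is what keeps the fine-tuned model from forgetting what a generic member of the class looks like while it learns the specific subject.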