Tag: dreambooth

360 Reconstruction From a Single Image Using Space Carved Outpainting (19 Sep 2023)

This is my reading note for 360 Reconstruction From a Single Image Using Space Carved Outpainting. This paper proposes a method for 3D reconstruction from a single image. To do so, it represents the 3D object with a NeRF and iteratively updates the NeRF by generating new views with DreamBooth.
Key-Locked Rank One Editing for Text-to-Image Personalization (07 Sep 2023)

This is my reading note on Key-Locked Rank One Editing for Text-to-Image Personalization. This paper proposes a personalized image generation method based on controlling the attention module of the diffusion model. Specifically, the key captures the layout of the concept and the value captures the identity of the new concept. A rank-one update is applied to the attention weights for this purpose.
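To make the idea of a rank-one update concrete, here is a minimal sketch in plain Python. It shows a generic rank-one edit of a weight matrix, W' = W + (v_target - W k) k^T / (k^T k), which forces one chosen input k to map to a chosen output v_target while leaving inputs orthogonal to k unchanged. This is an illustration under my own assumptions, not necessarily the paper's exact formulation; the function names and the toy numbers are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, k, v_target):
    """Return W + (v_target - W k) k^T / (k^T k).

    Generic rank-one edit (an assumption for illustration, not the
    paper's exact update): afterwards the edited matrix maps k to v_target.
    """
    residual = [t - o for t, o in zip(v_target, matvec(W, k))]
    kk = sum(ki * ki for ki in k)
    return [[w + r * ki / kk for w, ki in zip(row, k)]
            for row, r in zip(W, residual)]


# Toy example: a 2x3 projection matrix, one input direction, one desired output.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 0.0]
v_target = [3.0, -1.0]
W_edited = rank_one_edit(W, k, v_target)

# An input orthogonal to k is unaffected by the edit.
x_orth = [2.0, -1.0, 5.0]
```

The difference between `W_edited` and `W` is an outer product of two vectors, so it has rank one, which is why such an edit is cheap to store and apply.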
DreamBooth Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (31 Aug 2023)

This is my reading note on DreamBooth. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images.
StableVideo Text-driven Consistency-aware Diffusion Video Editing (22 Aug 2023)

This is my reading note on StableVideo: Text-driven Consistency-aware Diffusion Video Editing. This paper proposes a video editing method based on diffusion. To ensure temporal consistency, the method utilizes a neural atlas and inter-frame interpolation. The neural atlas separates the video into foreground and background planes. The atlas defines the mapping from each pixel in a frame to a (u, v) coordinate in the atlas. For inter-frame interpolation, the edited image from diffusion is mapped to the next frame via the atlas, and the result is then used as the initialization that is denoised into the final content of that frame.
BLIP-Diffusion Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing (21 Aug 2023)

This is my reading note for BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. The paper proposes a method for generating an image from a text prompt and a target visual concept. To do that, the paper trains a BLIP model to align visual features with the text prompt, and then concatenates the visual embedding to the text prompt to generate the image. Code and models will be released at https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion. Project page at https://dxli94.github.io/BLIP-Diffusion-website/.
HyperDreamBooth HyperNetworks for Fast Personalization of Text-to-Image Models (27 Jul 2023)

This is my reading note for HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models. This paper speeds up DreamBooth personalization by applying LoRA.
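As a reminder of why LoRA makes fine-tuning cheaper, here is a minimal sketch of a generic LoRA layer in plain Python. The frozen weight W is left untouched and only two small matrices A (rank x d_in) and B (d_out x rank) are trained, so the layer computes W x + scale * B (A x). This is the standard LoRA idea, not the paper's specific architecture; the function names, the `scale` parameter, and the toy sizes are assumptions for illustration.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable values in A and B combined."""
    return rank * (d_in + d_out)


# Toy example with a rank-1 adapter on a 2x2 layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 1.0]]          # rank x d_in
B = [[0.5], [-0.5]]       # d_out x rank
x = [1.0, 1.0]
y = lora_forward(W, A, B, x)

# With B set to zero the adapter is a no-op, a common choice at the start of training.
B_zero = [[0.0], [0.0]]
y_init = lora_forward(W, A, B_zero, x)
```

For a hypothetical 768 x 768 projection, `lora_trainable_params(768, 768, 4)` gives 6144 trainable values, compared with 589824 for the full matrix.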
Subject-Diffusion Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning (26 Jul 2023)

This is my reading note for Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning. This paper proposes a diffusion method to generate images from given visual concepts and a text prompt. In particular, the method is able to handle multiple visual concepts jointly. To do so, the paper detects the visual concepts in the input images, then the segmented images and bounding boxes are encoded and fed into the latent diffusion model. To enhance consistency, the visual embedding is inserted into the text encoding of the prompt.
DreamBooth Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (28 Sep 2022)

This is my reading note on DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. Given as input just a few (3~5) images of a subject, DreamBooth fine-tunes a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, DreamBooth enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. (See Figure 1 for an example.)
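The structure of a prior preservation loss can be sketched in a few lines of plain Python. The objective has two terms: a reconstruction term on the few subject images (prompted with the unique identifier) and a second term on class images generated by the original model itself, which discourages the fine-tuned model from forgetting the class. The sketch below is a simplification under my own assumptions: plain lists stand in for model predictions and targets, the diffusion timestep weighting is omitted, and the names `dreambooth_style_loss` and `prior_weight` are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_style_loss(pred_subject, target_subject,
                          pred_prior, target_prior, prior_weight=1.0):
    """Subject reconstruction term plus a weighted prior preservation term.

    Simplified sketch: real training would compute the predictions with the
    diffusion model and apply per-timestep weights, which are omitted here.
    """
    subject_term = mse(pred_subject, target_subject)
    prior_term = mse(pred_prior, target_prior)
    return subject_term + prior_weight * prior_term


# Toy numbers standing in for model outputs and their targets.
loss = dreambooth_style_loss(
    pred_subject=[1.0, 2.0], target_subject=[1.0, 4.0],
    pred_prior=[0.0, 0.0], target_prior=[1.0, 1.0],
    prior_weight=0.5,
)
```

Setting `prior_weight` to zero recovers plain fine-tuning on the subject images alone, which is the setting the prior term is meant to improve on.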