Tag: reinforcement
- Aligning Large Multimodal Models with Factually Augmented RLHF (02 Oct 2023)
This is my reading note for Aligning Large Multimodal Models with Factually Augmented RLHF. This paper discusses how to mitigate hallucination in large multimodal models. It proposes two methods: 1) add additional human-labeled data to train a reward model that guides the fine-tuning of the final model; 2) feed additional factual data to the reward model alongside the model's response.
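A minimal sketch of the second idea, not the paper's code: the reward model sees ground-truth facts next to the prompt and response, so it can penalize answers that contradict them. `reward_model` and `tokenizer` are hypothetical placeholders for any preference-trained sequence-classification model.

```python
import torch

def factually_augmented_reward(reward_model, tokenizer, prompt, response, facts):
    """Score a (prompt, response) pair conditioned on ground-truth facts."""
    text = (
        f"Facts: {' '.join(facts)}\n"
        f"Prompt: {prompt}\n"
        f"Response: {response}"
    )
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Assumes the reward model emits a single scalar logit per sequence.
        score = reward_model(**inputs).logits.squeeze(-1)
    return score.item()
```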
- Tool Learning with Foundation Models (26 Aug 2023)
This is my reading note on Tool Learning with Foundation Models. This is a nice review paper on how to use LLMs with external tools to perform different tasks.
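A toy sketch of the tool-learning loop the survey describes (assumed interfaces, not from the paper): the LLM either emits a structured tool call, which the controller executes and feeds back, or plain text, which ends the loop. The `llm` function and tool registry here are hypothetical.

```python
import json

TOOLS = {
    # Toy calculator tool; eval is restricted to bare expressions.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_with_tools(llm, user_query, max_turns=3):
    history = [{"role": "user", "content": user_query}]
    for _ in range(max_turns):
        reply = llm(history)  # assumed to return a string
        history.append({"role": "assistant", "content": reply})
        try:
            # e.g. {"tool": "calculator", "input": "2*21"}
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        result = TOOLS[call["tool"]](call["input"])
        history.append({"role": "tool", "content": result})
    return reply
```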
- Aligning Text-to-Image Diffusion Models with Reward Backpropagation (10 Jul 2023)
This is my reading note for Aligning Text-to-Image Diffusion Models with Reward Backpropagation. This paper proposes a method to train a diffusion model for a given reward function in a memory-efficient way; in particular, it utilizes LoRA and gradient checkpointing. To avoid model collapse, it also proposes randomly truncating the number of steps through which gradients are backpropagated.
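A rough sketch of the random-truncation idea under assumed names (`unet_step`, `decode`, `reward_fn` are placeholders, not the paper's API): only the last K denoising steps are run with gradient tracking, which bounds memory and, per the paper, discourages reward over-optimization.

```python
import random
import torch

def truncated_reward_step(x_T, unet_step, decode, reward_fn, num_steps=50, max_k=10):
    k = random.randint(1, max_k)      # random truncation length
    x = x_T
    with torch.no_grad():             # early steps: no gradient tracking
        for t in range(num_steps, k, -1):
            x = unet_step(x, t)
    for t in range(k, 0, -1):         # only the last k steps carry gradients
        x = unet_step(x, t)
    loss = -reward_fn(decode(x))      # maximize the reward
    loss.backward()                   # gradients reach only the trainable
    return loss                       # (e.g. LoRA) parameters of unet_step
```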