Tag: word-patch-alignment
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (07 Aug 2023)
This is my reading note for SimVLM: Simple Visual Language Model Pretraining with Weak Supervision. SimVLM reduces training complexity by exploiting large-scale weak supervision and is trained end-to-end with a single prefix language modeling objective.
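
To make the prefix language modeling idea concrete, here is a minimal sketch of the attention pattern it implies: positions in the prefix (for SimVLM, image patches plus the leading text tokens) see each other bidirectionally, while the remaining tokens are predicted left to right. The function name `prefix_lm_mask` and the single-mask, decoder-only view are my own assumptions for illustration; the paper itself uses an encoder-decoder Transformer, so treat this as a picture of the objective and not as the authors' implementation.

```python
def prefix_lm_mask(seq_len, prefix_len):
    """Build a prefix-LM attention mask (hypothetical helper, for illustration).

    mask[i][j] is True when position i is allowed to attend to position j.
    Columns inside the prefix are visible to every position (bidirectional
    context); columns after the prefix are visible only causally (j <= i).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        [j < prefix_len or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Example: 5 positions, the first 2 form the prefix.
mask = prefix_lm_mask(seq_len=5, prefix_len=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# prints:
# 11000
# 11000
# 11100
# 11110
# 11111
```

Under this view, the training loss would be computed only on the positions after the prefix, with each of them conditioned on the full prefix and on the earlier suffix tokens.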