- MDETR: Modulated Detection for End-to-End Multi-Modal Understanding (16 Jul 2023)
This is my reading note for MDETR: Modulated Detection for End-to-End Multi-Modal Understanding. This paper proposes a method for learning an object detection model from pairs of images and free-form text. The trained model is found to be capable of localizing unseen and long-tail categories.
- UNITER: UNiversal Image-TExt Representation Learning (24 Jun 2023)
This is my reading note for UNITER: UNiversal Image-TExt Representation Learning. This paper proposes a vision-language pre-training model. Its major innovation is the study of a word-region alignment loss and of different masked region modeling tasks.