Tag: mugen
- MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration (16 Jul 2023)
This is my reading note for MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration. The paper introduces MUGEN, a large-scale controllable video-audio-text dataset with rich annotations for multimodal understanding and generation.