Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9 • 83
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024 • 23