R-WoM: Retrieval-augmented World Model For Computer-use Agents Paper • 2510.11892 • Published 20 days ago • 21
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 96
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2 • 21
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Paper • 2406.12649 • Published Jun 18, 2024 • 16
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 33