Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published 3 days ago
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Paper • 2510.01171 • Published Oct 1
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training Paper • 2510.06710 • Published 25 days ago
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2
ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter Paper • 2407.11298 • Published Jul 16, 2024
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities Paper • 2510.08759 • Published 24 days ago
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published 26 days ago
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning Paper • 2506.09049 • Published Jun 10