ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? Paper • 2510.24591 • Published about 19 hours ago • 3
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published 1 day ago • 39
Rethinking Visual Intelligence: Insights from Video Pretraining Paper • 2510.24448 • Published about 21 hours ago • 3
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents Paper • 2510.24563 • Published about 20 hours ago • 18
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published 2 days ago • 25
RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation Paper • 2510.23571 • Published 1 day ago • 8
ACG: Action Coherence Guidance for Flow-based VLA models Paper • 2510.22201 • Published 4 days ago • 35
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection Paper • 2510.23594 • Published 1 day ago • 5
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published 2 days ago • 14