Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Paper • 2507.07484 • Published Jul 10 • 17
A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Paper • 2507.07202 • Published Jul 9 • 22
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10 • 33
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Paper • 2507.07990 • Published Jul 10 • 45
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding Paper • 2507.07984 • Published Jul 10 • 42
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 48
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness Paper • 2507.01702 • Published Jul 2 • 3
Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Paper • 2507.07106 • Published Jul 9 • 1
SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning Paper • 2505.10251 • Published May 15 • 3
Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework Paper • 2507.06260 • Published Jul 7 • 5
ModelCitizens: Representing Community Voices in Online Safety Paper • 2507.05455 • Published Jul 7 • 4
DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models Paper • 2507.06853 • Published Jul 9 • 7
A Survey on Vision-Language-Action Models for Autonomous Driving Paper • 2506.24044 • Published Jun 30 • 14
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9 • 28
Perception-Aware Policy Optimization for Multimodal Reasoning Paper • 2507.06448 • Published Jul 8 • 47
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation Paper • 2507.05578 • Published Jul 8 • 5