-
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721 • Published -
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735 • Published • 33 -
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428 • Published -
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197 • Published
Collections
Collections including paper arxiv:2506.07927
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 263 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
Solving Inequality Proofs with Large Language Models
Paper • 2506.07927 • Published • 20 -
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 79 -
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Paper • 2507.06181 • Published • 43
-
Solving Inequality Proofs with Large Language Models
Paper • 2506.07927 • Published • 20 -
Mathesis: Towards Formal Theorem Proving from Natural Languages
Paper • 2506.07047 • Published • 5 -
Pre-trained Large Language Models Learn Hidden Markov Models In-context
Paper • 2506.07298 • Published • 26 -
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 79
-
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Paper • 2505.10557 • Published • 47 -
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Paper • 2505.16400 • Published • 34 -
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
Paper • 2505.15929 • Published • 49 -
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
Paper • 2506.05349 • Published • 24