Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 27 days ago • 462
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 45
Yeah! And it does work on older GPUs via MLX/llama.cpp as well :) Waiting for a vLLM implementation in bf16.