Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 4 days ago • 27
The Smol Training Playbook: The Secrets to Building World-Class LLMs • Running on CPU Upgrade • 898
3+ Years of ML & Society at Hugging Face Article • By yjernite and 3 others • 4 days ago • 13
huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning Article • 7 days ago • 51
Aligning to What? Rethinking Agent Generalization in MiniMax M2 Article • By MiniMax-AI • 3 days ago • 15
gpt-oss-safeguard Collection • gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated 4 days ago • 51
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024 • 3
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 246
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published 9 days ago • 17