Emmanuel Sugutt
Sugutt
AI & ML interests
Reinforcement learning
Transformer models
Organizations
MoE
- Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
  Paper • 2508.07785 • Published • 28
- MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
  Paper • 2508.05257 • Published • 13
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
  Paper • 2507.20984 • Published • 56
- MiniCPM4: Ultra-Efficient LLMs on End Devices
  Paper • 2506.07900 • Published • 92
RL
- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 82
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published • 2
models
8
Sugutt/whisper-kalenjin-large • Updated
Sugutt/whisper-small-hi • Updated
Sugutt/finmap-expense-cat-model • 0.1B • Updated • 1
Sugutt/finbert-expense-categorization • Text Classification • 0.1B • Updated • 1
Sugutt/Taxi-V3 • Reinforcement Learning • Updated
Sugutt/q-FrozenLake-v1-4x4-noSlippery • Reinforcement Learning • Updated
Sugutt/ppo-Huggy • Reinforcement Learning • Updated • 8
Sugutt/lunarlander • Reinforcement Learning • Updated • 1
datasets
0
None public yet