Llama-3.1-8B-R1-Distill
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct using supervised fine-tuning (SFT) on the open-r1/Mixture-of-Thoughts dataset. The model has been trained using the Open-R1 library to replicate the step-by-step reasoning capabilities of DeepSeek-R1 distilled models.
Model Description
- Base Model: meta-llama/Llama-3.1-8B-Instruct
- Model Type: Causal Language Model
- Language(s): English
- License: Llama 3.1 Community License
- Finetuned from model: meta-llama/Llama-3.1-8B-Instruct
This model demonstrates strong performance across reasoning, mathematical problem-solving, scientific understanding, and code generation tasks. It has been specifically trained to think step-by-step using reasoning traces in a structured format with <think> and </think> tags.
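In practice, a response looks like the following (illustrative example, not actual model output):

<think>
The perimeter is 2(l + w) = 32 and l = 3w, so 2(3w + w) = 8w = 32, which gives w = 4 and l = 12.
</think>
The rectangle is 4 units wide and 12 units long.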
Training Details
Training Data
The model was fine-tuned on the Mixture-of-Thoughts dataset, which contains 350k verified reasoning traces distilled from DeepSeek-R1. The dataset composition includes:
- Mathematics: 93.7k reasoning traces for mathematical problems
- Code: 83.1k reasoning traces for competitive programming problems (Python and C++)
- Science: 173k reasoning traces for scientific problems
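The dataset can be loaded directly with the Hugging Face datasets library; a minimal sketch using the same "all" configuration as the training run (see the configuration below):

from datasets import load_dataset

# Load the combined "all" configuration used for this training run
ds = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
print(ds)               # row count and features
print(ds.column_names)  # inspect the schema before training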
Training Procedure
- Training Framework: Open-R1 library with TRL (Transformer Reinforcement Learning)
- Training Type: Supervised Fine-Tuning (SFT)
- Optimization: DeepSpeed ZeRO-2 with gradient checkpointing
- Hardware: 8x NVIDIA B200 GPUs (single node)
- Precision: BFloat16
- Learning Rate: 4.0e-5
- Learning Rate Scheduler: Cosine with minimum learning rate
- Training Epochs: 5
- Training Tokens: 100B
- Max Sequence Length: 32,768 tokens
- Batch Size: 2 per device with gradient accumulation
- Gradient Accumulation Steps: 8
- Max Gradient Norm: 0.2
- Warmup Ratio: 0.03
Training Configuration
# Key training parameters
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
dataset_name: open-r1/Mixture-of-Thoughts
dataset_config: all
learning_rate: 4.0e-05
num_train_epochs: 5
max_length: 32768
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
bf16: true
gradient_checkpointing: true
use_liger_kernel: true
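For orientation, the parameters above map onto TRL's SFTConfig roughly as sketched below. This is an approximation, not the exact training script: field names differ across TRL versions (max_length was max_seq_length in older releases), and the min_lr_rate value is an assumption, since the minimum learning rate is not stated above.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")

args = SFTConfig(
    output_dir="Llama-3.1-8B-R1-Distill",
    learning_rate=4.0e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # assumed value; the actual minimum LR is not documented
    num_train_epochs=5,
    max_length=32768,  # named max_seq_length in older TRL releases
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_grad_norm=0.2,
    warmup_ratio=0.03,
    bf16=True,
    gradient_checkpointing=True,
    use_liger_kernel=True,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    args=args,
    train_dataset=train_ds,
)
trainer.train()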
Performance
Benchmark results are not yet available; the model is still in training. This section will be updated once evaluation is complete.
Usage
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "your-username/Llama-3.1-8B-R1-Distill"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires the flash-attn package; drop this argument to use the default attention
)

# Example: Mathematical reasoning
prompt = "Solve this step by step: A rectangle has a length that is 3 times its width. If the perimeter is 32 units, what are the dimensions?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
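Because the reasoning trace is wrapped in <think> tags, it can be separated from the final answer after generation. A small helper (this assumes a complete closing tag, which a truncated generation may not produce):

import re

def split_reasoning(text):
    """Split a response into its <think> trace and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text  # no complete reasoning block found
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(response)
print("Reasoning:", reasoning)
print("Answer:", answer)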
Structured Reasoning Format
This model is trained to use a structured reasoning format with <think> tags:
def format_reasoning_prompt(question, system_prompt=None):
    if system_prompt is None:
        system_prompt = "You are a helpful assistant that thinks step by step. Show your reasoning process within <think> tags before providing your final answer."
    # Llama 3.1 chat format: each header is followed by a blank line
    return f"""<|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<think>
"""

# Example for coding problems
coding_prompt = format_reasoning_prompt(
    "Write a Python function to find the longest palindromic substring in a given string.",
    "You are an expert programmer. Think through the problem step by step, consider different approaches, and then provide a clean implementation.",
)

inputs = tokenizer(coding_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=800, temperature=0.1, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
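Equivalently, the tokenizer's built-in chat template can assemble the same Llama 3.1 header structure, which avoids hand-maintaining the special tokens:

messages = [
    {"role": "system", "content": "You are an expert programmer. Think through the problem step by step, consider different approaches, and then provide a clean implementation."},
    {"role": "user", "content": "Write a Python function to find the longest palindromic substring in a given string."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts its reply
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=800, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))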
Model Architecture
- Architecture: Llama 3.1
- Parameters: ~8 billion
- Context Length: 128K tokens (inherited from base model)
- Training Context: 32K tokens
- Vocabulary Size: 128,256
- Attention: Grouped Query Attention (GQA)
- Activation: SwiGLU
- Positional Encoding: RoPE (Rotary Position Embedding)
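These values can be checked against the base model's configuration without downloading the weights:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
print(config.vocab_size)               # 128256
print(config.num_attention_heads)      # 32 query heads
print(config.num_key_value_heads)      # 8 KV heads shared via GQA
print(config.max_position_embeddings)  # 131072 (128K context)
print(config.hidden_act)               # "silu", the SwiGLU gate activation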
Capabilities
Mathematics
- Step-by-step problem solving
- Advanced mathematical reasoning (algebra, calculus, geometry)
- Competition-level problems (AIME, IMO-style)
- Statistical analysis and probability
Science
- Physics problem solving
- Chemistry calculations
- Biology conceptual understanding
- Graduate-level scientific reasoning
Programming
- Code generation in Python and C++
- Competitive programming problems
- Algorithm design and optimization
- Code explanation and debugging
General Reasoning
- Logical reasoning and inference
- Multi-step problem decomposition
- Analytical thinking
- Abstract reasoning
Limitations
- Language: Primarily trained on English content
- Knowledge Cutoff: Limited to training data knowledge cutoff
- Reasoning Errors: May occasionally make logical errors in complex multi-step problems
- Code Execution: Cannot execute code; provides code generation only
- Real-time Information: No access to current information or internet
- Domain Specificity: Best performance on math, science, and coding; may struggle with other specialized domains
Ethical Considerations
- Bias: May reflect biases present in training data
- Misuse: Should not be used for generating harmful, illegal, or malicious content
- Academic Integrity: Users should be transparent about AI assistance in academic contexts
- Verification: Important mathematical, scientific, or coding results should be independently verified
- Professional Use: Should not replace professional expertise in critical applications
Training Infrastructure
- Library: Open-R1
- Total Training Tokens: 100B
- Framework: PyTorch with Transformers and TRL
- Optimization: DeepSpeed ZeRO-2
- Memory Optimization: Gradient checkpointing, Liger kernels
- Monitoring: Weights & Biases integration
- Hardware Used: 8x NVIDIA B200 GPUs
Citation
If you use this model in your research, please cite:
@misc{llama31-r1-distill,
  title={Llama-3.1-8B-R1-Distill: A Step-by-Step Reasoning Model},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/your-username/Llama-3.1-8B-R1-Distill}
}

@misc{openr1,
  title={Open R1: A fully open reproduction of DeepSeek-R1},
  author={Hugging Face},
  month={January},
  year={2025},
  url={https://github.com/huggingface/open-r1}
}

@misc{mixture-of-thoughts,
  title={Mixture-of-Thoughts},
  author={Hugging Face Open R1 Team},
  year={2025},
  url={https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts}
}
Acknowledgments
- Base Model: Meta AI for Llama 3.1
- Training Framework: Hugging Face for the Open-R1 library and TRL
- Dataset: Open-R1 team for the Mixture-of-Thoughts dataset
- Inspiration: DeepSeek AI for the original R1 reasoning approach
License
This model is released under the Llama 3.1 Community License. Please see the official license for terms and conditions.
Model Card Contact
For questions about this model card or the model itself, please open an issue in the model repository.