Llama-3.1-8B-R1-Distill

This model is a supervised fine-tune (SFT) of meta-llama/Llama-3.1-8B-Instruct on the open-r1/Mixture-of-Thoughts dataset, trained with the Open-R1 library to replicate the step-by-step reasoning capabilities of the DeepSeek-R1 distilled models.

Model Description

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Model Type: Causal Language Model
  • Language(s): English
  • License: Llama 3.1 Community License
  • Finetuned from model: meta-llama/Llama-3.1-8B-Instruct

This model demonstrates strong performance across reasoning, mathematical problem-solving, scientific understanding, and code generation tasks. It has been specifically trained to think step-by-step using reasoning traces in a structured format with <think> and </think> tags.

Training Details

Training Data

The model was fine-tuned on the Mixture-of-Thoughts dataset, which contains 350k verified reasoning traces distilled from DeepSeek-R1. The dataset composition includes:

  • Mathematics: 93.7k reasoning traces for mathematical problems
  • Code: 83.1k reasoning traces for competitive programming problems (Python and C++)
  • Science: 173k reasoning traces for scientific problems
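
The dataset can be loaded directly with the Hugging Face datasets library. A minimal sketch, assuming the dataset exposes a chat-style messages column (the "all" config matches the training configuration below):

from datasets import load_dataset

# Load the combined math/code/science mixture used for training
dataset = load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train")
print(dataset)                    # row count and column names
print(dataset[0]["messages"][0])  # each row stores a chat-style list of messages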

Training Procedure

  • Training Framework: Open-R1 library with TRL (Transformer Reinforcement Learning)
  • Training Type: Supervised Fine-Tuning (SFT)
  • Optimization: DeepSpeed ZeRO-2 with gradient checkpointing
  • Hardware: 8x NVIDIA B200 node
  • Precision: BFloat16
  • Learning Rate: 4.0e-5
  • Learning Rate Scheduler: Cosine with minimum learning rate
  • Training Epochs: 5
  • Training Tokens: 100B
  • Max Sequence Length: 32,768 tokens
  • Batch Size: 2 per device with gradient accumulation
  • Gradient Accumulation Steps: 8
  • Max Gradient Norm: 0.2
  • Warmup Ratio: 0.03

Training Configuration

# Key training parameters
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
dataset_name: open-r1/Mixture-of-Thoughts
dataset_config: all
learning_rate: 4.0e-05
num_train_epochs: 5
max_length: 32768
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
bf16: true
gradient_checkpointing: true
use_liger_kernel: true
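
For reference, the run can be approximated in plain TRL. A minimal sketch, assuming recent TRL/Transformers releases (field names such as max_length and use_liger_kernel follow current versions; older TRL uses max_seq_length). The min_lr_rate value is an assumption, since only "cosine with minimum learning rate" is stated above, and this is not the exact Open-R1 training script:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hyperparameters mirror the YAML above
training_args = SFTConfig(
    output_dir="Llama-3.1-8B-R1-Distill",
    learning_rate=4.0e-5,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_grad_norm=0.2,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # assumption: exact minimum LR not documented
    max_length=32768,
    bf16=True,
    gradient_checkpointing=True,
    use_liger_kernel=True,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    args=training_args,
    train_dataset=load_dataset("open-r1/Mixture-of-Thoughts", "all", split="train"),
)
trainer.train()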

Performance

Benchmark results are not yet available; the model is still in training. This section will be updated once evaluation is complete.

Usage

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "your-username/Llama-3.1-8B-R1-Distill"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flash_attention_2"  # requires flash-attn; omit to fall back to the default attention
)

# Example: Mathematical reasoning
prompt = """Solve this step by step: A rectangle has a length that is 3 times its width. If the perimeter is 32 units, what are the dimensions?"""

# device_map="auto" may shard the model, so move inputs to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
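
Because the base model ships with the Llama 3.1 chat template, prompts can also be built with the tokenizer's apply_chat_template helper rather than raw strings (generation settings here are illustrative):

messages = [
    {"role": "user", "content": "Solve this step by step: A rectangle has a length that is 3 times its width. If the perimeter is 32 units, what are the dimensions?"}
]
# apply_chat_template inserts the Llama 3.1 header/eot tokens shown in the next section
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=500, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))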

Structured Reasoning Format

This model is trained to use a structured reasoning format with <think> tags:

def format_reasoning_prompt(question, system_prompt=None):
    if system_prompt is None:
        system_prompt = "You are a helpful assistant that thinks step by step. Show your reasoning process within <think> tags before providing your final answer."
    
    return f"""<|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<think>
"""

# Example for coding problems
coding_prompt = format_reasoning_prompt(
    "Write a Python function to find the longest palindromic substring in a given string.",
    "You are an expert programmer. Think through the problem step by step, consider different approaches, and then provide a clean implementation."
)

inputs = tokenizer(coding_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=800, temperature=0.1, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
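
Because the reasoning is delimited by <think> and </think>, it can be separated from the final answer with a simple parse. A minimal sketch using only the standard library (tag placement can vary between generations, so the fallback returns the raw text):

import re

def split_reasoning(text):
    """Split a <think>-formatted completion into (reasoning, answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text  # no complete reasoning block found
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(response)
print("Reasoning:\n", reasoning)
print("Final answer:\n", answer)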

Model Architecture

  • Architecture: Llama 3.1
  • Parameters: ~8 billion
  • Context Length: 128K tokens (inherited from base model)
  • Training Context: 32K tokens
  • Vocabulary Size: 128,256
  • Attention: Grouped Query Attention (GQA)
  • Activation: SwiGLU
  • Positional Encoding: RoPE (Rotary Position Embedding)
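
These properties can be verified directly from the published configuration; the field names below follow the Transformers LlamaConfig, and the values in comments reflect the Llama 3.1 8B config:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
print(cfg.num_attention_heads, cfg.num_key_value_heads)  # 32 query heads, 8 KV heads -> GQA
print(cfg.max_position_embeddings)  # 131072, i.e. the 128K context window
print(cfg.vocab_size)               # 128256
print(cfg.hidden_act)               # "silu", the gate activation inside SwiGLU
print(cfg.rope_theta)               # 500000.0, the RoPE base frequency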

Capabilities

Mathematics

  • Step-by-step problem solving
  • Advanced mathematical reasoning (algebra, calculus, geometry)
  • Competition-level problems (AIME, IMO-style)
  • Statistical analysis and probability

Science

  • Physics problem solving
  • Chemistry calculations
  • Biology conceptual understanding
  • Graduate-level scientific reasoning

Programming

  • Code generation in Python and C++
  • Competitive programming problems
  • Algorithm design and optimization
  • Code explanation and debugging

General Reasoning

  • Logical reasoning and inference
  • Multi-step problem decomposition
  • Analytical thinking
  • Abstract reasoning

Limitations

  • Language: Primarily trained on English content
  • Knowledge Cutoff: Limited to the knowledge cutoff of the base model's training data
  • Reasoning Errors: May occasionally make logical errors in complex multi-step problems
  • Code Execution: Cannot execute code; provides code generation only
  • Real-time Information: No access to current information or internet
  • Domain Specificity: Best performance on math, science, and coding; may struggle with other specialized domains

Ethical Considerations

  • Bias: May reflect biases present in training data
  • Misuse: Should not be used for generating harmful, illegal, or malicious content
  • Academic Integrity: Users should be transparent about AI assistance in academic contexts
  • Verification: Important mathematical, scientific, or coding results should be independently verified
  • Professional Use: Should not replace professional expertise in critical applications

Training Infrastructure

  • Library: Open-R1
  • Total Training Tokens: 100B
  • Framework: PyTorch with Transformers and TRL
  • Optimization: DeepSpeed ZeRO-2
  • Memory Optimization: Gradient checkpointing, Liger kernels
  • Monitoring: Weights & Biases integration
  • Hardware Used: 8x NVIDIA B200 GPUs
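
For illustration, a minimal ZeRO-2 configuration sketch, written as a Python dict that Transformers' TrainingArguments accepts through its deepspeed argument ("auto" defers values to the trainer's own arguments; the actual Open-R1 recipe may differ):

# Illustrative ZeRO-2 settings, not the exact recipe used for this run
zero2_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
}
# e.g. SFTConfig(..., deepspeed=zero2_config)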

Citation

If you use this model in your research, please cite:

@misc{llama31-r1-distill,
  title={Llama-3.1-8B-R1-Distill: A Step-by-Step Reasoning Model},
  author={[Your Name]},
  year={2025},
  url={https://huggingface.co/your-username/Llama-3.1-8B-R1-Distill}
}

@misc{openr1,
  title={Open R1: A fully open reproduction of DeepSeek-R1},
  url={https://github.com/huggingface/open-r1},
  author={Hugging Face},
  month={January},
  year={2025}
}

@misc{mixture-of-thoughts,
  title={Mixture-of-Thoughts},
  author={Hugging Face Open R1 Team},
  year={2025},
  url={https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts}
}

Acknowledgments

  • Base Model: Meta AI for Llama 3.1
  • Training Framework: Hugging Face for the Open-R1 library and TRL
  • Dataset: Open-R1 team for the Mixture-of-Thoughts dataset
  • Inspiration: DeepSeek AI for the original R1 reasoning approach

License

This model is released under the Llama 3.1 Community License. Please see the official license for terms and conditions.

Model Card Contact

For questions about this model card or the model itself, please open an issue in the model repository.
