Model Details
- Base Model: meta-llama/Llama-3.1-8B-instruct (see the loading sketch below)
- Training method: SFT (supervised fine-tuning)
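
A minimal sketch of loading the base model with `transformers`; the dtype and device settings here are assumptions, not values reported in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-instruct"  # base model listed above

# Load the tokenizer and base model (dtype/device choices are illustrative).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```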
 
Datasets:
- 18k MMLU-style examples, trained for 1 epoch
- 5K math examples from continuations of llama343
 
Source Adapters
All source adapters share the following configuration (a minimal PEFT sketch follows the list):
- Rank (r): 16
- Alpha: 16
- Target Modules:
  - q_proj (Query projection)
  - k_proj (Key projection)
  - v_proj (Value projection)
  - o_proj (Output projection)
  - up_proj (MLP up projection)
  - down_proj (MLP down projection)
  - gate_proj (MLP gate projection)
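
A minimal sketch of the shared adapter configuration using Hugging Face PEFT's `LoraConfig`; the dropout, bias, and task-type settings are assumptions, not values reported above.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Shared adapter hyperparameters from the list above.
lora_config = LoraConfig(
    r=16,               # rank
    lora_alpha=16,      # alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "up_proj", "down_proj", "gate_proj",     # MLP projections
    ],
    lora_dropout=0.0,   # assumption: not reported in this card
    bias="none",        # assumption: not reported in this card
    task_type="CAUSAL_LM",
)

# Attach the adapter to the base model for supervised fine-tuning.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-instruct")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```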