run-title: micro-smollm2-135m
model: micro-smollm2-135m
base-model: HuggingFaceTB/SmolLM2-135M
tokenizer: HuggingFaceTB/SmolLM2-135M-Instruct
num-experts: 4
top-k-experts: 1
jitter-noise: 0
use-router: True
mask-input: True
max-length: 8192
gradient-checkpointing: False
trainable:
  - model