# Sesame Model - checkpoint-3000

This is a checkpoint of the Sesame multimodal model trained on speech and text data.

## Model Architecture

- Large Transformer: meta-llama/Llama-3.2-1B
- Small Transformer: HuggingFaceTB/SmolLM-135M-Instruct
- RVQ Size: 8
- Vocabulary Size: 173066
- Hidden Size: 576
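
For orientation, here is a minimal sketch of how the two backbones listed above might be composed. The class name, attribute names, and wiring are illustrative assumptions, not the internals of the repo's actual `Sesame_Model`.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

NUM_RVQ_CODEBOOKS = 8   # "RVQ Size" above
VOCAB_SIZE = 173066     # vocabulary size above
SMALL_HIDDEN = 576      # hidden size above

class SesameSketch(nn.Module):
    """Illustrative dual-transformer layout; not the repo's actual class."""

    def __init__(self):
        super().__init__()
        # Large backbone over the interleaved speech/text token stream.
        self.large = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-3.2-1B"
        )
        # Small decoder for the remaining RVQ codebooks per frame.
        self.small = AutoModelForCausalLM.from_pretrained(
            "HuggingFaceTB/SmolLM-135M-Instruct"
        )
        # One embedding table per residual codebook (assumed design).
        self.rvq_embeddings = nn.ModuleList(
            nn.Embedding(VOCAB_SIZE, SMALL_HIDDEN)
            for _ in range(NUM_RVQ_CODEBOOKS)
        )
```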

## Training Configuration

- Learning Rate: 0.0001
- Batch Size: 1
- Weight Decay: 0.01
- Small Loss Weight: 0.875
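
The sketch below shows one way these hyperparameters could translate into an optimizer and a combined objective. The choice of AdamW and the exact form of the loss weighting are assumptions; the training script may differ.

```python
import torch
from model_batched_accelerate import Sesame_Model

model = Sesame_Model()
# Learning rate and weight decay from the table above; AdamW is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

SMALL_LOSS_WEIGHT = 0.875

def combined_loss(large_loss, small_loss, w=SMALL_LOSS_WEIGHT):
    # Assumed weighting: the small transformer's loss term is scaled by
    # the small loss weight; the actual formula is not documented here.
    return large_loss + w * small_loss
```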

## Usage

```python
from model_batched_accelerate import Sesame_Model
from safetensors.torch import load_file

# Instantiate the model and load the checkpoint weights.
model = Sesame_Model()
model_weights = load_file("model.safetensors")
# strict=False tolerates key mismatches between the checkpoint and the
# module's state dict (e.g. missing buffers or tied weights).
model.load_state_dict(model_weights, strict=False)
model.eval()
```
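
After loading, you will typically want to move the model to an accelerator. The checkpoint ships a mix of F32 and BF16 tensors, so further casting is optional; nothing beyond standard `nn.Module` methods is assumed here.

```python
import torch

# Move the loaded model to a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```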

## Dataset

This model was trained on the `jamessaker/dc-enc-em_ext_100k` dataset.
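
To inspect the training data, it can be pulled with the `datasets` library; the `"train"` split name is an assumption.

```python
from datasets import load_dataset

# Download and inspect the dataset; the split name is assumed.
ds = load_dataset("jamessaker/dc-enc-em_ext_100k", split="train")
print(ds)      # features and row count
print(ds[0])   # first example
```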

## Checkpoint Information

This is checkpoint-3000 from the training run.

The safetensors checkpoint holds roughly 2B parameters, stored as a mix of F32 and BF16 tensors.