# Sesame Model - checkpoint-3000

This is a checkpoint of the Sesame multimodal model trained on speech and text data.

## Model Architecture

- Large Transformer: meta-llama/Llama-3.2-1B
- Small Transformer: HuggingFaceTB/SmolLM-135M-Instruct
- RVQ Size: 8
- Vocabulary Size: 173066
- Hidden Size: 576
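
For orientation, here is a minimal sketch of how the two backbones listed above might be composed. The class name, attribute names, and wiring are illustrative assumptions, not the internals of the repo's actual `Sesame_Model`.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

NUM_RVQ_CODEBOOKS = 8   # "RVQ Size" above
VOCAB_SIZE = 173066     # vocabulary size above
SMALL_HIDDEN = 576      # hidden size above

class SesameSketch(nn.Module):
    """Illustrative dual-transformer layout; not the repo's actual class."""

    def __init__(self):
        super().__init__()
        # Large backbone over the interleaved speech/text token stream.
        self.large = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-3.2-1B"
        )
        # Small decoder for the remaining RVQ codebooks per frame.
        self.small = AutoModelForCausalLM.from_pretrained(
            "HuggingFaceTB/SmolLM-135M-Instruct"
        )
        # One embedding table per residual codebook (assumed design).
        self.rvq_embeddings = nn.ModuleList(
            nn.Embedding(VOCAB_SIZE, SMALL_HIDDEN)
            for _ in range(NUM_RVQ_CODEBOOKS)
        )
```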

## Training Configuration

- Learning Rate: 0.0001
- Batch Size: 1
- Weight Decay: 0.01
- Small Loss Weight: 0.875
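
The sketch below shows one way these hyperparameters could translate into an optimizer and a combined objective. The choice of AdamW and the exact form of the loss weighting are assumptions; the training script may differ.

```python
import torch
from model_batched_accelerate import Sesame_Model

model = Sesame_Model()
# Learning rate and weight decay from the table above; AdamW is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

SMALL_LOSS_WEIGHT = 0.875

def combined_loss(large_loss, small_loss, w=SMALL_LOSS_WEIGHT):
    # Assumed weighting: the small transformer's loss term is scaled by
    # the small loss weight; the actual formula is not documented here.
    return large_loss + w * small_loss
```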

## Usage

```python
from model_batched_accelerate import Sesame_Model
from safetensors.torch import load_file

# Instantiate the model and load the checkpoint weights.
model = Sesame_Model()
model_weights = load_file("model.safetensors")
# strict=False tolerates key mismatches between the checkpoint and the
# module's state dict (e.g. missing buffers or tied weights).
model.load_state_dict(model_weights, strict=False)
model.eval()
```
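
After loading, you will typically want to move the model to an accelerator. The checkpoint ships a mix of F32 and BF16 tensors, so further casting is optional; nothing beyond standard `nn.Module` methods is assumed here.

```python
import torch

# Move the loaded model to a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```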

## Dataset

This model was trained on the `jamessaker/dc-enc-em_ext_100k` dataset.
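
To inspect the training data, it can be pulled with the `datasets` library; the `"train"` split name is an assumption.

```python
from datasets import load_dataset

# Download and inspect the dataset; the split name is assumed.
ds = load_dataset("jamessaker/dc-enc-em_ext_100k", split="train")
print(ds)      # features and row count
print(ds[0])   # first example
```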

## Checkpoint Information

This is checkpoint-3000 from the training run.

The safetensors checkpoint holds roughly 2B parameters, stored as a mix of F32 and BF16 tensors.