# Sesame Model - checkpoint-3000
This is a checkpoint of the Sesame multimodal model trained on speech and text data.
## Model Architecture

The key hyperparameters are listed below; a configuration sketch follows the list.

- Large Transformer: meta-llama/Llama-3.2-1B
- Small Transformer: HuggingFaceTB/SmolLM-135M-Instruct
- RVQ Size: 8
- Vocabulary Size: 173,066
- Hidden Size: 576
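
For reference, the hyperparameters above can be gathered into a single configuration object. The `SesameConfig` dataclass below is a hypothetical illustration of that grouping, not the repository's actual config class, and the interpretive comments are assumptions rather than facts from this card.

```python
from dataclasses import dataclass

@dataclass
class SesameConfig:
    # Hypothetical grouping of the hyperparameters listed above; the
    # repository may structure its configuration differently.
    large_backbone: str = "meta-llama/Llama-3.2-1B"
    small_backbone: str = "HuggingFaceTB/SmolLM-135M-Instruct"
    rvq_size: int = 8          # number of RVQ codebooks (assumed meaning of "RVQ Size")
    vocab_size: int = 173_066  # presumably combined text + audio token vocabulary (assumed)
    hidden_size: int = 576     # matches SmolLM-135M's hidden dimension
```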
## Training Configuration

- Learning Rate: 0.0001
- Batch Size: 1
- Weight Decay: 0.01
- Small Loss Weight: 0.875 (see the loss-weighting sketch after this list)
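
The small loss weight suggests the total objective is a weighted sum of the two transformers' losses. Below is a minimal sketch of that weighting, assuming AdamW as the optimizer and assuming the large transformer's loss takes the remaining weight; neither detail is confirmed by this card.

```python
import torch

def make_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    # AdamW is an assumption; the card lists only the learning rate and
    # weight decay, not the optimizer type.
    return torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def combined_loss(large_loss: torch.Tensor, small_loss: torch.Tensor) -> torch.Tensor:
    # Assumed convention: the small transformer's loss is weighted by 0.875
    # and the large transformer's loss by the remaining 0.125.
    w = 0.875
    return w * small_loss + (1.0 - w) * large_loss
```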
## Usage

```python
from model_batched_accelerate import Sesame_Model
from safetensors.torch import load_file

# Instantiate the model and load the checkpoint weights.
model = Sesame_Model()
model_weights = load_file("model.safetensors")

# strict=False tolerates keys missing from the checkpoint (e.g. frozen
# backbone parameters); drop it to fail fast on any mismatch.
model.load_state_dict(model_weights, strict=False)
model.eval()
```
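
Since the module name references `accelerate`, the checkpoint was presumably trained with multi-device or mixed-precision support; for inference, moving the model to your device (e.g. `model.to("cuda")`) after loading is a reasonable next step.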
## Dataset

This model was trained on the `jamessaker/dc-enc-em_ext_100k` dataset.
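
For a quick look at the data, the `datasets` library can stream the dataset. This is a generic loading sketch; the `"train"` split name is an assumption rather than something confirmed by this card.

```python
from datasets import load_dataset

# Stream one example for inspection without downloading the full dataset.
# The "train" split name is an assumption; adjust to the dataset's actual splits.
ds = load_dataset("jamessaker/dc-enc-em_ext_100k", split="train", streaming=True)
print(next(iter(ds)))
```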
## Checkpoint Information

This is `checkpoint-3000` from the training run.