BPE tokenizer with byte fallback: 32k vocab, uncased
Uncased BPE tokenizer for encoders and MLM objectives, with byte fallback:
- Trained on pints-ai/Expository-Prose-V1; primarily intended for English and code.
- Uncased: "HELLO WORLD" tokenizes to the same ids as "hello world" (see the sketch below).
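
A minimal sketch of both properties. The repo id below is a placeholder, not this repository's actual name; substitute it before running:

```python
from transformers import AutoTokenizer

# Placeholder repo id; replace with this repository's actual id.
tok = AutoTokenizer.from_pretrained("org/bpe-32k-uncased-byte-fallback")

# Uncased: both strings map to identical token ids.
assert tok.encode("HELLO WORLD") == tok.encode("hello world")

# Byte fallback: characters not covered by learned merges fall back to
# raw bytes, so nothing degrades to <unk>.
print(tok.tokenize("naïve café 🙂"))
```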
`model_max_length` is set to 1e9 so the tokenizer never imposes a hidden length limit of its own. Set `tokenizer.model_max_length` to your model's max position embeddings when training.
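
For example, before building training batches (a sketch: the repo id is a placeholder, and 512 stands in for your model's `config.max_position_embeddings`):

```python
from transformers import AutoTokenizer

# Placeholder repo id; replace with this repository's actual id.
tok = AutoTokenizer.from_pretrained("org/bpe-32k-uncased-byte-fallback")

# Clamp to the model's context window so truncation behaves as expected;
# 512 is a stand-in for model.config.max_position_embeddings.
tok.model_max_length = 512

batch = tok("some long document " * 200, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)  # torch.Size([1, 512])
```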