Frequently Asked Questions
Getting started with DBRX models is easy using the transformers library. The model requires ~264GB of RAM and the following packages:
pip install "torch==2.4.0" "transformers>=4.39.2" "tiktoken>=0.6.0" "bitsandbytes"
If you'd like to speed up download time, you can use the hf_transfer package as described in the Hugging Face documentation:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
You will need to request access to this repository to download the model. Once access is granted, obtain an access token with read permission and supply it below (here it is passed directly via the token argument).
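Alternatively, you can authenticate once per session with the huggingface_hub library's login helper, after which the token argument can be omitted from the calls below. A minimal sketch, where hf_YOUR_TOKEN is a placeholder for your own token:
from huggingface_hub import login

# Cache the token locally; subsequent from_pretrained calls pick it up automatically.
login(token="hf_YOUR_TOKEN")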
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the 4-bit quantized model; trust_remote_code is
# required because the repository ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/dbrx-instruct-bnb-4bit", trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("PrunaAI/dbrx-instruct-bnb-4bit", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token="hf_YOUR_TOKEN")

# Format the prompt with the model's chat template and move it to the GPU.
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

# Generate up to 200 new tokens and decode the full sequence.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
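Note that decoding outputs[0] prints the prompt together with the completion, since decoder-only models return the input tokens as part of the generated sequence. If you only want the model's reply, you can slice off the prompt tokens first (a small sketch, reusing the inputs dict from above):
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))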
The license of the smashed model follows the license of the original model. Please check the license of the original model, databricks/dbrx-instruct, which provided the base model, before using this model. The license of the pruna-engine is available on PyPI.