Recommended vLLM setting?
Hello! Congrats on the release!! Really excited to try the model.
Is there a recommended setup for vLLM? For example:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="marin-community/marin-32b-base")

prompts = [
    "We may have knowledge of the past but cannot control it; we may control the future but"
]

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
)

outputs = llm.generate(prompts, sampling_params)
for i, output in enumerate(outputs):
    print(prompts[i])
    print(output.outputs[0].text.strip())
```
I've tried the script, installing with:

```shell
# current vLLM
pip install vllm==0.11.0

# nightly vLLM
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```
In both cases, the script fails with the following error (note: I'm using the v1 engine):

```
...
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] File "/oe-eval-default/davidh/marindebug/.venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 503, in load_weights
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] param = params_dict[name]
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=11777) (Worker_TP0 pid=11783) ERROR 10-29 17:44:30 [multiproc_executor.py:631] KeyError: 'layers.0.self_attn.k_norm.weight'
```
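My reading of the trace, as a toy sketch (not the actual vLLM code, and the names below are illustrative): `load_weights` looks each checkpoint tensor name up in the module's parameter dict, and the checkpoint ships per-layer `q_norm`/`k_norm` weights that vLLM's Llama module doesn't register, so the lookup raises `KeyError`:

```python
# Toy reproduction of the failure mode: a Llama-style parameter dict
# has projection weights but no per-head QK-norm entries, while a
# Qwen3-style checkpoint includes them.
params_dict = {
    "layers.0.self_attn.q_proj.weight": object(),  # placeholder "tensors"
    "layers.0.self_attn.k_proj.weight": object(),
}
checkpoint_names = [
    "layers.0.self_attn.q_proj.weight",
    "layers.0.self_attn.k_norm.weight",  # present only in Qwen3-style checkpoints
]
for name in checkpoint_names:
    try:
        param = params_dict[name]
    except KeyError as err:
        print(f"KeyError: {err}")  # → KeyError: 'layers.0.self_attn.k_norm.weight'
```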
Finally, when trying the install with an earlier version, vllm==0.9.0.1, I'm seeing this slightly different loading error:

```
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 601, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] return loader.load_weights(
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 291, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] autoloaded_weights = set(self._load_module("", self.module, weights))
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 249, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] yield from self._load_module(prefix,
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] loaded_params = module_load_weights(weights)
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] File "/stage/src/vllm/vllm/model_executor/models/llama.py", line 465, in load_weights
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] param = params_dict[name]
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] ~~~~~~~~~~~^^^^^^
2025-10-29T23:29:33.291Z (VllmWorker rank=0 pid=487) ERROR 10-29 23:29:33 [multiproc_executor.py:487] KeyError: 'layers.60.self_attn.k_norm.weight'
```
Edit: I also see the same problem with nightly transformers 5.0.0 (installed from source as shown below), transformers==4.57.1 (current main), and transformers==4.55.4 (the version listed in the config.json):

```shell
# install nightly transformers from source
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install '.[torch]'

# install other versions
pip install transformers==4.55.4
pip install transformers==4.57.1
```
Hi @davidheineman!

Sorry about that! Our export logic still had "LlamaForCausalLM" in the config.json (we use https://github.com/marin-community/levanter for our evals so we can run on TPUs, and I had missed this). Trying the newest revision, which correctly tells HF to load a Qwen3 architecture, should fix this. I'll test on one of the GPU machines I have access to now to double-check, but let me know if you hit further issues.
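For anyone wanting to sanity-check a cached copy, the relevant field is `architectures` in the model's config.json. A sketch of what the fixed config should contain (the exact class-name string is my assumption based on the fix described above):

```json
{
  "architectures": ["Qwen3ForCausalLM"]
}
```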
Everything is working on my end now. Thanks Will!!

Marin gave a great quote from Pulp Fiction in response to that prompt:

> We may have knowledge of the past but cannot control it; we may control the future but have no knowledge of it. Now we control the present but neither know nor control the future. (Tertullian, 2nd Century)
>
> The path of the righteous man is beset on all sides by the inequities of the selfish and the tyranny of evil men. Blessed is he who, in the name of charity and goodwill, shepherds the weak through the valley of darkness, for he is truly his brother's keeper and the finder of lost children. And I will strike down upon thee with great vengeance and furious anger those who attempt to poison and destroy my brothers. And you will know I am the Lord when