Qwen3-8B-NVFP4

An NVFP4-quantized version of Qwen/Qwen3-8B, produced with llmcompressor.
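A checkpoint like this is typically produced with llmcompressor's one-shot flow. The sketch below uses the settings listed in the Notes (NVFP4 scheme on Linear layers, lm_head excluded, 512 calibration samples, max sequence length 2048); the calibration dataset name is an assumption, since the card does not state which one was used.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-8B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# NVFP4 on all Linear layers; lm_head stays in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="ultrachat_200k",      # assumption: actual calibration set not stated on the card
    recipe=recipe,
    max_seq_length=2048,           # from the Notes
    num_calibration_samples=512,   # from the Notes
)

model.save_pretrained("Qwen3-8B-NVFP4")
tokenizer.save_pretrained("Qwen3-8B-NVFP4")
```

This requires a GPU and downloads the full-precision model, so treat it as a reproduction sketch rather than something to run as-is.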

Notes

  • Quantization scheme: NVFP4 (linear layers, lm_head excluded)
  • Calibration samples: 512
  • Max sequence length during calibration: 2048
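To make the scheme concrete: NVFP4 stores weights as FP4 (E2M1) values with a per-16-element block scale (FP8 E4M3 in the real format). The pure-Python sketch below emulates the rounding for a single block; it is an illustration of the numerics, not llmcompressor's implementation, and it keeps the block scale as a plain float for clarity.

```python
# Representable magnitudes of FP4 E2M1, the element format used by NVFP4.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 micro-scaling block size

def quantize_block(block):
    """Quantize one block: choose a scale so the largest magnitude maps to
    6.0 (the max E2M1 value), then round every element to the nearest
    representable FP4 value, keeping its sign."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block), 0.0
    scale = amax / 6.0
    q = []
    for v in block:
        mag = min(E2M1, key=lambda m: abs(abs(v) / scale - m))
        q.append(mag if v >= 0 else -mag)
    return q, scale

weights = [0.01 * i - 0.07 for i in range(BLOCK)]
q, s = quantize_block(weights)
dequant = [v * s for v in q]
```

Dequantization is just `q[i] * scale`; the largest-magnitude element in each block round-trips exactly, and the rounding error of the rest is bounded by half the widest E2M1 gap times the scale.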
Safetensors

  • Model size: 5B params
  • Tensor types: F32, BF16, F8_E4M3, U8

Model tree for llmat/Qwen3-8B-NVFP4

  • Base model: Qwen/Qwen3-8B-Base
  • Finetuned: Qwen/Qwen3-8B
  • Quantized: this model