Update README.md
README.md CHANGED
@@ -21,7 +21,6 @@ base_model:
 - **License:** apache-2.0
 - **Quantized from Model:** Qwen/Qwen3-8B
 - **Quantization Method:** QAT INT4
-- **Terms of Use**: [Terms][terms]

[Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) fine-tuned with [unsloth](https://github.com/unslothai/unsloth) using quantization-aware training (QAT) from [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao), then quantized with int4 weight-only quantization, by the PyTorch team.
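
As a quick start, here is a minimal loading sketch with `transformers`. The repository id `pytorch/Qwen3-8B-QAT-INT4` is a placeholder for this model's actual id; `torchao` must be installed so the quantization config shipped with the checkpoint can be applied:

```python
# Minimal sketch: load the int4 torchao checkpoint with transformers.
# Requires: pip install torch torchao transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pytorch/Qwen3-8B-QAT-INT4"  # placeholder id -- use this repo's name

# torchao checkpoints carry their quantization config alongside the weights,
# so a plain from_pretrained call restores the int4 model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Give me a short introduction to large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```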
Use it directly or serve it with [vLLM](https://docs.vllm.ai/en/latest/) for a 62% VRAM reduction (6.24 GB needed) and a 1.45x speedup on H100 GPUs.
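
A sketch of offline inference with vLLM's Python API follows; an OpenAI-compatible server can likewise be launched with `vllm serve <model-id>`. The model id is again a placeholder:

```python
# Minimal sketch: offline batch inference with vLLM.
# Requires: pip install vllm
from vllm import LLM, SamplingParams

llm = LLM(model="pytorch/Qwen3-8B-QAT-INT4")  # placeholder id -- use this repo's name
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=64)  # illustrative values

outputs = llm.generate(["Give me a short introduction to large language models."], params)
print(outputs[0].outputs[0].text)
```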