andrewor14 committed · verified
Commit a90ae0b · 1 Parent(s): 250f9c1

Update README.md

Files changed (1):
  1. README.md +0 -1
README.md CHANGED
@@ -21,7 +21,6 @@ base_model:
  - **License:** apache-2.0
  - **Quantized from Model :** Qwen/Qwen3-8B
  - **Quantization Method :** QAT INT4
- - **Terms of Use**: [Terms][terms]

  [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) fine-tuned with [unsloth](https://github.com/unslothai/unsloth) using quantization-aware training (QAT) from [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao), and quantized with int4 weight only quantization, by PyTorch team.
  Use it directly or serve using [vLLM](https://docs.vllm.ai/en/latest/) for 62% VRAM reduction (6.24 GB needed) and 1.45x speedup on H100 GPUs.
 
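For reference, a minimal usage sketch of the checkpoint this README describes, loading the torchao int4 weights through `transformers`. The repo id below is a placeholder (not given in this commit), and the sketch assumes `transformers` and `torchao` are installed so the quantized weights deserialize directly.

```python
# Minimal usage sketch for the int4 (torchao QAT) checkpoint described above.
# Assumptions: the repo id is a placeholder for this model's actual Hugging Face
# id, and `transformers` plus `torchao` are installed so the quantized weights
# load directly from the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pytorch/Qwen3-8B-QAT-INT4"  # placeholder: replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # dtype/quantization config is read from the checkpoint
    device_map="auto",    # place weights on the available GPU(s)
)

prompt = "Give me a short introduction to large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For serving, the README points to vLLM: `vllm serve <model-id>` starts vLLM's OpenAI-compatible server, which is the deployment path the VRAM-reduction and speedup figures on H100 refer to.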