Upload AWQ 4-bit quantized Molmo-7B-D (~5.2GB, 63.0% reduction)
README.md
CHANGED
@@ -36,7 +36,7 @@ This is a 4-bit AWQ quantized version of [allenai/Molmo-7B-D-0924](https://huggi
 - **Architecture:** Molmo (Qwen2-7B decoder + OpenAI CLIP vision encoder)
 - **Quantization Method:** AWQ (Activation-aware Weight Quantization)
 - **Quantization Scheme:** W4A16 (4-bit weights, 16-bit activations)
-- **Calibration Dataset:** Flickr30k (
+- **Calibration Dataset:** Flickr30k (128 samples)

 ## Size Comparison

@@ -150,8 +150,8 @@ Molmo-7B-D is part of the Molmo family of open vision-language models developed

 - **Method:** AWQ (Activation-aware Weight Quantization)
 - **Independent Pipeline:** Used with BasicPipeline for layer-by-layer quantization
-- **Calibration:**
-- **Max Sequence Length:**
+- **Calibration:** 128 Flickr30k image-text pairs
+- **Max Sequence Length:** 2048 tokens
 - **Why AWQ**: Activation-aware quantization preserves important weights

 ## Limitations
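The two figures in the commit title are mutually consistent: a 63.0% reduction down to ~5.2 GB implies an original checkpoint of roughly 14 GB, which is what a ~7B-parameter model occupies at 16-bit precision. A quick arithmetic check:

```python
# Sanity check on the commit title's figures (~5.2 GB, 63.0% reduction).
quantized_gb = 5.2
reduction = 0.630
original_gb = quantized_gb / (1 - reduction)
print(f"Implied original size: {original_gb:.1f} GB")  # ~14.1 GB

# Consistent with a ~7B-parameter checkpoint stored in fp16/bf16:
params = 7e9
print(f"16-bit weights: {params * 2 / 1e9:.0f} GB")  # ~14 GB
```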
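The settings added in the second hunk (AWQ, W4A16, 128 Flickr30k samples, 2048-token sequences, a layer-by-layer "BasicPipeline" / "Independent Pipeline") read like an llm-compressor recipe. Below is a minimal sketch of how such a run could look; it is an assumption based on the card's wording, not the uploader's actual script. The dataset path, ignore list, and output directory are guesses, and proper image-text calibration is elided.

```python
# Hypothetical reproduction sketch: NOT the uploader's actual script.
# Assumes llm-compressor, inferred from the card's "BasicPipeline" /
# "Independent Pipeline" wording.
from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "allenai/Molmo-7B-D-0924"

# 128 calibration samples, per the card. "nlphuji/flickr30k" is an assumed
# dataset path; true image-text calibration would route images through the
# Molmo processor with a custom collator, which this text-only sketch elides.
calib = (
    load_dataset("nlphuji/flickr30k", split="test")
    .shuffle(seed=0)
    .select(range(128))
    .map(lambda ex: {"text": ex["caption"][0]})  # first caption per image
)

recipe = AWQModifier(
    scheme="W4A16",       # 4-bit weights, 16-bit activations, per the card
    targets=["Linear"],
    ignore=["lm_head"],   # common choice; not stated in the card
)

# A real run on Molmo would also need trust_remote_code when loading.
oneshot(
    model=MODEL_ID,
    dataset=calib,
    recipe=recipe,
    max_seq_length=2048,          # per the card
    num_calibration_samples=128,  # per the card
    output_dir="molmo-7b-d-awq-w4a16",  # hypothetical output path
)
```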