At the moment, all of these are converted with 8-bpw output layers. Currently investigating why there is a small but noticeable drop in accuracy with a 6-bpw output layer; it likely has to do with the tied embeddings.

2.00 bits per weight / H8
2.25 bits per weight / H8
2.50 bits per weight / H8
3.00 bits per weight / H8
3.50 bits per weight / H8
4.00 bits per weight / H8
5.00 bits per weight / H8
6.00 bits per weight / H8
8.00 bits per weight / H8
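
Each variant above pairs a bits-per-weight target for the bulk of the model with an 8-bpw output layer (H8). The snippet below is a rough sketch of what that means for weight storage: it multiplies parameter counts by their bit widths. The parameter split is a hypothetical placeholder, not a figure from this card. The estimate also ignores quantization scales, metadata and any tensors kept at higher precision.

```python
# Rough storage estimate for a quantized model whose output layer ("head")
# uses a different bit width than the rest of the weights.
# ASSUMPTION: the parameter counts below are illustrative placeholders, not
# figures from this model card. Scales, metadata and unquantized tensors
# (norms, embeddings, etc.) are ignored.

def estimate_size_gb(body_params: float, head_params: float,
                     body_bpw: float, head_bpw: float = 8.0) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = body_params * body_bpw + head_params * head_bpw
    return total_bits / 8 / 1e9


# Hypothetical split: 3.0e9 body weights, 0.6e9 output-layer weights.
BODY_PARAMS = 3.0e9
HEAD_PARAMS = 0.6e9

for bpw in (2.00, 2.25, 2.50, 3.00, 3.50, 4.00, 5.00, 6.00, 8.00):
    size = estimate_size_gb(BODY_PARAMS, HEAD_PARAMS, bpw)
    print(f"{bpw:.2f} bpw / H8: ~{size:.2f} GB")
```

Under these placeholder numbers, moving the head from 6 to 8 bpw adds `HEAD_PARAMS * 2` bits, or about 0.15 GB. That cost is small compared with the body at most bit widths.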

Model tree for turboderp/Phi-4-mini-instruct-exl3

Base model: microsoft/Phi-4-mini-instruct
Quantized (109 models): this model

Collection including turboderp/Phi-4-mini-instruct-exl3: EXL3 models (33 items)