Are the F16 weights upcasted MXFP4? -- Why no `gpt-oss-20b-MXFP4.gguf`?
Follow up question to https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/14 and https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/7:
Are the F16 weights maybe just upcasted MXFP4 ones? And if not, why is bartowski recommending gpt-oss-20b-MXFP4.gguf (12.1 GB):
> Use this one:
> gpt-oss-20b-MXFP4.gguf
>
> The reason is, the FFN (feed forward networks) of gpt-oss do not behave nicely when quantized to anything other than MXFP4, so they are kept at that level for everything.
And why is everyone in https://github.com/ggml-org/llama.cpp/discussions/15396 also testing only gpt-oss-20b-MXFP4.gguf, and why, as just another example, does lmstudio-community only offer gpt-oss-20b-MXFP4.gguf?
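One way to check this yourself (my own sketch, not something from the linked discussions) is to list the per-tensor quantization types stored in each file with gguf-py, the Python package that ships with llama.cpp (`pip install gguf`; a recent enough version is needed for it to know about MXFP4). The filenames below are hypothetical local paths to the downloaded GGUFs:

```python
# Sketch: compare per-tensor quantization types between two GGUF files.
# Requires: pip install gguf  (gguf-py, the Python package shipped with llama.cpp)
from collections import Counter

from gguf import GGUFReader

# Hypothetical local filenames; point these at the files you downloaded.
PATHS = ["gpt-oss-20b-F16.gguf", "gpt-oss-20b-MXFP4.gguf"]

for path in PATHS:
    reader = GGUFReader(path)
    # Overall count of tensors per quantization type (F16, BF16, MXFP4, ...).
    overall = Counter(t.tensor_type.name for t in reader.tensors)
    # Types used by the FFN / expert tensors specifically (names contain "ffn").
    ffn_types = {t.tensor_type.name for t in reader.tensors if "ffn" in t.name}
    print(path)
    for type_name, count in overall.most_common():
        print(f"  {type_name}: {count} tensors")
    print(f"  FFN tensor types: {sorted(ffn_types)}")
```

If the F16 file reports its FFN tensors as F16/BF16, they are stored at higher precision rather than kept in MXFP4; if they come back as MXFP4, the F16 file is essentially the MXFP4 FFN plus higher-precision non-FFN tensors.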
Yes, I think using only the MXFP4.gguf is the way to go with gpt-oss. Unsloth's GGUFs aren't really applicable to this model AFAIK.
I think they made all their GGUFs for completeness' sake anyway. And perhaps the quantizations below Q4 also have value for people without enough VRAM. But if you can run Q4, it only makes sense to use the standard *-MXFP4.gguf.