---
license: apache-2.0
pipeline_tag: text-generation
library_name: node-llama-cpp
tags:
- node-llama-cpp
- llama.cpp
- conversational
quantized_by: giladgd
base_model: openai/gpt-oss-20b
---
# gpt-oss-20b-GGUF
> [!NOTE]
> Read [our guide](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss) on using `gpt-oss` to learn how to adjust its responses.
<p align="center">
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
</p>
# Highlights
* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, making it easier to debug outputs and build trust in them. Note that the chain-of-thought is not intended to be shown to end users.
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, allowing `gpt-oss-120b` to run on a single 80GB GPU (such as an NVIDIA H100 or AMD MI300X) and `gpt-oss-20b` to run within 16GB of memory.
> [!NOTE]
> Refer to the [original model card](https://huggingface.co/openai/gpt-oss-20b) for more details on the model.
# Quants
| Link | [URI](https://node-llama-cpp.withcat.ai/cli/pull) | Size |
|:-----|:--------------------------------------------------|-----:|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.MXFP4.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` | 12.1GB |
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.F16.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.F16.gguf` | 13.8GB |
> [!TIP]
> Download a quant using `node-llama-cpp` ([more info](https://node-llama-cpp.withcat.ai/cli/pull)):
> ```bash
> npx -y node-llama-cpp pull <URI>
> ```
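> For example, to download the MXFP4 quant from the table above:
> ```bash
> npx -y node-llama-cpp pull hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
> ```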
# Usage
## Use with [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) (recommended)
### CLI
Chat with the model:
```bash
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```
> [!NOTE]
> Ensure that you have [Node.js](https://nodejs.org) installed first:
> ```bash
> brew install node
> ```
### Code
Use it in your Node.js project:
```bash
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();

// download the model file if it's not already cached, then load it
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
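You can also stream the response as it's being generated by passing an `onTextChunk` callback to `session.prompt()`. A minimal sketch, reusing the `session` from the example above:
```typescript
// stream the answer to stdout as it's generated
const q2 = "Tell me a short story about a robot.";
console.log("User: " + q2);

process.stdout.write("AI: ");
const a2 = await session.prompt(q2, {
    onTextChunk(chunk) {
        // called repeatedly with each new chunk of generated text
        process.stdout.write(chunk);
    }
});
process.stdout.write("\n");
```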
> [!TIP]
> Read the [getting started guide](https://node-llama-cpp.withcat.ai/guide/) to quickly scaffold a new `node-llama-cpp` project
#### Customize inference options
Set [Harmony](https://cookbook.openai.com/articles/openai-harmony) options using [`HarmonyChatWrapper`](https://node-llama-cpp.withcat.ai/api/classes/HarmonyChatWrapper):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
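The Structured Outputs capability mentioned in the highlights can be used through a JSON-schema grammar, which constrains the model's output to match the schema. A minimal sketch (the movie schema here is just an illustrative example):
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// build a grammar that constrains the output to match a JSON schema
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        rating: {type: "number"}
    }
} as const);

const res = await session.prompt("Suggest a movie and rate it from 1 to 10", {
    grammar
});

const parsed = grammar.parse(res); // typed according to the schema
console.log("Title:", parsed.title);
console.log("Rating:", parsed.rating);
```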
## Use with [llama.cpp](https://github.com/ggml-org/llama.cpp)
Install llama.cpp via Homebrew (works on macOS and Linux):
```bash
brew install llama.cpp
```
### CLI
```bash
llama-cli --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -p "The meaning of life and the universe is"
```
### Server
```bash
llama-server --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -c 2048
```
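Once the server is up, you can query its OpenAI-compatible chat completions endpoint (assuming the default host and port of `127.0.0.1:8080`):
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Hi there, how are you?"}
        ]
    }'
```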