|
|
--- |
|
|
license: apache-2.0 |
|
|
pipeline_tag: text-generation |
|
|
library_name: node-llama-cpp |
|
|
tags: |
|
|
- node-llama-cpp |
|
|
- llama.cpp |
|
|
- conversational |
|
|
quantized_by: giladgd |
|
|
base_model: openai/gpt-oss-20b |
|
|
--- |
|
|
|
|
|
# gpt-oss-20b-GGUF |
|
|
|
|
|
> [!NOTE] |
|
|
> Read [our guide](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss) on using `gpt-oss` to learn how to adjust its responses |
|
|
|
|
|
<p align="center"> |
|
|
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg"> |
|
|
</p> |
|
|
|
|
|
# Highlights |
|
|
|
|
|
* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. |
|
|
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. |
|
|
* **Full chain-of-thought:** Gain full access to the model's reasoning process, making it easier to debug outputs and build trust in them. The chain-of-thought is not intended to be shown to end users.
|
|
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning. |
|
|
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs. |
|
|
* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, making `gpt-oss-120b` run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the `gpt-oss-20b` model run within 16GB of memory. |
|
|
|
|
|
> [!NOTE] |
|
|
> Refer to the [original model card](https://huggingface.co/openai/gpt-oss-20b) for more details on the model |
|
|
|
|
|
# Quants |
|
|
| Link | [URI](https://node-llama-cpp.withcat.ai/cli/pull) | Size | |
|
|
|:-----|:--------------------------------------------------|-----:| |
|
|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.MXFP4.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` | 12.1GB | |
|
|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.F16.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.F16.gguf` | 13.8GB | |
|
|
|
|
|
> [!TIP] |
|
|
> Download a quant using `node-llama-cpp` ([more info](https://node-llama-cpp.withcat.ai/cli/pull)): |
|
|
> ```bash |
|
|
> npx -y node-llama-cpp pull <URI> |
|
|
> ``` |
|
|
|
|
|
|
|
|
# Usage |
|
|
## Use with [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) (recommended) |
|
|
### CLI |
|
|
Chat with the model: |
|
|
```bash |
|
|
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf |
|
|
``` |
|
|
|
|
|
> [!NOTE] |
|
|
> Ensure that you have [Node.js](https://nodejs.org) installed first:
|
|
> ```bash |
|
|
> brew install nodejs |
|
|
> ``` |
|
|
|
|
|
### Code |
|
|
Use it in your Node.js project:
|
|
```bash |
|
|
npm install node-llama-cpp |
|
|
``` |
|
|
|
|
|
```typescript |
|
|
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp"; |
|
|
|
|
|
const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf"; |
|
|
|
|
|
|
|
|
const llama = await getLlama(); |
|
|
const model = await llama.loadModel({ |
|
|
modelPath: await resolveModelFile(modelUri) |
|
|
}); |
|
|
const context = await model.createContext(); |
|
|
const session = new LlamaChatSession({ |
|
|
contextSequence: context.getSequence() |
|
|
}); |
|
|
|
|
|
|
|
|
const q1 = "Hi there, how are you?"; |
|
|
console.log("User: " + q1); |
|
|
|
|
|
const a1 = await session.prompt(q1); |
|
|
console.log("AI: " + a1); |
|
|
``` |
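
You can also stream the response as it is generated, rather than waiting for the full answer. A minimal sketch, continuing from the `session` created above and assuming the `onTextChunk` prompt option available in recent `node-llama-cpp` versions:

```typescript
// stream the response text to stdout as it is generated
const a2 = await session.prompt("Tell me a short story", {
    onTextChunk(chunk) {
        process.stdout.write(chunk);
    }
});
```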
|
|
|
|
|
> [!TIP] |
|
|
> Read the [getting started guide](https://node-llama-cpp.withcat.ai/guide/) to quickly scaffold a new `node-llama-cpp` project |
|
|
|
|
|
#### Customize inference options |
|
|
Set [Harmony](https://cookbook.openai.com/articles/openai-harmony) options using [`HarmonyChatWrapper`](https://node-llama-cpp.withcat.ai/api/classes/HarmonyChatWrapper):
|
|
```typescript |
|
|
import { |
|
|
getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper, |
|
|
defineChatSessionFunction |
|
|
} from "node-llama-cpp"; |
|
|
|
|
|
const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf"; |
|
|
|
|
|
|
|
|
const llama = await getLlama(); |
|
|
const model = await llama.loadModel({ |
|
|
modelPath: await resolveModelFile(modelUri) |
|
|
}); |
|
|
const context = await model.createContext(); |
|
|
const session = new LlamaChatSession({ |
|
|
contextSequence: context.getSequence(), |
|
|
chatWrapper: new HarmonyChatWrapper({ |
|
|
modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.", |
|
|
reasoningEffort: "high" |
|
|
}) |
|
|
}); |
|
|
|
|
|
const functions = { |
|
|
getCurrentWeather: defineChatSessionFunction({ |
|
|
description: "Gets the current weather in the provided location.", |
|
|
params: { |
|
|
type: "object", |
|
|
properties: { |
|
|
location: { |
|
|
type: "string", |
|
|
description: "The city and state, e.g. San Francisco, CA" |
|
|
}, |
|
|
format: { |
|
|
enum: ["celsius", "fahrenheit"] |
|
|
} |
|
|
} |
|
|
}, |
|
|
handler({location, format}) { |
|
|
console.log(`Getting current weather for "${location}" in ${format}`); |
|
|
|
|
|
return { |
|
|
// simulate a weather API response |
|
|
temperature: format === "celsius" ? 20 : 68, |
|
|
format |
|
|
}; |
|
|
} |
|
|
}) |
|
|
}; |
|
|
|
|
|
const q1 = "What is the weather like in SF?"; |
|
|
console.log("User: " + q1); |
|
|
|
|
|
const a1 = await session.prompt(q1, {functions}); |
|
|
console.log("AI: " + a1); |
|
|
``` |
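
The Structured Outputs capability mentioned in the highlights can also be used from `node-llama-cpp` by constraining generation with a JSON schema grammar. A minimal sketch, reusing the `llama` and `session` objects from the example above:

```typescript
// constrain the model's output to match a JSON schema
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        answer: {type: "string"},
        confidence: {enum: ["low", "medium", "high"]}
    }
});

const res = await session.prompt("What is the capital of France?", {grammar});
const parsed = grammar.parse(res); // typed according to the schema
console.log(parsed.answer, parsed.confidence);
```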
|
|
|
|
|
|
|
|
## Use with [llama.cpp](https://github.com/ggml-org/llama.cpp) |
|
|
Install llama.cpp via [Homebrew](https://brew.sh) (works on macOS and Linux):
|
|
|
|
|
```bash |
|
|
brew install llama.cpp |
|
|
``` |
|
|
|
|
|
### CLI |
|
|
```bash |
|
|
llama-cli --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -p "The meaning to life and the universe is" |
|
|
``` |
|
|
|
|
|
### Server |
|
|
```bash |
|
|
llama-server --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -c 2048 |
|
|
``` |
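
Once `llama-server` is running, you can query its OpenAI-compatible API over HTTP. A minimal sketch, assuming the default port (`8080`):

```typescript
// query the llama-server OpenAI-compatible chat completions endpoint
const response = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "Hi there, how are you?"}
        ]
    })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```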
|
|
|