---
license: apache-2.0
pipeline_tag: text-generation
library_name: node-llama-cpp
tags:
- node-llama-cpp
- llama.cpp
- conversational
quantized_by: giladgd
base_model: openai/gpt-oss-20b
---
# gpt-oss-20b-GGUF
> [!NOTE]
> Read [our guide](https://node-llama-cpp.withcat.ai/blog/v3.12-gpt-oss) on using `gpt-oss` to learn how to adjust its responses.
<p align="center">
<img alt="gpt-oss-20b" src="https://raw.githubusercontent.com/openai/gpt-oss/main/docs/gpt-oss-20b.svg">
</p>
# Highlights
* **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
* **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
* **Full chain-of-thought:** Gain complete access to the model’s reasoning process, making it easier to debug outputs and build trust in them. Note that the chain-of-thought is not intended to be shown to end users.
* **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
* **Agentic capabilities:** Use the models’ native capabilities for function calling, [web browsing](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#browser), [Python code execution](https://github.com/openai/gpt-oss/tree/main?tab=readme-ov-file#python), and Structured Outputs.
* **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, allowing `gpt-oss-120b` to run on a single 80GB GPU (such as an NVIDIA H100 or AMD MI300X) and `gpt-oss-20b` to run within 16GB of memory.
> [!NOTE]
> Refer to the [original model card](https://huggingface.co/openai/gpt-oss-20b) for more details on the model.
# Quants
| Link | [URI](https://node-llama-cpp.withcat.ai/cli/pull) | Size |
|:-----|:--------------------------------------------------|-----:|
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.MXFP4.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` | 12.1GB |
| [GGUF](https://huggingface.co/giladgd/gpt-oss-20b-GGUF/resolve/main/gpt-oss-20b.F16.gguf) | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.F16.gguf` | 13.8GB |
> [!TIP]
> Download a quant using `node-llama-cpp` ([more info](https://node-llama-cpp.withcat.ai/cli/pull)):
> ```bash
> npx -y node-llama-cpp pull <URI>
> ```
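> For example, to download the MXFP4 quant from the table above:
> ```bash
> npx -y node-llama-cpp pull hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
> ```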
# Usage
## Use with [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) (recommended)
### CLI
Chat with the model:
```bash
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```
> [!NOTE]
> Ensure that you have [Node.js](https://nodejs.org) installed first:
> ```bash
> brew install node
> ```
### Code
Use it in your Node.js project:
```bash
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();

// download the model file if it's not already cached, then load it
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
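You can also stream the response as it's being generated by passing an `onTextChunk` callback to `session.prompt()`. A minimal sketch, reusing the `session` from the example above:
```typescript
// stream the answer to stdout as it's generated
const q2 = "Tell me a short story about a robot.";
console.log("User: " + q2);

process.stdout.write("AI: ");
const a2 = await session.prompt(q2, {
    onTextChunk(chunk) {
        // called repeatedly with each new chunk of generated text
        process.stdout.write(chunk);
    }
});
process.stdout.write("\n");
```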
> [!TIP]
> Read the [getting started guide](https://node-llama-cpp.withcat.ai/guide/) to quickly scaffold a new `node-llama-cpp` project
#### Customize inference options
Set [Harmony](https://cookbook.openai.com/articles/openai-harmony) options using [`HarmonyChatWrapper`](https://node-llama-cpp.withcat.ai/api/classes/HarmonyChatWrapper):
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
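The Structured Outputs capability mentioned in the highlights can be used through a JSON-schema grammar, which constrains the model's output to match the schema. A minimal sketch (the movie schema here is just an illustrative example):
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// build a grammar that constrains the output to match a JSON schema
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        rating: {type: "number"}
    }
} as const);

const res = await session.prompt("Suggest a movie and rate it from 1 to 10", {
    grammar
});

const parsed = grammar.parse(res); // typed according to the schema
console.log("Title:", parsed.title);
console.log("Rating:", parsed.rating);
```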
## Use with [llama.cpp](https://github.com/ggml-org/llama.cpp)
Install llama.cpp via Homebrew (works on macOS and Linux):
```bash
brew install llama.cpp
```
### CLI
```bash
llama-cli --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -p "The meaning of life and the universe is"
```
### Server
```bash
llama-server --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -c 2048
```
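Once the server is up, you can query its OpenAI-compatible chat completions endpoint (assuming the default host and port of `127.0.0.1:8080`):
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Hi there, how are you?"}
        ]
    }'
```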