sebastavar

Update README.md

6394d82 verified 2 months ago


	---
	title: README
	emoji: 📈
	colorFrom: purple
	colorTo: pink
	sdk: gradio
	pinned: false
	sdk_version: 5.46.0
	---

	# Halley AI on Hugging Face

	High-quality MLX builds, tools, and evals optimized for Apple Silicon, focused on practical on-prem inference for small teams.
	> We publish Mixture-of-Experts (MoE) models and MLX quantizations tuned for M-series Macs (Metal + unified memory).
	> Target use: fast, reliable interactive chat and light batch workloads.

	---

	## 🚀 Featured models

	### gpt-oss-20b (MLX)

	| Repo | Bits/GS | Footprint | Notes |
	|---|---:|---:|---|
	| [halley-ai/gpt-oss-20b-MLX-5bit-gs32](https://huggingface.co/halley-ai/gpt-oss-20b-MLX-5bit-gs32) | Q5 / 32 | ~15.8 GB | Small quality drop vs 6-bit (PPL ~3–6% higher); fits in 24 GB of unified memory. |
	| [halley-ai/gpt-oss-20b-MLX-6bit-gs32](https://huggingface.co/halley-ai/gpt-oss-20b-MLX-6bit-gs32) | Q6 / 32 | ~18.4 GB | Best of the group; strong quality/footprint tradeoff. |
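
The Bits/GS column can be turned into a rough footprint estimate. The sketch below assumes group-wise affine quantization in which every group of `group_size` weights also stores one fp16 scale and one fp16 bias (32 extra bits per group), and it ignores any tensors left unquantized. The function name and the nominal parameter count (20e9, taken from the model name) are illustrative assumptions, not values from the repositories.

```python
def estimate_footprint_gb(n_params, bits, group_size, meta_bits_per_group=32):
    """Rough weight footprint in decimal GB for group-wise quantization.

    meta_bits_per_group is an assumption: one fp16 scale plus one fp16 bias
    stored for every group of `group_size` weights.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # Effective bits per weight = payload bits + amortized per-group metadata.
    bits_per_weight = bits + meta_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal 20e9 parameters (from the model name, not a measured count).
    for label, bits, gs in [("Q5 / 32", 5, 32), ("Q6 / 32", 6, 32)]:
        gb = estimate_footprint_gb(20e9, bits, gs)
        print(f"{label}: ~{gb:.1f} GB")
```

With the nominal count this gives 15.0 GB and 17.5 GB, a little below the table figures, which is expected for an estimate that rounds the parameter count down and skips unquantized tensors. Treat it as a sizing aid, not a replacement for the measured footprints.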

	### gpt-oss-120b (MLX)

	| Repo | Bits/GS | Footprint | Notes |
	|---|---:|---:|---|
	| [halley-ai/gpt-oss-120b-MLX-8bit-gs32](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-8bit-gs32) | Q8 / 32 | ~63.42 GB | Reference int8; stable and simple to use. |
	| [halley-ai/gpt-oss-120b-MLX-bf16](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-bf16) | bf16 | ~65.28 GB | Non-quantized reference for evaluation/ground truth. |

	### Qwen3-Next-80B-A3B-Instruct (MLX)

	| Repo | Bits/GS | Footprint | Notes |
	|---|---:|---:|---|
	| [halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64](https://huggingface.co/halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64) | Q6 / 64 | ~64.92 GB | Quality pick; matched bf16 on our PPL run (5.14). |
	| [halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-5bit-gs32](https://huggingface.co/halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-5bit-gs32) | Q5 / 32 | ~59.86 GB | Balanced; near‑par PPL (5.20) and strong deterministic math. |

	Perplexity (PPL; lower is better) was measured with our fast preset on the WikiText‑2 raw test split. See each repository's docs for the exact commands.
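
For readers comparing the PPL figures in the tables, here is a minimal sketch of how perplexity follows from per-token log-probabilities and how two runs can be compared. It is not the evaluation preset used for these models; the function names are illustrative, and the input is assumed to be natural-log probabilities of the reference tokens.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    reference token (assumed input format for this sketch).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_increase(ppl, ppl_ref):
    """Fractional PPL increase of a quantized run over a reference run."""
    return (ppl - ppl_ref) / ppl_ref


if __name__ == "__main__":
    # Figures from the Qwen3-Next table: 5.20 (5-bit) vs 5.14 (6-bit / bf16).
    print(f"{relative_increase(5.20, 5.14):.1%}")
```

On the Qwen3-Next numbers above, the 5-bit build comes out about 1.2% higher in PPL than the 5.14 reference.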

	Format: MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
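
As a hedged illustration of the format note, the sketch below picks MLX only on Apple Silicon macOS and otherwise points to GGUF, then loads one of the repos listed above with the `mlx-lm` package. `pick_format` and `run_mlx` are hypothetical helper names; `run_mlx` assumes `pip install mlx-lm` and enough unified memory for the chosen model, and it downloads the weights on first use.

```python
import platform


def pick_format(system=None, machine=None):
    """Return "mlx" on Apple Silicon macOS, otherwise "gguf".

    Arguments default to the current host; pass them explicitly to check
    another platform. A Python process running under Rosetta reports
    x86_64 and is therefore routed to "gguf".
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "gguf"


def run_mlx(repo, prompt, max_tokens=128):
    """Generate text from an MLX repo (requires mlx-lm; downloads weights)."""
    from mlx_lm import load, generate  # imported lazily: macOS/arm64 only

    model, tokenizer = load(repo)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


if __name__ == "__main__":
    if pick_format() == "mlx":
        print(run_mlx("halley-ai/gpt-oss-20b-MLX-5bit-gs32", "Hello!"))
    else:
        print("Not an Apple Silicon Mac: use a GGUF build with llama.cpp.")
```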