---
title: README
emoji: 📈
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.46.0
pinned: false
---

# Halley AI on Hugging Face

High-quality, Apple Silicon-optimized **MLX** builds, tools, and evals, focused on practical, on-prem inference for small teams.

> We publish **Mixture-of-Experts (MoE)** models and MLX quantizations tuned for M-series Macs (Metal + unified memory).
> Target use: fast, reliable **interactive chat** and light batch workloads.

---

## 🚀 Featured models

### gpt-oss-20b (MLX)

| Repo | Bits/GS | Footprint | Notes |
|---|---:|---:|---|
| [halley-ai/gpt-oss-20b-MLX-5bit-gs32](https://huggingface.co/halley-ai/gpt-oss-20b-MLX-5bit-gs32) | Q5 / 32 | ~15.8 GB | Small quality drop vs 6-bit (~3–6% higher PPL); fits in 24 GB unified memory. |
| [halley-ai/gpt-oss-20b-MLX-6bit-gs32](https://huggingface.co/halley-ai/gpt-oss-20b-MLX-6bit-gs32) | Q6 / 32 | ~18.4 GB | Best of the group; strong quality/footprint tradeoff. |
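
A minimal way to try one of these locally is via `mlx-lm` (a sketch, assuming `pip install mlx-lm` on an Apple Silicon Mac; the `load`/`generate` API shown matches recent mlx-lm releases):

```python
# Quick smoke test with mlx-lm; downloads the repo from the Hub on first run.
from mlx_lm import load, generate

model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-6bit-gs32")  # ~18.4 GB in unified memory

prompt = "Explain group-size quantization in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(text)
```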

### gpt-oss-120b (MLX)

| Repo | Bits/GS | Footprint | Notes |
|---|---:|---:|---|
| [halley-ai/gpt-oss-120b-MLX-8bit-gs32](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-8bit-gs32) | Q8 / 32 | ~63.42 GB | Reference int8; stable and simple to use. |
| [halley-ai/gpt-oss-120b-MLX-bf16](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-bf16) | bf16 | ~65.28 GB | Non-quantized reference for evaluation/ground truth. |
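
For interactive chat, streaming avoids waiting on the full completion. A sketch using `stream_generate` (recent mlx-lm versions yield chunks with a `.text` field; older versions yield plain strings):

```python
# Stream tokens as they are generated; suits the interactive-chat target use.
from mlx_lm import load, stream_generate

model, tokenizer = load("halley-ai/gpt-oss-120b-MLX-8bit-gs32")  # needs ~64 GB free unified memory

for chunk in stream_generate(model, tokenizer,
                             prompt="Summarize expert routing in MoE models.",
                             max_tokens=400):
    print(chunk.text, end="", flush=True)
print()
```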

### Qwen3-Next-80B-A3B-Instruct (MLX)

| Repo | Bits/GS | Footprint | Notes |
|---|---:|---:|---|
| [halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64](https://huggingface.co/halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64) | Q6 / 64 | ~64.92 GB | Quality pick; matched bf16 on our PPL run (5.14). |
| [halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-5bit-gs32](https://huggingface.co/halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-5bit-gs32) | Q5 / 32 | ~59.86 GB | Balanced; near-par PPL (5.20) and strong deterministic math. |
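
Instruct-tuned models expect their chat template. The tokenizer returned by `mlx-lm` wraps the Hugging Face tokenizer, so the usual `apply_chat_template` pattern applies (a sketch; the prompt content is illustrative):

```python
# Build the prompt through the model's chat template before generating.
from mlx_lm import load, generate

model, tokenizer = load("halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64")

messages = [{"role": "user", "content": "Compute 37 * 43 and show the steps."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```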

Perplexity is reported with our fast preset on WikiText-2 (raw, test split); see the repository docs for exact commands.
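
For intuition only, a bare-bones perplexity measurement over a single chunk of text looks roughly like the sketch below; it is not our fast preset, which is defined in the repository docs:

```python
# Illustrative next-token perplexity on one text chunk (not the eval harness).
import math
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-6bit-gs32")

text = "..."  # substitute an evaluation slice, e.g. from WikiText-2
tokens = mx.array(tokenizer.encode(text))[None]   # shape (1, T)

logits = model(tokens[:, :-1])                    # predictions for positions 1..T-1
loss = nn.losses.cross_entropy(logits, tokens[:, 1:], reduction="mean")
print(f"ppl = {math.exp(loss.item()):.2f}")
```
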
| **Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp. |
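
For reference, the Bits/GS columns above map to mlx-lm's quantization parameters. A sketch of how a similar build can be produced with `mlx_lm.convert` (source repo id and output path are illustrative, and gpt-oss conversion details may differ):

```python
# Quantize a Hugging Face checkpoint to MLX; q_bits/q_group_size mirror "Q5 / 32".
from mlx_lm import convert

convert(
    "openai/gpt-oss-20b",                  # illustrative source repo
    mlx_path="gpt-oss-20b-MLX-5bit-gs32",  # illustrative output directory
    quantize=True,
    q_bits=5,
    q_group_size=32,
)
```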