---
title: README
emoji: 📈
colorFrom: purple
colorTo: pink
sdk: gradio
pinned: false
sdk_version: 5.46.0
---

# Halley AI on Hugging Face

High-quality, Apple-Silicon–optimized MLX builds, tools, and evals — focused on practical, on-prem inference for small teams.

We publish Mixture-of-Experts (MoE) models and MLX quantizations tuned for M-series Macs (Metal + unified memory).
Target use: fast, reliable interactive chat and light batch workloads.
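A minimal quick-start sketch with [mlx-lm](https://github.com/ml-explore/mlx-lm) (assumes `pip install mlx-lm` on an M-series Mac; the prompt and `max_tokens` are illustrative, and flag names can vary slightly across mlx-lm versions):

```python
# Minimal interactive-chat sketch with mlx-lm; details may differ by version.
from mlx_lm import load, generate

# Any repo below works; the 5-bit gpt-oss-20b build fits 24 GB unified memory.
model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-5bit-gs32")

messages = [{"role": "user", "content": "Summarize unified memory on M-series Macs."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Streams tokens to stdout with verbose=True and returns the completed string.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```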


## 🚀 Featured models

### gpt-oss-20b (MLX)

| Repo | Bits / GS | Footprint | Notes |
|---|---|---|---|
| halley-ai/gpt-oss-20b-MLX-5bit-gs32 | Q5 / 32 | ~15.8 GB | Small drop vs. 6-bit (~3–6% PPL); fits 24 GB unified memory. |
| halley-ai/gpt-oss-20b-MLX-6bit-gs32 | Q6 / 32 | ~18.4 GB | Best of the group; strong quality/footprint tradeoff. |

### gpt-oss-120b (MLX)

| Repo | Bits / GS | Footprint | Notes |
|---|---|---|---|
| halley-ai/gpt-oss-120b-MLX-8bit-gs32 | Q8 / 32 | ~63.42 GB | Reference int8; stable and simple to use. |
| halley-ai/gpt-oss-120b-MLX-bf16 | bf16 | ~65.28 GB | Non-quantized reference for evaluation/ground truth. |

### Qwen3-Next-80B-A3B-Instruct (MLX)

| Repo | Bits / GS | Footprint | Notes |
|---|---|---|---|
| halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-6bit-gs64 | Q6 / 64 | ~64.92 GB | Quality pick; matched bf16 on our PPL run (5.14). |
| halley-ai/Qwen3-Next-80B-A3B-Instruct-MLX-5bit-gs32 | Q5 / 32 | ~59.86 GB | Balanced; near-par PPL (5.20) and strong deterministic math. |

Perplexity is reported with our fast preset on WikiText-2 (raw, test split); see each repository's docs for exact commands.
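For intuition, a minimal sliding-window perplexity sketch in the same spirit (not our exact fast preset; the dataset path, window size, and model choice are assumptions):

```python
# Illustrative perplexity loop over a raw text file; not our exact eval preset.
import math
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-6bit-gs32")

# Path to the WikiText-2 raw test split is an assumption; supply your own copy.
ids = tokenizer.encode(open("wikitext-2-raw/wiki.test.raw").read())

window = 2048  # tokens per forward pass (an assumption)
total_nll, total_tokens = 0.0, 0

for i in range(0, len(ids) - 1, window):
    chunk = mx.array(ids[i : i + window + 1])[None]
    if chunk.shape[1] < 2:
        break
    logits = model(chunk[:, :-1])  # (1, T, vocab)
    nll = nn.losses.cross_entropy(
        logits.reshape(-1, logits.shape[-1]),
        chunk[:, 1:].reshape(-1),
        reduction="sum",
    )
    mx.eval(nll)
    total_nll += nll.item()
    total_tokens += chunk.shape[1] - 1

print(f"perplexity = {math.exp(total_nll / total_tokens):.2f}")
```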

Format: MLX (not GGUF). For Linux/Windows or other non-MLX stacks, use a GGUF build with llama.cpp instead (sketch below).
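For example, via the llama-cpp-python bindings; the GGUF filename below is a placeholder, since these halley-ai repos ship MLX weights only, so source a GGUF build elsewhere:

```python
# llama-cpp-python wraps llama.cpp for non-MLX stacks; filename is hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="gpt-oss-20b.Q5_K_M.gguf", n_ctx=4096)  # placeholder path
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a non-MLX stack."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```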