AutoRound v0.7 is out! 🚀 This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input. 👉 Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0
Following integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
💡 We’ve also enhanced the RTN mode (--iters 0), significantly cutting quantization cost for low-resource users.
⭐ Star our repo and stay tuned for more exciting updates!