User experiences thread

#1
by BingoBird - opened

This appears to be an important model. Please share your experiences using it here.

Owner

I would suggest using llama.cpp; it doesn't have hard memory constraints. You can set how much VRAM you'd like to use and offload the rest to the CPU. Good for RAG, and faster than Ollama without needing any optimizations like vLLM.
https://huggingface.co/Manojb/Qwen3-4b-toolcall-gguf-llamacpp-codex
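
For a quick start, here's a minimal sketch using the llama-cpp-python bindings (the same VRAM/CPU split as the llama.cpp CLI). The GGUF filename and layer count are assumptions; adjust them for whichever file you download from the repo above and for your GPU:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="Qwen3-4b-toolcall.Q4_K_M.gguf",  # hypothetical filename, use your download
    n_gpu_layers=20,  # layers offloaded to VRAM; remaining layers run on CPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What tools do you have access to?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Set `n_gpu_layers=-1` to offload everything if the model fits in VRAM, or lower it until it does.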

Can this model reply in natural language, or only in the tool-calling JSON format?
