User experiences thread
#1 by BingoBird · opened
This appears to be an important model. Please share your experiences using it here.
I would suggest using llama.cpp; it isn't constrained to fitting the whole model in VRAM. You can set how much VRAM you'd like to use and offload the rest to the CPU. Good for RAG, and faster than Ollama even without optimizations like vLLM's. https://huggingface.co/Manojb/Qwen3-4b-toolcall-gguf-llamacpp-codex
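For anyone who wants to try the VRAM/CPU split, here's a minimal sketch using the llama-cpp-python bindings. The GGUF filename and layer count below are assumptions, so adjust them for the quant you download and the VRAM you have:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# Filename and n_gpu_layers are assumptions -- tune for your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-4b-toolcall.Q4_K_M.gguf",  # hypothetical quant filename
    n_gpu_layers=20,  # layers offloaded to VRAM; the rest run on the CPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What tools do you have available?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Raising `n_gpu_layers` until you run out of VRAM is the usual way to find the fastest split for your card.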
Can this model reply in natural language, or only in tool-calling JSON format?