User experiences thread
#1 by BingoBird · opened
This appears to be an important model. Please share your experiences using it here.
I would suggest using llama.cpp; it isn't constrained to fitting the whole model in VRAM. You can set how much VRAM you'd like to use and offload the rest to the CPU. Good for RAG, and faster than Ollama even without optimizations like vLLM's. https://huggingface.co/Manojb/Qwen3-4b-toolcall-gguf-llamacpp-codex
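For anyone who wants to try the VRAM/CPU split, here's a minimal sketch using the llama-cpp-python bindings. The GGUF filename and layer count below are assumptions, so adjust them for the quant you download and the VRAM you have:

```python
# Minimal sketch with llama-cpp-python (pip install llama-cpp-python).
# Filename and n_gpu_layers are assumptions -- tune for your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-4b-toolcall.Q4_K_M.gguf",  # hypothetical quant filename
    n_gpu_layers=20,  # layers offloaded to VRAM; the rest run on the CPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What tools do you have available?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Raising `n_gpu_layers` until you run out of VRAM is the usual way to find the fastest split for your card.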
Can this model reply in natural language, or only in tool-calling JSON format?