Qwen3-Embedding-0.6B (GGUF) Models
This directory contains GGUF builds of the Qwen3 0.6B embedding model, produced from the upstream base repository Qwen/Qwen3-0.6B-Base (original Hugging Face layout in ../Qwen3-Embedding-0.6B/).
Contents
| File | Purpose |
|---|---|
| `qwen3-embedding-0.6b.Q4_K_M.gguf` | Quantized (Q4_K_M) GGUF for efficient inference. |
| `qwen3-embedding-0.6b-fix.gguf` | Same model with an explicit `sep_token` / EOS metadata fix applied. |
Special Token Configuration
Extracted from `tokenizer_config.json`:

```json
"sep_token": "<|endoftext|>",
"sep_token_id": 151643
```
The model uses `<|endoftext|>` as both the padding (`pad_token`) and separator (`sep_token`). For embedding generation, each input text MUST terminate with the separator token (or the converter must auto-append it) to avoid a runtime warning:

```
[WARNING] At least one last token in strings embedded is not SEP. 'tokenizer.ggml.add_eos_token' should be set to 'true' in the GGUF header
```
Why the Warning Appears
If the GGUF metadata key `tokenizer.ggml.add_eos_token` is absent or false, llama.cpp will not auto-append the final SEP/EOS token for embedding inputs. Any input string that does not already end with `<|endoftext|>` triggers the warning and may yield sub-optimal embeddings (slightly different token boundary semantics).
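With an unfixed GGUF you can also work around this on the client side by appending the separator yourself before embedding. A minimal sketch (`ensure_sep` is a hypothetical helper, not part of any library):

```python
SEP = "<|endoftext|>"  # sep/eos token from tokenizer_config.json


def ensure_sep(text: str) -> str:
    """Append the SEP token if the input does not already end with it."""
    return text if text.endswith(SEP) else text + SEP


# Each string now terminates with the separator, so llama.cpp sees a
# proper last token even when add_eos_token is false in the metadata.
inputs = [ensure_sep(t) for t in ["Hello world", "Second doc" + SEP]]
```

This is idempotent, so it is safe to apply to inputs that may already carry the separator.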
Fix Implemented
The file `qwen3-embedding-0.6b-fix.gguf` was regenerated ensuring:

- `tokenizer.ggml.add_eos_token = true`
- `sep_token` (`<|endoftext|>`) retained with id `151643`

This makes llama.cpp automatically append the SEP/EOS token when it is missing, silencing the warning and standardizing embeddings.
Rebuilding From Upstream (Recommended Process)
1. Obtain the upstream model: clone or download `Qwen/Qwen3-0.6B-Base` (embedding variant directory).
2. Convert to GGUF using the current llama.cpp conversion script, `convert_hf_to_gguf.py` (it already sets EOS metadata for Qwen tokenizers). The converter itself only emits full/half-precision (or Q8_0) output; K-quants such as Q4_K_M are produced in a second step with `llama-quantize`:

```sh
python3 llama.cpp/convert_hf_to_gguf.py Qwen3-Embedding-0.6B \
  --outfile qwen3-embedding-0.6b-f16.gguf \
  --outtype f16
./llama.cpp/build/bin/llama-quantize \
  qwen3-embedding-0.6b-f16.gguf qwen3-embedding-0.6b-fix.gguf Q4_K_M
```

If you previously produced a GGUF that shows the warning, just re-run conversion with an up-to-date llama.cpp checkout. The script internally writes `tokenizer.ggml.add_eos_token = true` for this tokenizer family.
Post-Conversion Validation
Run a quick embedding call and confirm no warning appears:
```sh
./llama.cpp/build/bin/llama-embedding \
  -m models/qwen3-embedding-0.6b-fix.gguf \
  -p "Hello world"
```
If you still see the warning:
- Confirm the binary was rebuilt after updating sources (`make` or `cmake --build`).
- Inspect the metadata with a small Python snippet:
```python
from gguf import GGUFReader

r = GGUFReader("models/qwen3-embedding-0.6b-fix.gguf")
# r.fields maps key name -> ReaderField; the last part holds the scalar value
field = r.fields.get("tokenizer.ggml.add_eos_token")
if field is not None:
    print("ADD_EOS_TOKEN=", bool(field.parts[-1][0]))
```

Expected output: `ADD_EOS_TOKEN= True`
Manual Patch (Fallback Method)
If re-conversion is inconvenient and the key is already present (just set to `false`), you can flip it in place. `GGUFReader` memory-maps the file, so opening it writable and assigning through the field view patches the file directly (the same approach used by the `gguf` package's `gguf-set-metadata` script):

```python
import shutil

from gguf import GGUFReader

# Work on a copy so the original quantized file stays intact
shutil.copyfile("qwen3-embedding-0.6b.Q4_K_M.gguf",
                "qwen3-embedding-0.6b-fix.gguf")

# "r+" memory-maps the file writable; writes through field.parts hit the file
reader = GGUFReader("qwen3-embedding-0.6b-fix.gguf", "r+")
field = reader.fields.get("tokenizer.ggml.add_eos_token")
if field is None:
    # A new key cannot be added in place; re-run the conversion instead
    raise SystemExit("tokenizer.ggml.add_eos_token absent; re-convert the model")
field.parts[-1][0] = 1  # set the bool flag to true
```

Note this only works when the key already exists in the header; adding a brand-new key requires rewriting the file (re-conversion).
After patching, re-run the validation step.
Usage Notes for Embeddings
- Always feed raw text; no special wrapping needed. Auto-SEP happens with the fixed file.
- For batch embeddings, ensure each string ends cleanly (avoid trailing spaces if you rely on identical hashes downstream).
- The dimensionality matches upstream Qwen3-Embedding-0.6B (refer to upstream docs for exact embedding size).
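A missing SEP token tends to surface downstream as slightly shifted similarity scores. When comparing embeddings produced before and after the fix, a dependency-free cosine-similarity helper is enough (a generic sketch, not tied to any llama.cpp output format):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```

Identical inputs embedded with the fixed file should score at or very near 1.0 against themselves; a systematic drop across a corpus suggests the SEP handling differs between runs.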
License & Attribution
The original model weights and tokenizer come from the Qwen project (Qwen/Qwen3-0.6B-Base). Review their license and usage terms before redistribution. This README documents conversion adjustments only (metadata EOS flag addition).
Changelog
- Initial addition: added fixed GGUF with `tokenizer.ggml.add_eos_token = true` to suppress the SEP warning.
For further improvements (FP16 build, alternative quantization tiers, or batching examples), open an issue or PR in this repo.