πŸ€– quem-v2-4b

A 4-billion-parameter merged language model built on the Qwen3 family. quem-v2-4b blends four complementary models with LazyMergekit using the DARE-TIES method, delivering a compact, versatile model for instruction following, coding assistance, and reasoning.

πŸ“‹ Overview

quem-v2-4b is a carefully balanced merge of four specialized 4B-class models. Using DARE-TIES with equal weights, it aims to retain strengths across general conversation (Jan), fast responses (Lightning), mathematical reasoning (Hebrew Math Tutor), and code reasoning (Qwen3 Code Reasoning).

✨ Key Features

  • Balanced Merge: Equal weights (25% each) for stability across skills.
  • Reasoning & Code: Improved chain-of-thought style reasoning and code understanding from contributor models.
  • Compact & Efficient: 4B parameters for fast inference on a single consumer GPU.
  • Instruction-Tuned: Works out-of-the-box with standard chat prompts via the HF chat template.

πŸ”§ Base Models

All contributions are merged on top of the unsloth/Qwen3-4B-Thinking-2507 base (see the configuration below):

  • janhq/Jan-v1-2509 (general conversation)
  • quelmap/Lightning-4b (fast responses)
  • Intel/hebrew-math-tutor-v1 (mathematical reasoning)
  • GetSoloTech/Qwen3-Code-Reasoning-4B (code reasoning)


πŸ› οΈ Merge Method & Configuration

The merge was performed with LazyMergekit using the DARE-TIES method, applying equal weights (0.25) and a common density (0.6) across all four contributor models so that no single specialization dominates. The full configuration is shown below.

Merge YAML (LazyMergekit)

models:
  - model: janhq/Jan-v1-2509
    parameters:
      density: 0.6
      weight: 0.25

  - model: quelmap/Lightning-4b
    parameters:
      density: 0.6
      weight: 0.25

  - model: Intel/hebrew-math-tutor-v1
    parameters:
      density: 0.6
      weight: 0.25

  - model: GetSoloTech/Qwen3-Code-Reasoning-4B
    parameters:
      density: 0.6
      weight: 0.25

merge_method: dare_ties
base_model: unsloth/Qwen3-4B-Thinking-2507

parameters:
  normalize: true
  int8_mask: false

device: auto
dtype: bfloat16
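
To reproduce the merge locally, the sketch below drives mergekit (the engine behind LazyMergekit) directly from Python. It assumes the YAML above is saved as quem-v2-4b.yaml, that the output directory is a placeholder of your choosing, and that mergekit's documented Python interface (MergeConfiguration, MergeOptions, run_merge) is available in your installed version.

import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the merge recipe shown above (saved locally as quem-v2-4b.yaml)
with open("quem-v2-4b.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run DARE-TIES and write the merged weights to ./quem-v2-4b
run_merge(
    merge_config,
    "./quem-v2-4b",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for the merge if one is present
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
    ),
)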

πŸ’» Usage (Transformers)

Install:

pip install -U transformers accelerate torch

Minimal chat example:

from transformers import AutoTokenizer, pipeline
import torch

model_id = "rodrigomt/quem-v2-4b"

messages = [
    {"role": "user", "content": "What is a large language model?"}
]

# Build the prompt string with the model's built-in chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Load the model; device_map="auto" places it on the available GPU(s)/CPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

out = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(out[0]["generated_text"])
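
Because the merge base is a Qwen3 "Thinking" variant, generations may open with a <think>...</think> reasoning block before the final answer. Continuing the example above, a minimal post-processing sketch, assuming that tag format (adjust if your outputs differ):

# Split an optional <think>...</think> block from the final answer
def split_thinking(text):
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

generated = out[0]["generated_text"][len(prompt):]  # drop the echoed prompt
reasoning, answer = split_thinking(generated)
print(answer)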

Prompting Tips

  • Use standard system / user / assistant chat structure.
  • For coding tasks, include concise requirements, desired language, and constraints.
  • For math/logic tasks, allow slightly higher max_new_tokens and consider lower temperature (e.g., temperature=0.3–0.5) for more deterministic reasoning.
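
Putting these tips together, a coding request might look like the sketch below. It reuses the tokenizer and pipe from the usage example; the system prompt and decoding values are illustrative, not tested recommendations.

# Illustrative coding request following the tips above
messages = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {
        "role": "user",
        "content": "Write a Python function that parses an ISO-8601 date string "
                   "and returns a datetime. Include type hints and raise "
                   "ValueError on invalid input.",
    },
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Lower temperature and a larger token budget for step-by-step code generation
out = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.95)
print(out[0]["generated_text"])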

βš™οΈ Inference Notes

  • Precision: Default bfloat16 (bf16); float16 also works well on most GPUs.

  • Quantization: 4-bit/8-bit quantization via bitsandbytes or auto-gptq can reduce memory; expect some quality trade-offs (see the loading sketch after this list).

  • Decoding:

    • General chat: temperature=0.7, top_p=0.9–0.95, max_new_tokens=256.
    • Code/Math: lower temperature (0.2–0.5), optionally increase max_new_tokens to 512–1024 for step-by-step reasoning.
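
If VRAM is limited, the model can be loaded in 4-bit through the transformers bitsandbytes integration (pip install bitsandbytes). A minimal sketch using common NF4 defaults; these settings have not been validated specifically for this model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rodrigomt/quem-v2-4b"

# Common 4-bit NF4 configuration (illustrative defaults, not tuned for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)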

πŸ§ͺ Evaluation

No public benchmark results are included with this release. Early local testing suggests improved step-by-step reasoning over the previous 4B merge on similar hardware, but results are highly sensitive to decoding parameters and prompts. Community PRs with reproducible evals (Arena, AlpacaEval, HELM, Open LLM Leaderboard, LocalAIMe) are welcome.


πŸ–₯️ System Requirements

Minimum (single GPU):

  • RAM: 16 GB
  • VRAM: 8 GB (e.g., RTX 3060 Ti / 3070 class)
  • Storage: ~20 GB free
  • CPU: Recent quad-core

Recommended:

  • RAM: 32 GB
  • VRAM: 12 GB+ (e.g., RTX 4070 / 3080 or higher)
  • CPU: Modern multi-core

Quantized weights can reduce VRAM but may affect quality.


πŸ™Œ Acknowledgments

Thanks to the authors and communities behind Jan, Lightning, Intel Hebrew Math Tutor, Qwen3 Code Reasoning, and the LazyMergekit toolchain.

πŸ“ License

This model is licensed under the Apache 2.0 License.
