πŸ€– quem-v2-4b

A 4-billion-parameter merged language model built on the Qwen3 family. quem-v2-4b blends four complementary models with LazyMergekit using the DARE-TIES method, delivering a compact, versatile model for instruction following, coding assistance, and reasoning.

πŸ“‹ Overview

quem-v2-4b is a carefully balanced merge of four specialized 4B-class models. Using DARE-TIES with equal weights, it aims to retain strengths across general conversation (Jan), fast responses (Lightning), mathematical reasoning (Hebrew Math Tutor), and code reasoning (Qwen3 Code Reasoning).

✨ Key Features

  • Balanced Merge: Equal weights (25% each) for stability across skills.
  • Reasoning & Code: Improved chain-of-thought style reasoning and code understanding from contributor models.
  • Compact & Efficient: 4B parameters for fast inference on a single consumer GPU.
  • Instruction-Tuned: Works out-of-the-box with standard chat prompts via the HF chat template.

πŸ”§ Base Models

All contributions are merged on top of the unsloth/Qwen3-4B-Thinking-2507 base (see the configuration below):

  • janhq/Jan-v1-2509 (general conversation)
  • quelmap/Lightning-4b (fast responses)
  • Intel/hebrew-math-tutor-v1 (mathematical reasoning)
  • GetSoloTech/Qwen3-Code-Reasoning-4B (code reasoning)


πŸ› οΈ Merge Method & Configuration

The merge was performed with LazyMergekit using the DARE-TIES method, applying equal weights (0.25) and a common density (0.6) across all four contributor models so that no single specialization dominates. The full configuration is shown below.

Merge YAML (LazyMergekit)

models:
  - model: janhq/Jan-v1-2509
    parameters:
      density: 0.6
      weight: 0.25

  - model: quelmap/Lightning-4b
    parameters:
      density: 0.6
      weight: 0.25

  - model: Intel/hebrew-math-tutor-v1
    parameters:
      density: 0.6
      weight: 0.25

  - model: GetSoloTech/Qwen3-Code-Reasoning-4B
    parameters:
      density: 0.6
      weight: 0.25

merge_method: dare_ties
base_model: unsloth/Qwen3-4B-Thinking-2507

parameters:
  normalize: true
  int8_mask: false

device: auto
dtype: bfloat16
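
To reproduce the merge locally, the sketch below drives mergekit (the engine behind LazyMergekit) directly from Python. It assumes the YAML above is saved as quem-v2-4b.yaml, that the output directory is a placeholder of your choosing, and that mergekit's documented Python interface (MergeConfiguration, MergeOptions, run_merge) is available in your installed version.

import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the merge recipe shown above (saved locally as quem-v2-4b.yaml)
with open("quem-v2-4b.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run DARE-TIES and write the merged weights to ./quem-v2-4b
run_merge(
    merge_config,
    "./quem-v2-4b",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for the merge if one is present
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
    ),
)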

πŸ’» Usage (Transformers)

Install:

pip install -U transformers accelerate torch

Minimal chat example:

from transformers import AutoTokenizer, pipeline
import torch

model_id = "rodrigomt/quem-v2-4b"

messages = [
    {"role": "user", "content": "What is a large language model?"}
]

# Build the prompt string with the model's built-in chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Load the model; device_map="auto" places it on the available GPU(s)/CPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

out = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(out[0]["generated_text"])
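
Because the merge base is a Qwen3 "Thinking" variant, generations may open with a <think>...</think> reasoning block before the final answer. Continuing the example above, a minimal post-processing sketch, assuming that tag format (adjust if your outputs differ):

# Split an optional <think>...</think> block from the final answer
def split_thinking(text):
    if "</think>" in text:
        reasoning, answer = text.split("</think>", 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

generated = out[0]["generated_text"][len(prompt):]  # drop the echoed prompt
reasoning, answer = split_thinking(generated)
print(answer)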

Prompting Tips

  • Use standard system / user / assistant chat structure.
  • For coding tasks, include concise requirements, desired language, and constraints.
  • For math/logic tasks, allow slightly higher max_new_tokens and consider lower temperature (e.g., temperature=0.3–0.5) for more deterministic reasoning.
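
Putting these tips together, a coding request might look like the sketch below. It reuses the tokenizer and pipe from the usage example; the system prompt and decoding values are illustrative, not tested recommendations.

# Illustrative coding request following the tips above
messages = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {
        "role": "user",
        "content": "Write a Python function that parses an ISO-8601 date string "
                   "and returns a datetime. Include type hints and raise "
                   "ValueError on invalid input.",
    },
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Lower temperature and a larger token budget for step-by-step code generation
out = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.95)
print(out[0]["generated_text"])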

βš™οΈ Inference Notes

  • Precision: Default bfloat16 (bf16); float16 also works well on most GPUs.

  • Quantization: 4-bit/8-bit quantization via bitsandbytes or auto-gptq can reduce memory; expect some quality trade-offs (see the loading sketch after this list).

  • Decoding:

    • General chat: temperature=0.7, top_p=0.9–0.95, max_new_tokens=256.
    • Code/Math: lower temperature (0.2–0.5), optionally increase max_new_tokens to 512–1024 for step-by-step reasoning.
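
If VRAM is limited, the model can be loaded in 4-bit through the transformers bitsandbytes integration (pip install bitsandbytes). A minimal sketch using common NF4 defaults; these settings have not been validated specifically for this model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rodrigomt/quem-v2-4b"

# Common 4-bit NF4 configuration (illustrative defaults, not tuned for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)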

πŸ§ͺ Evaluation

No public benchmark results are included with this release. Early local testing suggests improved step-by-step reasoning over the previous 4B merge on similar hardware, but results are highly sensitive to decoding parameters and prompts. Community PRs with reproducible evals (Arena, AlpacaEval, HELM, Open LLM Leaderboard, LocalAIMe) are welcome.


πŸ–₯️ System Requirements

Minimum (single GPU):

  • RAM: 16 GB
  • VRAM: 8 GB (e.g., RTX 3060 Ti / 3070 class)
  • Storage: ~20 GB free
  • CPU: Recent quad-core

Recommended:

  • RAM: 32 GB
  • VRAM: 12 GB+ (e.g., RTX 4070 / 3080 or higher)
  • CPU: Modern multi-core

Quantized weights can reduce VRAM but may affect quality.


πŸ™Œ Acknowledgments

Thanks to the authors and communities behind Jan, Lightning, Intel Hebrew Math Tutor, Qwen3 Code Reasoning, and the LazyMergekit toolchain.

πŸ“ License

This model is licensed under the Apache 2.0 License.
