🤖 quem-4b v2
A 4-billion-parameter merged language model built on the Qwen3 family. quem-v2-4b blends four complementary models using LazyMergekit with the DARE-TIES method to deliver a compact, versatile model for instruction following, coding assistance, and reasoning.
📖 Overview
quem-v2-4b is a carefully balanced merge of four specialized 4B-class models. Using DARE-TIES with equal weights, it aims to retain strengths across general conversation (Jan), fast responses (Lightning), mathematical reasoning (Hebrew Math Tutor), and code reasoning (Qwen3 Code Reasoning).
✨ Key Features
- Balanced Merge: Equal weights (25% each) for stability across skills.
- Reasoning & Code: Improved chain-of-thought-style reasoning and code understanding from the contributor models.
- Compact & Efficient: 4B parameters for fast inference on a single consumer GPU.
- Instruction-Tuned: Works out of the box with standard chat prompts via the HF chat template.

🧩 Base Models
- janhq/Jan-v1-2509
- quelmap/Lightning-4b
- Intel/hebrew-math-tutor-v1
- GetSoloTech/Qwen3-Code-Reasoning-4B

All contributions are merged on top of a Qwen3 base (see configuration below).
🛠️ Merge Method & Configuration
The merge was performed with LazyMergekit using the dare_ties method: each contributor model receives an equal weight of 0.25 and a density of 0.6 (the fraction of each model's deltas from the base that DARE retains).
Merge YAML (LazyMergekit)
models:
  - model: janhq/Jan-v1-2509
    parameters:
      density: 0.6
      weight: 0.25
  - model: quelmap/Lightning-4b
    parameters:
      density: 0.6
      weight: 0.25
  - model: Intel/hebrew-math-tutor-v1
    parameters:
      density: 0.6
      weight: 0.25
  - model: GetSoloTech/Qwen3-Code-Reasoning-4B
    parameters:
      density: 0.6
      weight: 0.25
merge_method: dare_ties
base_model: unsloth/Qwen3-4B-Thinking-2507
parameters:
  normalize: true
  int8_mask: false
device: auto
dtype: bfloat16
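To reproduce the merge locally, the YAML above can be passed to mergekit's CLI. A minimal sketch, assuming the config is saved as config.yaml and an arbitrary output directory name (drop --cuda to merge on CPU):
pip install mergekit
mergekit-yaml config.yaml ./quem-v2-4b --cuda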
💻 Usage (Transformers)
Install:
pip install -U transformers accelerate torch
Minimal chat example:
from transformers import AutoTokenizer, pipeline
import torch

model_id = "rodrigomt/quem-v2-4b"

messages = [
    {"role": "user", "content": "What is a large language model?"}
]

# Render the chat messages into a single prompt string using the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Build a text-generation pipeline; device_map="auto" places the model on available GPU(s)
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

out = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
# Note: generated_text contains the prompt followed by the model's completion
print(out[0]["generated_text"])
Prompting Tips
- Use the standard system / user / assistant chat structure.
- For coding tasks, include concise requirements, the desired language, and any constraints (see the sketch after this list).
- For math/logic tasks, allow a slightly higher max_new_tokens and consider a lower temperature (e.g., temperature=0.3–0.5) for more deterministic reasoning.
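As a concrete illustration of these tips, the sketch below reuses the tokenizer and pipe objects from the usage example above to send a coding request with a system message and a lower temperature; the system prompt wording and sampling values are illustrative, not tuned recommendations.
messages = [
    {
        "role": "system",
        "content": "You are a careful coding assistant. Reply with working code and brief explanations.",
    },
    {
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome, ignoring case and spaces.",
    },
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Lower temperature for more deterministic code; larger token budget for complete functions
out = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9)
print(out[0]["generated_text"])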
⚙️ Inference Notes
- Precision: Default bfloat16 (bf16); float16 also works well on most GPUs.
- Quantization: 4-bit/8-bit quantization via bitsandbytes or auto-gptq can reduce memory; expect some quality trade-offs (see the sketch after this list).
- Decoding:
  - General chat: temperature=0.7, top_p=0.9–0.95, max_new_tokens=256.
  - Code/Math: lower temperature (0.2–0.5); optionally increase max_new_tokens to 512–1024 for step-by-step reasoning.

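A minimal 4-bit loading sketch with bitsandbytes is shown below; the NF4 quantization type and bf16 compute dtype are common defaults, not settings validated for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "rodrigomt/quem-v2-4b"

# 4-bit NF4 weights with bf16 compute to limit the quality hit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)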
🧪 Evaluation
No unified public benchmark is included in this release. Early local testing indicates improved step-by-step reasoning compared to the prior 4B merge on similar hardware, but results are highly sensitive to decoding parameters and prompts. Community PRs with reproducible evals (Arena/AlpacaEval/HELM/OpenLLM Leaderboards/LocalAIMe) are welcome.
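For contributors preparing reproducible numbers, one option is EleutherAI's lm-evaluation-harness; a rough sketch of an invocation (the task selection and batch size here are illustrative, not an official protocol):
pip install lm-eval
lm_eval --model hf --model_args pretrained=rodrigomt/quem-v2-4b,dtype=bfloat16 --tasks gsm8k --batch_size 8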
🖥️ System Requirements
Minimum (single GPU):
- RAM: 16 GB
- VRAM: 8 GB (e.g., RTX 3060 Ti / 3070 class)
- Storage: ~20 GB free
- CPU: Recent quad-core

Recommended:
- RAM: 32 GB
- VRAM: 12 GB+ (e.g., RTX 4070 / 3080 or higher)
- CPU: Modern multi-core

Quantized weights can reduce VRAM but may affect quality.
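As a rough back-of-envelope check: 4B parameters x 2 bytes (bf16/fp16) is about 8 GB for the weights alone, which is why 8 GB VRAM is a tight minimum; at 4-bit (about 0.5 bytes per parameter) the weights shrink to roughly 2 GB plus activation and framework overhead.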
🙏 Acknowledgments
Thanks to the authors and communities behind Jan, Lightning, Intel Hebrew Math Tutor, Qwen3 Code Reasoning, and the LazyMergekit toolchain.
📄 License
This model is licensed under the Apache 2.0 License.