llama3b-attribute-inference-q4_k_m

Model Summary

A ~3B parameter Llama 3.x instruction model, further fine-tuned with Unsloth using QLoRA (4-bit adapters) to infer personal attributes from first-person text and output a compact JSON report. The model predicts keys like "age", "occupation", "income_level", "city_country", etc., and for each one gives:

  • estimate: inferred value
  • confidence: integer 1–5

If the model cannot infer an attribute with any justification, that attribute is simply omitted from the JSON.

The final checkpoint is merged and exported to GGUF with q4_k_m quantization for CPU-friendly local inference via llama.cpp / node-llama-cpp.
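For example, local inference might look like the following sketch using the llama-cpp-python bindings (the GGUF file name and the prompt text are illustrative, not part of this release):

from llama_cpp import Llama

# Illustrative file name; substitute the actual GGUF checkpoint from this repo.
llm = Llama(model_path="llama3b-attribute-inference-q4_k_m.gguf", n_ctx=4096)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Infer personal attributes and answer only in JSON."},
        {"role": "user", "content": "Just got off my night shift at the hospital and now I'm studying for my boards."},
    ],
    temperature=0.0,  # deterministic decoding helps keep the JSON strict
)
print(resp["choices"][0]["message"]["content"])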

Intended Use

This model is intended for research on privacy and attribute inference: given informal self-descriptive text, estimate likely traits (age, relationship status, education level, etc.) and produce machine-readable output.

This model is not intended for profiling, scoring, surveillance, hiring decisions, or any automated judgment about real people. Predictions are guesses and can be biased or wrong.

Training Data

The model was fine-tuned on a reformatted version of the RobinSta/SynthPAI dataset, which consists of synthetic first-person narratives plus human-reviewed annotations of personal attributes (age, education, relationship status, income band, etc.). The script loads the dataset and performs an 80/20 train/validation split.

Each data point is turned into a chat-style triple:

  1. system: instructions defining which attributes to infer and the required JSON schema
  2. user: the narrative text
  3. assistant: the target JSON (ground truth attributes + confidence)

Only the assistant JSON contributes to the loss: the trainer masks the prompt tokens so the model is optimized to produce just the final JSON answer.
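A minimal sketch of this preprocessing, assuming the reformatted dataset exposes narrative and attributes columns (both column names and the seed are assumptions; the prompt masking itself is typically handled trainer-side, e.g. by Unsloth's train_on_responses_only helper):

import json
from datasets import load_dataset

SYSTEM_PROMPT = "Infer the listed personal attributes and answer only in JSON."  # abbreviated

def to_chat(example):
    # The assistant turn carries the ground-truth JSON; only this turn is trained on.
    return {"messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": example["narrative"]},
        {"role": "assistant", "content": json.dumps(example["attributes"])},
    ]}

ds = load_dataset("RobinSta/SynthPAI", split="train").map(to_chat)
splits = ds.train_test_split(test_size=0.2, seed=42)  # 80/20 train/validation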

Training Procedure

Base model
unsloth/Llama-3.2-3B-Instruct-bnb-4bit (4-bit loaded). The script also supports an 8B Llama 3.1 variant, but this release uses the ~3B class for a smaller memory footprint.

Method
QLoRA / PEFT via Unsloth, sketched in code after this list:

  • LoRA r = 16
  • lora_alpha = 16
  • lora_dropout = 0
  • target modules include attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
  • gradient checkpointing = "unsloth"
  • load_in_4bit = True
  • max_seq_length = 4096 tokens (RoPE scaling handled by Unsloth)
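Put together, this setup corresponds to roughly the following Unsloth calls (a sketch; the hyperparameters are the ones listed above):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)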

Trainer config (SFTTrainer; sketched in code after this list)

  • effective batch size ≈ 8 via per_device_train_batch_size=2 and gradient_accumulation_steps=4
  • max_steps = 200
  • learning_rate = 1e-4
  • warmup_steps = 5
  • optimizer = adamw_8bit
  • weight_decay = 0.01
  • cosine LR schedule
  • eval every 50 steps on the held-out split
  • bf16/fp16 selected based on hardware support
  • packing disabled (no sequence packing)
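A corresponding trainer sketch, with hyperparameters from the list above (model, tokenizer, and splits carry over from the earlier sketches; exact argument names can vary across trl versions):

import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    max_seq_length=4096,
    packing=False,                      # no sequence packing
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size ≈ 8
        max_steps=200,
        learning_rate=1e-4,
        warmup_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        evaluation_strategy="steps",
        eval_steps=50,
        bf16=torch.cuda.is_bf16_supported(),
        fp16=not torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()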

After training, the LoRA adapters were merged into the base weights and exported as a single GGUF (q4_k_m) checkpoint for llama.cpp-compatible inference.
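With Unsloth this export is a single call (a sketch; the output directory name is illustrative):

# Merges the LoRA adapters into the base weights, runs llama.cpp's
# conversion, and writes a q4_k_m-quantized GGUF file.
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method="q4_k_m")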

Output Format

The model is optimized to answer only in strict JSON. Example:

{
  "age": {"estimate": 34, "confidence": 2},
  "occupation": {"estimate": "software engineer", "confidence": 1},
  "city_country": {"estimate": "San Francisco, USA", "confidence": 4}
}
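
Since downstream code typically consumes this JSON directly, a small validation pass is useful. The sketch below checks replies against the schema described above (parse_report is a hypothetical helper, not part of the release):

import json

def parse_report(raw: str) -> dict:
    """Parse the model's JSON reply, keeping only well-formed attribute entries."""
    report = json.loads(raw)
    return {
        attr: entry
        for attr, entry in report.items()
        if isinstance(entry, dict)
        and "estimate" in entry
        and isinstance(entry.get("confidence"), int)
        and 1 <= entry["confidence"] <= 5
    }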