# llama3b-attribute-inference-q4_k_m

## Model Summary
A ~3B-parameter Llama 3.x instruction model, further fine-tuned with Unsloth using QLoRA (4-bit adapters) to infer personal attributes from first-person text and output a compact JSON report. The model predicts keys such as "age", "occupation", "income_level", "city_country", etc., and for each one gives:

- `estimate`: the inferred value
- `confidence`: an integer from 1 to 5

If the model cannot infer an attribute with any justification, that attribute is simply omitted from the JSON.
The final checkpoint is merged and exported to GGUF with q4_k_m quantization for CPU-friendly local inference via llama.cpp / node-llama-cpp.
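As a minimal sketch of local inference with the GGUF checkpoint via llama-cpp-python (the model path and system prompt below are illustrative assumptions, not taken from this release):

```python
import json

# Assumed system prompt; the actual training prompt defines the full schema.
SYSTEM_PROMPT = (
    "Infer personal attributes from the user's text and answer only with a "
    'JSON object mapping attribute names to {"estimate": ..., "confidence": 1-5}.'
)

def build_messages(narrative: str) -> list:
    """Chat-style request matching the model's system/user format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": narrative},
    ]

def infer_attributes(model_path: str, narrative: str) -> dict:
    """Load the q4_k_m GGUF and parse the model's strict-JSON answer."""
    from llama_cpp import Llama  # lazy import; requires llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=4096)
    out = llm.create_chat_completion(messages=build_messages(narrative))
    return json.loads(out["choices"][0]["message"]["content"])
```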
## Intended Use
This model is intended for research on privacy and attribute inference: given informal self-descriptive text, estimate likely traits (age, relationship status, education level, etc.) and produce machine-readable output.
This model is not intended for profiling, scoring, surveillance, hiring decisions, or any automated judgment about real people. Predictions are guesses and can be biased or wrong.
## Training Data
The model was fine-tuned on a reformatted version of the RobinSta/SynthPAI dataset, which consists of synthetic first-person narratives plus human-reviewed annotations of personal attributes (age, education, relationship status, income band, etc.). The script loads the dataset and performs an 80/20 train/validation split.
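The 80/20 split can be sketched in plain Python (the actual script likely uses a library helper such as `datasets.train_test_split`; this stand-alone version is an assumption for illustration):

```python
import random

def split_80_20(records, seed=42):
    """Shuffle records deterministically and split into (train, validation) at 80/20."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```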
Each data point is turned into a chat-style triple:
- system: instructions defining which attributes to infer and the required JSON schema
- user: the narrative text
- assistant: the target JSON (ground truth attributes + confidence)
Only the assistant JSON is used for loss (the trainer masks prompts so the model is optimized to produce just the final JSON answer).
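The prompt-masking idea reduces to replacing every prompt-position label with the ignore index so only the assistant JSON contributes to the loss. A minimal sketch (the real trainer does this at the tokenizer/collator level; this helper is illustrative):

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the cross-entropy loss

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids as labels, but mask the first prompt_len positions
    so only the assistant's JSON tokens are supervised."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```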
## Training Procedure
**Base model:** `unsloth/Llama-3.2-3B-Instruct-bnb-4bit` (loaded in 4-bit). The script also supports an 8B Llama 3.1 variant, but this release uses the ~3B class for a smaller memory footprint.
### Method
QLoRA / PEFT via Unsloth:
- LoRA `r = 16`
- `lora_alpha = 16`
- `lora_dropout = 0`
- target modules include the attention and MLP projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- gradient checkpointing = `"unsloth"`
- `load_in_4bit = True`
- `max_seq_length = 4096` tokens (RoPE scaling handled by Unsloth)
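The settings above translate roughly into the following Unsloth setup (an illustrative sketch, not the release's exact script; it requires a GPU and the `unsloth` package to run):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model at the configured context length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```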
### Trainer config (SFTTrainer)
- effective batch size ≈ 8 via `per_device_train_batch_size=2` and `gradient_accumulation_steps=4`
- `max_steps = 200`
- `learning_rate = 1e-4`
- `warmup_steps = 5`
- optimizer = `adamw_8bit`
- `weight_decay = 0.01`
- cosine LR schedule
- eval every 50 steps on the held-out split
- bf16/fp16 selected based on hardware support
- packing disabled (no sequence packing)
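These hyperparameters map to a TRL `SFTTrainer` configuration along these lines (an illustrative sketch under the values listed above; `model`, `tokenizer`, `train_ds`, and `val_ds` are assumed to be in scope, and exact argument names vary across `trl`/`transformers` versions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 2 * 4 = 8
    max_steps=200,
    learning_rate=1e-4,
    warmup_steps=5,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    evaluation_strategy="steps",     # "eval_strategy" in newer transformers
    eval_steps=50,
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,                # the QLoRA-wrapped model (assumed in scope)
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    args=args,
    max_seq_length=4096,
    packing=False,              # no sequence packing
)
```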
After training, the LoRA adapters were merged into the base weights and exported as a single GGUF (q4_k_m) checkpoint for llama.cpp-compatible inference.
## Output Format

The model is optimized to answer only in strict JSON. Example:

```json
{
  "age": {"estimate": 34, "confidence": 2},
  "occupation": {"estimate": "software engineer", "confidence": 1},
  "city_country": {"estimate": "San Francisco, USA", "confidence": 4}
}
```
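Because the output is strict JSON, downstream code can parse it directly and, for example, drop low-confidence guesses. A hypothetical post-processing helper (the threshold and helper name are assumptions):

```python
import json

def filter_report(raw: str, min_confidence: int = 3) -> dict:
    """Parse the model's JSON report and keep only attributes whose
    confidence meets the given threshold."""
    report = json.loads(raw)
    return {
        name: fields
        for name, fields in report.items()
        if fields.get("confidence", 0) >= min_confidence
    }
```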
**Model tree:** `gufett0/unsloth-llama3B` · base model: `meta-llama/Llama-3.2-3B-Instruct`