# llama3b-attribute-inference-q4_k_m

## Model Summary
A ~3B-parameter Llama 3.x instruction model, further fine-tuned with Unsloth using QLoRA (4-bit adapters) to infer personal attributes from first-person text and output a compact JSON report. The model predicts keys such as "age", "occupation", "income_level", "city_country", etc., and for each one gives:

- `estimate`: the inferred value
- `confidence`: an integer from 1 to 5

If the model cannot infer an attribute with any justification, that attribute is simply omitted from the JSON.
The final checkpoint is merged and exported to GGUF with q4_k_m quantization for CPU-friendly local inference via llama.cpp / node-llama-cpp.
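As a minimal sketch of local inference with the GGUF checkpoint via llama-cpp-python (the model path and system prompt below are illustrative assumptions, not taken from this release):

```python
import json

# Assumed system prompt; the actual training prompt defines the full schema.
SYSTEM_PROMPT = (
    "Infer personal attributes from the user's text and answer only with a "
    'JSON object mapping attribute names to {"estimate": ..., "confidence": 1-5}.'
)

def build_messages(narrative: str) -> list:
    """Chat-style request matching the model's system/user format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": narrative},
    ]

def infer_attributes(model_path: str, narrative: str) -> dict:
    """Load the q4_k_m GGUF and parse the model's strict-JSON answer."""
    from llama_cpp import Llama  # lazy import; requires llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=4096)
    out = llm.create_chat_completion(messages=build_messages(narrative))
    return json.loads(out["choices"][0]["message"]["content"])
```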
## Intended Use
This model is intended for research on privacy and attribute inference: given informal self-descriptive text, estimate likely traits (age, relationship status, education level, etc.) and produce machine-readable output.
This model is not intended for profiling, scoring, surveillance, hiring decisions, or any automated judgment about real people. Predictions are guesses and can be biased or wrong.
## Training Data
The model was fine-tuned on a reformatted version of the RobinSta/SynthPAI dataset, which consists of synthetic first-person narratives plus human-reviewed annotations of personal attributes (age, education, relationship status, income band, etc.). The script loads the dataset and performs an 80/20 train/validation split.
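The 80/20 split can be sketched in plain Python (the actual script likely uses a library helper such as `datasets.train_test_split`; this stand-alone version is an assumption for illustration):

```python
import random

def split_80_20(records, seed=42):
    """Shuffle records deterministically and split into (train, validation) at 80/20."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```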
Each data point is turned into a chat-style triple:
- system: instructions defining which attributes to infer and the required JSON schema
- user: the narrative text
- assistant: the target JSON (ground truth attributes + confidence)
Only the assistant JSON is used for loss (the trainer masks prompts so the model is optimized to produce just the final JSON answer).
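The prompt-masking idea reduces to replacing every prompt-position label with the ignore index so only the assistant JSON contributes to the loss. A minimal sketch (the real trainer does this at the tokenizer/collator level; this helper is illustrative):

```python
IGNORE_INDEX = -100  # labels with this value are excluded from the cross-entropy loss

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids as labels, but mask the first prompt_len positions
    so only the assistant's JSON tokens are supervised."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```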
## Training Procedure
**Base model:** `unsloth/Llama-3.2-3B-Instruct-bnb-4bit` (loaded in 4-bit). The script also supports an 8B Llama 3.1 variant, but this release uses the ~3B class for a smaller memory footprint.
### Method
QLoRA / PEFT via Unsloth:
- LoRA `r = 16`
- `lora_alpha = 16`
- `lora_dropout = 0`
- target modules include the attention and MLP projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- gradient checkpointing = `"unsloth"`
- `load_in_4bit = True`
- `max_seq_length = 4096` tokens (RoPE scaling handled by Unsloth)
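The settings above translate roughly into the following Unsloth setup (an illustrative sketch, not the release's exact script; it requires a GPU and the `unsloth` package to run):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model at the configured context length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```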
### Trainer config (SFTTrainer)
- effective batch size ≈ 8 via `per_device_train_batch_size=2` and `gradient_accumulation_steps=4`
- `max_steps = 200`
- `learning_rate = 1e-4`
- `warmup_steps = 5`
- optimizer = `adamw_8bit`
- `weight_decay = 0.01`
- cosine LR schedule
- eval every 50 steps on the held-out split
- bf16/fp16 selected based on hardware support
- packing disabled (no sequence packing)
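These hyperparameters map to a TRL `SFTTrainer` configuration along these lines (an illustrative sketch under the values listed above; `model`, `tokenizer`, `train_ds`, and `val_ds` are assumed to be in scope, and exact argument names vary across `trl`/`transformers` versions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 2 * 4 = 8
    max_steps=200,
    learning_rate=1e-4,
    warmup_steps=5,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    evaluation_strategy="steps",     # "eval_strategy" in newer transformers
    eval_steps=50,
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,                # the QLoRA-wrapped model (assumed in scope)
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    args=args,
    max_seq_length=4096,
    packing=False,              # no sequence packing
)
```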
After training, the LoRA adapters were merged into the base weights and exported as a single GGUF (q4_k_m) checkpoint for llama.cpp-compatible inference.
## Output Format

The model is optimized to answer only in strict JSON. Example:

```json
{
  "age": {"estimate": 34, "confidence": 2},
  "occupation": {"estimate": "software engineer", "confidence": 1},
  "city_country": {"estimate": "San Francisco, USA", "confidence": 4}
}
```
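Because the output is strict JSON, downstream code can parse it directly and, for example, drop low-confidence guesses. A hypothetical post-processing helper (the threshold and helper name are assumptions):

```python
import json

def filter_report(raw: str, min_confidence: int = 3) -> dict:
    """Parse the model's JSON report and keep only attributes whose
    confidence meets the given threshold."""
    report = json.loads(raw)
    return {
        name: fields
        for name, fields in report.items()
        if fields.get("confidence", 0) >= min_confidence
    }
```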
**Model tree:** `gufett0/unsloth-llama3B` · base model: `meta-llama/Llama-3.2-3B-Instruct`