Model Card for Qwen3-VL-8B-german-shorthand

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct for transcribing medieval German shorthand from images. It has been trained using TRL on the wjbmattingly/german-shorthand-window-5-4 dataset.

Model Description

This vision-language model specializes in transcribing text from images of German shorthand documents. Given an image of shorthand text, the model generates the corresponding transcription.

Quick start

from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load model and processor
base_model = "Qwen/Qwen3-VL-8B-Instruct"
adapter_model = "wjbmattingly/Qwen3-VL-8B-german-shorthand"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_model)
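# Optionally merge the LoRA weights into the base model for faster inference
# (merge_and_unload() is a standard PEFT method; merging is not required for the steps below)
# model = model.merge_and_unload()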
processor = AutoProcessor.from_pretrained(base_model)

# Load your image
image = Image.open("path/to/your/shorthand_image.jpg")

# Prepare the message
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the text shown in this image."},
        ],
    },
]

# Build model inputs from the chat template
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
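# Strip the prompt tokens from each sequence so only the newly generated text is decoded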
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
transcription = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]

print(transcription)

Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) with LoRA adapters on the Qwen3-VL-8B-Instruct base model.

Training Data

The model was trained on wjbmattingly/german-shorthand-window-5-4, a dataset containing images of German shorthand with corresponding text transcriptions.
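The dataset's exact column layout is not documented in this card. As a minimal sketch (assuming the dataset has a train split; the printed field names are whatever the dataset actually provides), the records can be inspected with the datasets library:

from datasets import load_dataset

# Load the shorthand dataset from the Hugging Face Hub (assumes a "train" split exists)
dataset = load_dataset("wjbmattingly/german-shorthand-window-5-4", split="train")

# Inspect the schema and a single record (image + transcription pair)
print(dataset.column_names)
print(dataset[0])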

Training Configuration

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Training Method: Supervised Fine-Tuning (SFT) with LoRA
  • LoRA Configuration:
    • Rank (r): 16
    • Alpha: 32
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    • Dropout: 0.1
  • Training Arguments:
    • Epochs: 3
    • Batch size per device: 2
    • Gradient accumulation steps: 4
    • Learning rate: 5e-5
    • Optimizer: AdamW
    • Mixed precision: FP16
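The configuration above maps roughly onto the following PEFT/TRL objects. This is a hedged reconstruction rather than the original training script; in particular, output_dir is a placeholder, task_type is an assumption, and the step that converts the image-text pairs into chat-formatted training examples is omitted:

from peft import LoraConfig
from trl import SFTConfig

# LoRA adapter settings as listed above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption; not stated in the card
)

# Training arguments as listed above (output_dir is a placeholder)
training_args = SFTConfig(
    output_dir="qwen3-vl-8b-german-shorthand",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    optim="adamw_torch",
    fp16=True,
)

These objects would then be passed to trl.SFTTrainer together with the base model, the processor, and the formatted dataset.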

Framework versions

  • TRL: 0.23.0
  • Transformers: 4.57.1
  • PyTorch: 2.8.0
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}