# Model Card for Qwen3-VL-8B-german-shorthand
This model is a fine-tuned version of [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) for transcribing medieval German shorthand from images. It has been trained using [TRL](https://github.com/huggingface/trl) on the [wjbmattingly/german-shorthand-window-5-4](https://huggingface.co/datasets/wjbmattingly/german-shorthand-window-5-4) dataset.
## Model Description
This vision-language model specializes in transcribing text from images of German shorthand documents. Given an image of shorthand text, the model generates the corresponding transcription.
## Quick start
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load model and processor
base_model = "Qwen/Qwen3-VL-8B-Instruct"
adapter_model = "wjbmattingly/Qwen3-VL-8B-german-shorthand"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_model)
processor = AutoProcessor.from_pretrained(base_model)

# Load your image
image = Image.open("path/to/your/shorthand_image.jpg")

# Prepare the message
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the text shown in this image."},
        ],
    },
]

# Generate transcription
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
transcription = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(transcription)
```
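If you would rather serve the model without the PEFT wrapper, the adapter can be merged into the base weights. The snippet below is a minimal sketch using PEFT's `merge_and_unload`; the output directory name is only an example, and you should spot-check that the merged model produces the same transcriptions before deploying it.

```python
# Optional: fold the LoRA weights into the base model so inference no longer
# goes through the PeftModel wrapper. The save path below is a hypothetical
# local directory, not part of this repository.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen3-vl-8b-german-shorthand-merged")
processor.save_pretrained("qwen3-vl-8b-german-shorthand-merged")
```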
## Training procedure
This model was fine-tuned using Supervised Fine-Tuning (SFT) with LoRA adapters on the Qwen3-VL-8B-Instruct base model; a code sketch of this setup follows the Training Configuration list below.
### Training Data
The model was trained on wjbmattingly/german-shorthand-window-5-4, a dataset containing images of German shorthand with corresponding text transcriptions.
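To inspect the data yourself, the dataset can be loaded with the `datasets` library. The split name below is an assumption, and the column layout is not documented here, so check `features` rather than assuming particular column names.

```python
from datasets import load_dataset

# Load the shorthand dataset and inspect its schema before training or evaluation.
# The "train" split name is assumed.
ds = load_dataset("wjbmattingly/german-shorthand-window-5-4", split="train")
print(ds)           # number of rows and column names
print(ds.features)  # image and transcription column types
```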
### Training Configuration

- Base Model: Qwen/Qwen3-VL-8B-Instruct
- Training Method: Supervised Fine-Tuning (SFT) with LoRA
- LoRA Configuration:
  - Rank (r): 16
  - Alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0.1
- Training Arguments:
  - Epochs: 3
  - Batch size per device: 2
  - Gradient accumulation steps: 4
  - Learning rate: 5e-5
  - Optimizer: AdamW
  - Mixed precision: FP16
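The training script itself is not included in this repository. The snippet below is a rough reconstruction of the configuration above using TRL's `SFTTrainer` and PEFT's `LoraConfig`; the output directory, dataset split, and the exact image-text collation are assumptions and may differ from the original run.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from trl import SFTConfig, SFTTrainer

# Assumed "train" split; see the Training Data section above.
dataset = load_dataset("wjbmattingly/german-shorthand-window-5-4", split="train")

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")

# LoRA settings as listed in the Training Configuration above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments as listed above; output_dir is a placeholder
training_args = SFTConfig(
    output_dir="qwen3-vl-8b-german-shorthand-sft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=processor,
)
trainer.train()
```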
### Framework versions

- TRL: 0.23.0
- Transformers: 4.57.1
- PyTorch: 2.8.0
- Datasets: 4.1.1
- Tokenizers: 0.22.1
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```