DIMI Arabic OCR

Accurate Arabic OCR model for extracting printed Arabic text from images


🧠 Overview

DIMI-Arabic-OCR is a fine-tuned vision-language model (VLM) specialized for Arabic Optical Character Recognition (OCR).
It extracts printed Arabic text from images with high accuracy — including diacritics (tashkeel) and punctuation.

  • 🔤 Language: Arabic
  • 🧩 Base Model: Qwen2.5-VL-7B (via Unsloth 4-bit)
  • ⚙️ Task: Image-to-Text / OCR
  • 🪶 Quantization: 4-bit base with LoRA adapters for efficient inference
  • 👨‍💻 Author: Ahmed Zaky

🚀 Quick Start

# IMPORTANT: Import unsloth first!
import unsloth
from unsloth import FastVisionModel
from PIL import Image
import torch

# Load the model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

FastVisionModel.for_inference(model)

# Load your image (the example path below is a Colab path; replace it with your own)
image = Image.open("/content/2.jpg")

# Arabic instruction (English: "Extract the Arabic text and numbers in this image
# with very high accuracy, fully preserving the original order and formatting.")
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية جدًا، مع الحفاظ الكامل على الترتيب الأصلي والتنسيق."

# Prepare messages
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},  # Include image here
        {"type": "text", "text": instruction}
    ]}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True,
)

# Tokenize the prompt together with the image; truncation is disabled so long pages are kept intact
inputs = tokenizer(
    text=input_text,
    images=image,  
    return_tensors="pt",
    padding=True,
    truncation=False, 
    max_length=None,   
).to("cuda")

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False,      # greedy decoding for deterministic OCR output
        temperature=None,     # explicitly unset sampling params to silence warnings
        top_p=None,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode the prediction
generated_ids = outputs[0][inputs['input_ids'].shape[1]:]
prediction = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

print("Extracted Arabic Text:")
print(prediction)

🧩 Model Architecture

  • Base: Qwen2.5-VL-7B-Instruct
  • Fine-tuning: LoRA (rank 16)
  • Quantization: 4-bit (bitsandbytes)
  • Framework: Unsloth for efficient training and inference (see the configuration sketch below)
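
The exact training recipe is not published on this card, but the sketch below shows how a rank-16 LoRA setup on a 4-bit Qwen2.5-VL base is typically declared with Unsloth. The base checkpoint name and every hyperparameter other than the LoRA rank are assumptions, not values confirmed by the author.

# Illustrative sketch only: a typical Unsloth QLoRA configuration matching the
# architecture described above (rank-16 LoRA on a 4-bit base). Values other than
# r=16 are assumptions, not the published training recipe.
import unsloth
from unsloth import FastVisionModel

base_model, processor = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit",  # assumed 4-bit base checkpoint
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# Attach LoRA adapters (rank 16, as reported on this card)
model = FastVisionModel.get_peft_model(
    base_model,
    r=16,                      # LoRA rank reported on this card
    lora_alpha=16,             # assumed
    lora_dropout=0.0,          # assumed
    bias="none",
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    random_state=3407,
)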

📊 Evaluation

Metric   Description             Score (lower is better)
CER      Character Error Rate    0.22
WER      Word Error Rate         0.40

Evaluation was performed on a held-out test set of 2.6K images drawn from the combined Arabic OCR datasets (news + diacritics).
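
The scores above are self-reported. To benchmark the model on your own data, CER and WER can be computed with, for example, the jiwer library; the snippet below is a minimal sketch, and jiwer is one common choice rather than necessarily the tool used for the numbers above.

# Minimal scoring sketch using jiwer (an assumption; any CER/WER tool works).
import jiwer

references  = ["النص المرجعي الأول", "النص المرجعي الثاني"]   # ground-truth transcriptions
predictions = ["النص المرجعي الاول", "النص المرجعي الثاني"]   # model outputs from the Quick Start code

cer = jiwer.cer(references, predictions)  # character error rate (lower is better)
wer = jiwer.wer(references, predictions)  # word error rate (lower is better)
print(f"CER: {cer:.4f} | WER: {wer:.4f}")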


🧾 Training Data

The model was fine-tuned on 26,000 Arabic text images combining two datasets:

  1. oddadmix/qari-0.2.2-news-dataset-large
  2. oddadmix/qari-0.2.2-diacritics-dataset-large

The combined dataset covers Modern Standard Arabic text both with and without diacritics.
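
Both datasets are available on the Hugging Face Hub and can be combined with the datasets library, as in the minimal sketch below; the split name and the assumption of an image/text-style schema are illustrative and should be checked against the actual dataset features.

# Illustrative sketch: loading and combining the two training datasets with the
# Hugging Face `datasets` library. The "train" split and column layout are
# assumptions, not confirmed by this card.
from datasets import load_dataset, concatenate_datasets

news = load_dataset("oddadmix/qari-0.2.2-news-dataset-large", split="train")
diacritics = load_dataset("oddadmix/qari-0.2.2-diacritics-dataset-large", split="train")

combined = concatenate_datasets([news, diacritics]).shuffle(seed=42)
print(combined)  # inspect the actual features before building training examples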


📚 Citation

If you use this model, please cite:

@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}

Built with ❤️ by Ahmed Zaky

Advancing Arabic NLP through state-of-the-art open models
