wjbmattingly/old-church-slavonic-dots-ocr · Request for Training Data Examples

by ldslcpt - opened Aug 14

Hi! I'm working on fine-tuning the dots.ocr model for document understanding tasks and would like to better understand the expected data format to ensure my implementation is correct.

Could you please provide some sample data to help me understand the correct format? Specifically:

1. JSONL training data samples
   - A few lines from a working training dataset
   - This would help me verify my data preparation pipeline
2. PAGEXML + JPEG pairs
   - Sample PAGEXML files with corresponding images
   - This would help me understand the annotation structure and coordinate system
3. Data preparation guidelines
   - Any additional best practices for data preparation
   - Common pitfalls to avoid
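
Until the expected schema is confirmed, only format-agnostic checks are possible on the JSONL side. Below is a minimal sketch of such a check in Python. It assumes nothing about field names: it only verifies that every non-blank line parses as a JSON object. `check_jsonl` is a hypothetical helper written for illustration and is not part of dots.ocr or this repository.

```python
import json


def check_jsonl(lines):
    """Split an iterable of JSONL lines into parsed records and errors.

    Returns (records, errors), where errors is a list of
    (line_number, message) tuples. Field names are NOT validated,
    because the expected schema is not known yet.
    """
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        records.append(obj)
    return records, errors


# Usage: pass an open file, or any iterable of strings.
# with open("train.jsonl", encoding="utf-8") as fh:  # hypothetical path
#     records, errors = check_jsonl(fh)
```

Once sample lines are available, the same loop could be extended to compare each record's keys against the sample's keys.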
