---
title: KYB Dots.OCR Text Extraction
emoji: 🖨️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: "other"
---

# KYB Dots.OCR Text Extraction

This [Hugging Face Space](https://huggingface.co/docs/hub/spaces) provides a FastAPI endpoint for text extraction from identity documents using Dots.OCR with ROI (Region of Interest) support. Built as a Docker Space for maximum flexibility and performance.

## 🚀 Quick Start

### Using the API

1. **Upload an image** (JPEG, PNG, or other supported formats)
2. **Optionally specify ROI** coordinates for targeted extraction
3. **Get structured results** with confidence scores and field mapping

### Test the API

```bash
# Basic OCR test
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -F "file=@test_image.jpg"

# With ROI (region of interest)
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -F "file=@test_image.jpg" \
  -F 'roi={"x1":0.1,"y1":0.1,"x2":0.9,"y2":0.9}'
```

## ✨ Features

- **🔍 Text Extraction**: Extract text from identity documents using Dots.OCR
- **📐 ROI Support**: Process pre-cropped images or full images with ROI coordinates
- **📋 Field Mapping**: Structured field extraction with confidence scores
- **🆔 MRZ Detection**: Machine Readable Zone data extraction
- **🔌 Standardized API**: Consistent response format for integration
- **🐳 Docker-based**: Full control over dependencies and environment
- **⚡ GPU Support**: Optimized for Hugging Face Spaces GPU instances

## 📡 API Endpoints

### Health Check

```http
GET /health
```

Returns service status and version information.

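
A minimal Python sketch of calling the health endpoint with only the standard library. The base URL is taken from the curl examples above; the helper names (`health_url`, `parse_body`, `check_health`) are hypothetical, and this README does not confirm that the response body is JSON, so the parser falls back to raw text.

```python
import json
import urllib.request

# Host taken from the curl examples in this README.
BASE_URL = "https://algoryn-dots-ocr-idcard.hf.space"


def health_url(base_url: str = BASE_URL) -> str:
    """Build the health check URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/health"


def parse_body(raw: bytes):
    """Decode the response body; return parsed JSON if possible, else plain text.

    Assumption: the service probably answers with JSON, but the exact
    schema is not documented here, so nothing is assumed about its keys.
    """
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def check_health(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send GET /health and return the HTTP status code with the parsed body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError when the host cannot be reached.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return {"status_code": resp.status, "body": parse_body(resp.read())}
```

Calling `check_health()` before submitting documents is one way to confirm the Space has finished starting; a sleeping or still-building Space may raise an error or time out instead of returning a result.
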
### Text Extraction

```http
POST /v1/id/ocr
Content-Type: multipart/form-data

file: (image file, required)
roi: {"x1": 0.0, "y1": 0.0, "x2": 1.0, "y2": 1.0} (optional)
```

**Parameters:**

- `file`: Image file to process (required)
- `roi`: JSON string with normalized coordinates (optional)
  - `x1`, `y1`: Top-left corner (0.0 to 1.0)
  - `x2`, `y2`: Bottom-right corner (0.0 to 1.0)

## 📄 Response Format

```json
{
  "request_id": "uuid",
  "media_type": "image",
  "processing_time": 0.456,
  "detections": [
    {
      "mrz_data": {
        "document_type": "TD3",
        "issuing_country": "NLD",
        "surname": "MULDER",
        "given_names": "THOMAS",
        "document_number": "NLD123456789",
        "nationality": "NLD",
        "date_of_birth": "1990-01-01",
        "gender": "M",
        "date_of_expiry": "2030-01-01",
        "personal_number": "123456789",
        "raw_mrz": "P