| # Dots.OCR Service - Hugging Face Spaces Deployment Guide | |
| ## ✅ Ready for Deployment | |
| The dots-ocr service is now fully self-contained and ready for deployment to Hugging Face Spaces. | |
| ## Files Updated | |
| - **`app.py`** - Fixed import paths to be self-contained | |
| - **`models.py`** - Created local data structures (ExtractedField, IdCardFields, MRZData) | |
| - **`field_extraction.py`** - Created local field extraction module | |
| - **`Dockerfile`** - Updated for HF compliance with proper user permissions | |
| - **`README.md`** - Updated with proper HF Spaces configuration | |
| ## Deployment Steps | |
| ### 1. Create Hugging Face Space | |
| ```bash | |
| # Login to Hugging Face | |
| huggingface-cli login | |
| # Create a new Space | |
| huggingface-cli repo create dots-ocr-idcard --type space --space_sdk docker --organization algoryn | |
| ``` | |
| ### 2. Deploy to HF Space | |
| ```bash | |
| # Clone the space locally | |
| git clone https://huggingface.co/spaces/algoryn/dots-ocr-idcard | |
| cd dots-ocr-idcard | |
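| # Copy all files from this repository (adjust the source path to your local checkout). | |
| # -r also copies subdirectories; the shell glob still skips dotfiles such as .git. | |
| cp -r /Users/tmulder/Sources/Algoryn/kybtech-dots-ocr/* . | |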
| # Commit and push | |
| git add . | |
| git commit -m "Deploy Dots.OCR text extraction service" | |
| git push | |
| ``` | |
| ### 3. Test the Deployment | |
| Once the Space has finished building (usually 5-10 minutes), test it with: | |
| ```bash | |
| # Basic OCR test | |
| curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \ | |
| -H "Authorization: Bearer YOUR_HF_TOKEN" \ | |
| -F "file=@test_image.jpg" | |
| # With ROI (region of interest) | |
| curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \ | |
| -H "Authorization: Bearer YOUR_HF_TOKEN" \ | |
| -F "file=@test_image.jpg" \ | |
| -F 'roi={"x1":0.1,"y1":0.1,"x2":0.9,"y2":0.9}' | |
| ``` | |
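The same two requests can also be built from Python. The following is a minimal sketch using only the standard library. The helper names (`roi_field`, `build_multipart`, `ocr_request`) are hypothetical, and treating ROI coordinates as normalized values in [0, 1] is an assumption based on the example values above, not a documented contract.

```python
import json
import mimetypes
import urllib.request
import uuid
from typing import Optional, Tuple


def roi_field(x1: float, y1: float, x2: float, y2: float) -> str:
    """Serialize an ROI in the JSON shape used by the curl example.

    Assumption: coordinates are normalized to [0, 1], (x1, y1) is the top-left corner.
    """
    coords = {"x1": x1, "y1": y1, "x2": x2, "y2": y2}
    for name, value in coords.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    if x1 >= x2 or y1 >= y2:
        raise ValueError("ROI must satisfy x1 < x2 and y1 < y2")
    return json.dumps(coords, separators=(",", ":"))


def build_multipart(image: bytes, filename: str,
                    roi: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode the `file` (and optional `roi`) form fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = [
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        image,
        b"\r\n",
    ]
    if roi is not None:
        chunks += [
            f"--{boundary}\r\n".encode(),
            b'Content-Disposition: form-data; name="roi"\r\n\r\n',
            roi.encode(),
            b"\r\n",
        ]
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def ocr_request(url: str, token: str, image: bytes,
                filename: str = "test_image.jpg",
                roi: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/id/ocr."""
    body, content_type = build_multipart(image, filename, roi)
    headers = {"Authorization": f"Bearer {token}", "Content-Type": content_type}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the response body. Keeping request construction separate from sending makes the payload easy to inspect before it reaches the Space.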
| ## Features | |
| - **Self-contained**: No external dependencies on the parent repository | |
| - **HF Compliant**: Follows Hugging Face Docker Spaces best practices | |
| - **Mock Mode**: Falls back to a mock implementation if Dots.OCR fails to load | |
| - **ROI Support**: Process pre-cropped images or full images with ROI coordinates | |
| - **Field Extraction**: Structured field extraction with confidence scores | |
| - **MRZ Detection**: Machine Readable Zone data extraction | |
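The mock-mode fallback listed above can be pictured roughly as follows. This is an illustrative sketch only, not the code in `app.py`: `select_backend`, `mock_ocr`, and the result shape are hypothetical names chosen for the example.

```python
import logging
from typing import Callable, Dict

# Hypothetical backend signature: image bytes in, result dictionary out.
OcrFn = Callable[[bytes], Dict[str, object]]


def mock_ocr(image: bytes) -> Dict[str, object]:
    """Stand-in backend that returns a clearly marked placeholder result."""
    return {"text": "", "mock": True}


def select_backend(load_model: Callable[[], OcrFn]) -> OcrFn:
    """Return the real backend, or the mock one if loading raises."""
    try:
        return load_model()
    except Exception as exc:  # any load failure triggers the fallback
        logging.getLogger(__name__).warning(
            "Model load failed, using mock backend: %s", exc
        )
        return mock_ocr
```

Marking mock results explicitly (here with a `mock` flag) lets callers tell placeholder output apart from real extraction results.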
| ## API Endpoints | |
| - `GET /health` - Health check | |
| - `POST /v1/id/ocr` - Text extraction with optional ROI | |
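Because a new Space takes several minutes to build, it can help to poll `GET /health` before sending OCR requests. A minimal sketch, assuming the endpoint answers with HTTP 200 once the service is up; the function names and retry settings are hypothetical, and a private Space would also need an `Authorization` header on the health request.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def health_ok(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors and non-2xx responses both count as "not ready".
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 40,
                       delay: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: health_ok("https://algoryn-dots-ocr-idcard.hf.space"))`. The defaults wait for roughly ten minutes in total.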
| ## Environment Variables | |
| No special environment variables are required. The service listens on port 7860 by default. | |
| ## Performance | |
| - **GPU**: 300-900 ms processing time | |
| - **CPU**: 3-8 s processing time | |
| - **Memory**: ~6 GB per instance | |
| ## Privacy | |
| This endpoint processes images temporarily and does not store or log personal information. All field values are redacted in logs. | |
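One way to picture the log redaction described above is sketched below. The field layout (a `value` plus a `confidence` per field) and the `redact_fields` helper are assumptions made for illustration; the service's actual logging code may differ.

```python
from typing import Any, Dict

REDACTED = "[REDACTED]"


def redact_fields(fields: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy that is safe to log: values masked, confidence kept."""
    return {
        name: {"value": REDACTED, "confidence": data.get("confidence")}
        for name, data in fields.items()
    }
```

Building a new dictionary instead of mutating the input keeps the real values available for the API response while only the masked copy reaches the logger.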