# Dots.OCR Service - Hugging Face Spaces Deployment Guide
## ✅ Ready for Deployment
The dots-ocr service is now fully self-contained and ready for deployment to Hugging Face Spaces.
## Files Updated
- **`app.py`** - Fixed import paths to be self-contained
- **`models.py`** - Created local data structures (ExtractedField, IdCardFields, MRZData)
- **`field_extraction.py`** - Created local field extraction module
- **`Dockerfile`** - Updated for HF compliance with proper user permissions
- **`README.md`** - Updated with proper HF Spaces configuration
## Deployment Steps
### 1. Create Hugging Face Space
```bash
# Login to Hugging Face
huggingface-cli login
# Create a new Space
huggingface-cli repo create dots-ocr-idcard --type space --space_sdk docker --organization algoryn
```
### 2. Deploy to HF Space
```bash
# Clone the space locally
git clone https://huggingface.co/spaces/algoryn/dots-ocr-idcard
cd dots-ocr-idcard
# Copy all files from this repository (adjust the path to your local checkout)
cp /Users/tmulder/Sources/Algoryn/kybtech-dots-ocr/* .
# Commit and push
git add .
git commit -m "Deploy Dots.OCR text extraction service"
git push
```
### 3. Test the Deployment
Once the Space has finished building (usually 5-10 minutes), test it with:
```bash
# Basic OCR test
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
-H "Authorization: Bearer YOUR_HF_TOKEN" \
-F "file=@test_image.jpg"
# With ROI (region of interest)
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
-H "Authorization: Bearer YOUR_HF_TOKEN" \
-F "file=@test_image.jpg" \
-F 'roi={"x1":0.1,"y1":0.1,"x2":0.9,"y2":0.9}'
```
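The same calls can be made from Python. The following is a minimal sketch using only the standard library; it mirrors the curl commands above (same URL, `file` and `roi` form fields, Bearer token). The helper name `build_ocr_request` is hypothetical, and the response schema is not documented here, so the sketch only builds the request and leaves parsing to the caller.

```python
import json
import mimetypes
import urllib.request
import uuid

# Endpoint taken from the curl examples above.
OCR_URL = "https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr"


def build_ocr_request(image_bytes, filename, token, roi=None):
    """Build (but do not send) a multipart POST equivalent to the curl calls.

    `roi` is an optional dict such as {"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9},
    sent as a JSON string in the `roi` form field.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # File part, matching: -F "file=@test_image.jpg"
    parts = [
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + image_bytes
        + b"\r\n"
    ]

    # Optional ROI part, matching: -F 'roi={...}'
    if roi is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="roi"\r\n\r\n'
                f"{json.dumps(roi)}\r\n"
            ).encode("utf-8")
        )

    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        OCR_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the image bytes from disk, pass the result to `urllib.request.urlopen(...)`, and inspect the response body. The ROI values in the curl example look like fractions of the image size, but that interpretation is an assumption.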
## Features
- **Self-contained**: No dependencies on the parent repository
- **HF Compliant**: Follows Hugging Face Docker Spaces best practices
- **Mock Mode**: Falls back to a mock implementation if Dots.OCR fails to load
- **ROI Support**: Process pre-cropped images or full images with ROI coordinates
- **Field Extraction**: Structured field extraction with confidence scores
- **MRZ Detection**: Machine Readable Zone data extraction
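The mock-mode fallback listed above can be pictured roughly as follows. This is an illustrative sketch only, not the code in `app.py`: `MockOcrEngine`, `load_engine`, and the `extract_text` method are hypothetical names, and the real service may structure its fallback differently.

```python
import logging

logger = logging.getLogger("dots_ocr_sketch")


class MockOcrEngine:
    """Hypothetical stand-in that returns an empty result flagged as mock."""

    def extract_text(self, image_bytes):
        return {"text": "", "mock": True}


def load_engine(loader):
    """Call `loader()` to get the real engine; fall back to the mock on failure.

    Returns a tuple (engine, is_mock) so callers can report which mode is active.
    """
    try:
        return loader(), False
    except Exception as exc:  # e.g. missing weights, import or device errors
        logger.warning("OCR model failed to load, using mock engine: %s", exc)
        return MockOcrEngine(), True
```

Returning the `is_mock` flag alongside the engine makes it easy to surface the active mode, for example in a health-check response, so mock output is not mistaken for real OCR results.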
## API Endpoints
- `GET /health` - Health check
- `POST /v1/id/ocr` - Text extraction with optional ROI
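Because a fresh build can take several minutes, it can be convenient to poll `GET /health` before sending OCR requests. A minimal sketch, assuming the endpoint answers with HTTP 200 when the service is ready (the response body is not specified here); the function names and the retry defaults are illustrative choices, not part of the service.

```python
import time
import urllib.error
import urllib.request

# Host taken from the curl examples; /health from the endpoint list.
HEALTH_URL = "https://algoryn-dots-ocr-idcard.hf.space/health"


def is_healthy(url=HEALTH_URL, token=None, timeout=10):
    """Return True if GET `url` answers with HTTP 200, False on any error."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(check=is_healthy, attempts=40, delay=15.0, sleep=time.sleep):
    """Poll `check()` up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None if none did.
    The defaults give roughly ten minutes of polling.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For a private Space, pass the token through, for example `wait_until_healthy(check=lambda: is_healthy(token="YOUR_HF_TOKEN"))`.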
## Environment Variables
No special environment variables are needed. The service runs on port 7860 by default.
## Performance
- **GPU**: 300-900ms processing time
- **CPU**: 3-8s processing time
- **Memory**: ~6GB per instance
## Privacy
This endpoint processes images temporarily and does not store or log personal information. All field values are redacted in logs for privacy protection.
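One way to keep field values out of logs is to mask them before logging while keeping field names and confidence scores. The sketch below only illustrates that idea: the `{"value": ..., "confidence": ...}` field shape and the `redact_fields` helper are assumptions, not the service's actual data model.

```python
REDACTED = "[REDACTED]"


def redact_fields(fields):
    """Return a copy of `fields` with every "value" replaced by a placeholder.

    `fields` is assumed to map field names to dicts such as
    {"value": "DOE", "confidence": 0.97}; the input is left unmodified.
    """
    return {name: {**data, "value": REDACTED} for name, data in fields.items()}
```

Logging `redact_fields(result)` instead of `result` keeps which fields were found, and how confidently, visible for debugging without recording the personal data itself.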