---
title: KYB Dots.OCR Text Extraction
emoji: 🖨️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: "other"
---

# KYB Dots.OCR Text Extraction

This [Hugging Face Space](https://huggingface.co/docs/hub/spaces) exposes a FastAPI endpoint that extracts text from identity documents using Dots.OCR, with optional ROI (Region of Interest) support. It is built as a Docker Space, which gives full control over dependencies and the runtime environment.

## 🚀 Quick Start

### Using the API
1. **Upload an image** (JPEG, PNG, or other supported formats)
2. **Optionally specify ROI** coordinates for targeted extraction
3. **Get structured results** with confidence scores and field mapping

### Test the API
```bash
# Basic OCR test
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -F "file=@test_image.jpg"

# With ROI (region of interest)
curl -X POST https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr \
  -F "file=@test_image.jpg" \
  -F 'roi={"x1":0.1,"y1":0.1,"x2":0.9,"y2":0.9}'
```
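
For callers who prefer Python over `curl`, the same multipart request can be sketched with the standard library alone. The helper name `build_ocr_request` is hypothetical, and the sketch assumes the service accepts the `file` and `roi` form fields exactly as in the `curl` calls above; it only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import mimetypes
import urllib.request
import uuid

API_URL = "https://algoryn-dots-ocr-idcard.hf.space/v1/id/ocr"


def build_ocr_request(image_bytes, filename="test_image.jpg", roi=None, url=API_URL):
    """Build (but do not send) a multipart POST with `file` and optional `roi`.

    Hypothetical helper: the field names mirror the curl examples.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if roi is not None:
        # The ROI travels as a JSON string in an ordinary form field.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="roi"\r\n\r\n'
                f"{json.dumps(roi)}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + image_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Sending the request (requires network access and a real image file):
# with open("test_image.jpg", "rb") as fh:
#     req = build_ocr_request(fh.read(), roi={"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9})
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect or log before any image leaves your machine.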

## ✨ Features

- **🔍 Text Extraction**: Extract text from identity documents using Dots.OCR
- **📐 ROI Support**: Process pre-cropped images or full images with ROI coordinates
- **📋 Field Mapping**: Structured field extraction with confidence scores
- **🆔 MRZ Detection**: Machine Readable Zone data extraction
- **🔌 Standardized API**: Consistent response format for integration
- **🐳 Docker-based**: Full control over dependencies and environment
- **⚡ GPU Support**: Optimized for Hugging Face Spaces GPU instances

## 📡 API Endpoints

### Health Check
```http
GET /health
```
Returns service status and version information.

### Text Extraction
```http
POST /v1/id/ocr
Content-Type: multipart/form-data

file: <image_file>
roi: {"x1": 0.0, "y1": 0.0, "x2": 1.0, "y2": 1.0} (optional)
```

**Parameters:**
- `file`: Image file to process (required)
- `roi`: JSON string with normalized coordinates (optional)
  - `x1`, `y1`: Top-left corner (0.0 to 1.0)
  - `x2`, `y2`: Bottom-right corner (0.0 to 1.0)
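
Because the ROI is normalized, the same value works for any image resolution. The sketch below shows one way to validate the corners client-side and to see which pixels a given ROI covers. Both function names are hypothetical, and the strict `x1 < x2`, `y1 < y2` check is an assumption derived from the top-left/bottom-right convention above, not documented server behavior.

```python
import json


def make_roi(x1, y1, x2, y2):
    """Validate normalized corners and return the JSON string for the `roi` field."""
    for name, value in (("x1", x1), ("y1", y1), ("x2", x2), ("y2", y2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")
    # Assumption: top-left must lie strictly above and left of bottom-right.
    if x1 >= x2 or y1 >= y2:
        raise ValueError("ROI must satisfy x1 < x2 and y1 < y2")
    return json.dumps({"x1": x1, "y1": y1, "x2": x2, "y2": y2})


def roi_to_pixels(roi, width, height):
    """Map a normalized ROI dict to an integer (left, top, right, bottom) box."""
    return (
        round(roi["x1"] * width),
        round(roi["y1"] * height),
        round(roi["x2"] * width),
        round(roi["y2"] * height),
    )
```

For a 1000x600 image, `roi_to_pixels(json.loads(make_roi(0.1, 0.1, 0.9, 0.9)), 1000, 600)` yields the box `(100, 60, 900, 540)`.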

## 📄 Response Format

```json
{
  "request_id": "uuid",
  "media_type": "image",
  "processing_time": 0.456,
  "detections": [
    {
      "mrz_data": {
        "document_type": "TD3",
        "issuing_country": "NLD",
        "surname": "MULDER",
        "given_names": "THOMAS",
        "document_number": "NLD123456789",
        "nationality": "NLD",
        "date_of_birth": "1990-01-01",
        "gender": "M",
        "date_of_expiry": "2030-01-01",
        "personal_number": "123456789",
        "raw_mrz": "P<NLDMULDER<<THOMAS<<<<<<<<<<<<<<<<<<<<<<<<<",
        "confidence": 0.95
      },
      "extracted_fields": {
        "document_number": {
          "field_name": "document_number",
          "value": "NLD123456789",
          "confidence": 0.92,
          "source": "ocr"
        },
        "surname": {
          "field_name": "surname",
          "value": "MULDER",
          "confidence": 0.96,
          "source": "ocr"
        }
      }
    }
  ]
}
```
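
A consumer will usually keep only the fields it trusts. The following is a minimal sketch of that filtering, assuming the response shape shown above; the function name `confident_fields` and the 0.9 default threshold are illustrative choices, not part of the service.

```python
import json


def confident_fields(response, min_confidence=0.9):
    """Return {field_name: value} for extracted fields at or above min_confidence."""
    result = {}
    for detection in response.get("detections", []):
        fields = detection.get("extracted_fields") or {}
        for name, field in fields.items():
            if field.get("confidence", 0.0) >= min_confidence:
                result[name] = field.get("value")
    return result


# Trimmed copy of the example response above.
sample = json.loads("""
{
  "request_id": "uuid",
  "detections": [
    {
      "extracted_fields": {
        "document_number": {"field_name": "document_number", "value": "NLD123456789",
                            "confidence": 0.92, "source": "ocr"},
        "surname": {"field_name": "surname", "value": "MULDER",
                    "confidence": 0.96, "source": "ocr"}
      }
    }
  ]
}
""")

high_trust = confident_fields(sample, min_confidence=0.95)
```

With the sample data, a threshold of 0.95 keeps only `surname`, while the default of 0.9 keeps both fields.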

## 🛠️ Deployment to Hugging Face Spaces

### Prerequisites
- [Hugging Face CLI](https://huggingface.co/docs/hub/install-huggingface-cli) installed
- Docker installed locally (for testing)

### 1. Create HF Space
```bash
# Login to Hugging Face
huggingface-cli login

# Create a new Docker Space
huggingface-cli repo create dots-ocr-idcard --type space --space_sdk docker --organization algoryn
```

### 2. Clone and Setup
```bash
# Clone the space locally
git clone https://huggingface.co/spaces/algoryn/dots-ocr-idcard
cd dots-ocr-idcard

# Copy required files (recursively, including any dotfiles)
cp -r /path/to/kybtech-ml-pipelines/docker/hf/dots-ocr/. .

# Copy field extraction module
mkdir -p src/idcard_api
cp /path/to/kybtech-ml-pipelines/src/idcard_api/field_extraction.py src/idcard_api/
touch src/idcard_api/__init__.py
```

### 3. Deploy
```bash
git add .
git commit -m "Deploy Dots-OCR text extraction service"
git push
```

### 4. Test Deployment
Once the build finishes (usually 5-10 minutes), the Space is available at `https://algoryn-dots-ocr-idcard.hf.space`. To confirm it is up, request `GET /health` and check that it returns the service status.

## ⚙️ Configuration

### Environment Variables
- `HF_DOTS_MODEL_PATH`: Path to Dots.OCR model weights
- `HF_DOTS_CONFIDENCE_THRESHOLD`: Confidence threshold for field extraction
- `HF_DOTS_DEVICE`: Device to use (`auto`, `cpu`, or `cuda`)
- `HF_DOTS_MAX_IMAGE_SIZE`: Maximum image size for processing
- `HF_DOTS_MRZ_ENABLED`: Enable MRZ detection
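
The sketch below shows how these variables could be read into a typed settings object. Only the variable names come from this README. The `DotsOcrSettings` class, the `load_settings` function, every default value, and the accepted boolean spellings are assumptions made for illustration, and the unit of `HF_DOTS_MAX_IMAGE_SIZE` is not specified here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DotsOcrSettings:
    model_path: str
    confidence_threshold: float
    device: str
    max_image_size: int
    mrz_enabled: bool


def load_settings(env=None):
    """Read the HF_DOTS_* variables; all defaults below are hypothetical."""
    env = os.environ if env is None else env
    device = env.get("HF_DOTS_DEVICE", "auto")
    if device not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"HF_DOTS_DEVICE must be auto, cpu, or cuda, got {device!r}")
    return DotsOcrSettings(
        model_path=env.get("HF_DOTS_MODEL_PATH", "/models/dots-ocr"),  # assumed path
        confidence_threshold=float(env.get("HF_DOTS_CONFIDENCE_THRESHOLD", "0.5")),
        device=device,
        max_image_size=int(env.get("HF_DOTS_MAX_IMAGE_SIZE", "4096")),  # unit unspecified
        # Assumed convention: these spellings count as "enabled".
        mrz_enabled=env.get("HF_DOTS_MRZ_ENABLED", "true").strip().lower()
        in {"1", "true", "yes", "on"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise in tests, for example `load_settings({"HF_DOTS_DEVICE": "cpu"})`.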

### Hugging Face Spaces Settings
- **SDK**: Docker
- **Port**: 7860 (default)
- **Hardware**: CPU (upgradeable to GPU)
- **Storage**: Persistent storage available for model caching

## 📊 Performance

| Hardware | Processing Time | Memory Usage |
|----------|----------------|--------------|
| **GPU** | 300-900ms | ~6GB |
| **CPU** | 3-8s | ~2GB |

## 🔒 Privacy & Security

- **No Data Storage**: Images are processed temporarily and not stored
- **Privacy Protection**: All field values are redacted in logs
- **Secure Processing**: Runs in isolated Docker containers
- **No Tracking**: No user data or usage analytics collected

## 🐳 Local Development

### Quick Start with uv
```bash
# Set up development environment
make setup

# Activate virtual environment
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate     # On Windows

# Run the application
make run-dev
```

### Docker Development
```bash
# Build and run with Docker
make build
make run-docker

# View logs
make logs
```

### Development Commands
```bash
# Run tests
make test

# Format code
make format

# Run linting
make lint

# Test API endpoints
make test-local
make test-production
```

For detailed development instructions, see the documentation in `docs/`.

## 📚 Documentation

- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [Docker Spaces Guide](https://huggingface.co/docs/hub/spaces-sdks-docker)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under a private license. See the license file for details.

## 🆘 Support

- **Issues**: Report bugs and request features via GitHub Issues
- **Discussions**: Join the community discussions
- **Email**: Contact us at website@huggingface.co for advanced support

---

Built with ❤️ using [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces) and FastAPI