Dots.OCR API Testing

This directory contains comprehensive testing scripts for the Dots.OCR API endpoint.

Test Scripts

1. test_api_endpoint.py - Comprehensive API Testing

The main testing script that provides full API validation capabilities.

Features:

  • Health check validation
  • Single and multiple image testing
  • ROI (Region of Interest) testing
  • Field extraction validation
  • Response structure validation
  • Performance metrics
  • Detailed error reporting

Usage:

# Basic test with default settings
python test_api_endpoint.py

# Test with custom API URL
python test_api_endpoint.py --url https://your-api.example.com

# Test with ROI
python test_api_endpoint.py --roi '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}'

# Test with specific expected fields
python test_api_endpoint.py --expected-fields document_number surname given_names

# Verbose output
python test_api_endpoint.py --verbose

# Custom timeout
python test_api_endpoint.py --timeout 60

Options:

  • --url: API base URL (default: http://localhost:7860)
  • --timeout: Request timeout in seconds (default: 30)
  • --roi: ROI coordinates as JSON string
  • --expected-fields: List of expected field names to validate
  • --verbose: Enable verbose logging
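
The options above map naturally onto Python's argparse. The following is a minimal sketch of such a parser, not the actual contents of test_api_endpoint.py: build_parser is a hypothetical name, and decoding --roi with json.loads is an assumption about how the JSON string might be handled.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Dots.OCR API test runner (sketch)")
    parser.add_argument("--url", default="http://localhost:7860", help="API base URL")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    # Assumption: the ROI JSON string is decoded into a dict at parse time.
    parser.add_argument("--roi", type=json.loads, default=None, help="ROI coordinates as JSON")
    parser.add_argument("--expected-fields", nargs="*", default=[], help="Field names to validate")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


args = build_parser().parse_args(
    ["--roi", '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}',
     "--expected-fields", "document_number", "surname"]
)
print(args.url, args.timeout, args.roi["x2"], args.expected_fields)
```

Using type=json.loads means a malformed ROI string is rejected by argparse with a usage error before any request is sent.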

2. quick_test.py - Quick Validation

A simple script for quick API validation after deployment.

Usage:

# Test local API
python quick_test.py

# Test remote API
python quick_test.py https://your-api.example.com

Test Configuration

test_config.json

Configuration file for test parameters and thresholds.

Configuration sections:

  • api_endpoints: Different API URLs for various environments
  • test_images: List of test image files
  • expected_fields: Fields that should be extracted
  • roi_test_cases: Different ROI configurations to test
  • performance_thresholds: Performance validation criteria
  • test_timeout: Default timeout for requests
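
As a rough illustration of how these sections might fit together, the sketch below builds an example configuration and checks that every documented section is present. The section names come from the list above; all values, the key max_processing_time_s, and the helper missing_sections are hypothetical placeholders, not the contents of the real test_config.json.

```python
import json

# Section names as documented; used to detect an incomplete config.
REQUIRED_SECTIONS = (
    "api_endpoints",
    "test_images",
    "expected_fields",
    "roi_test_cases",
    "performance_thresholds",
    "test_timeout",
)

# Placeholder values only; the real file may be shaped differently.
EXAMPLE_CONFIG = json.loads("""
{
  "api_endpoints": {"local": "http://localhost:7860"},
  "test_images": ["tom_id_card_front.jpg", "tom_id_card_back.jpg"],
  "expected_fields": ["document_number", "surname", "given_names"],
  "roi_test_cases": [{"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}],
  "performance_thresholds": {"max_processing_time_s": 10},
  "test_timeout": 30
}
""")


def missing_sections(config: dict) -> list:
    """Return the documented sections that are absent from a config dict."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


print(missing_sections(EXAMPLE_CONFIG))
print(missing_sections({"api_endpoints": {}}))
```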

Test Images

The following test images are used for validation:

  • tom_id_card_front.jpg - Front of Dutch ID card
  • tom_id_card_back.jpg - Back of Dutch ID card

Testing Scenarios

1. Basic Functionality Test

python test_api_endpoint.py

Tests basic API functionality with default settings.

2. ROI Testing

python test_api_endpoint.py --roi '{"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}'

Tests Region of Interest cropping functionality.
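
The ROI examples in this README use values between 0 and 1, which suggests coordinates normalized to the image size. Under that assumption, a minimal sketch of validating an ROI and converting it to a pixel box could look like this. The function names validate_roi and roi_to_pixels are hypothetical, and the server's actual cropping logic may differ.

```python
def validate_roi(roi: dict) -> dict:
    """Check that an ROI has four normalized coordinates with x1 < x2 and y1 < y2."""
    for key in ("x1", "y1", "x2", "y2"):
        if key not in roi:
            raise ValueError(f"missing ROI key: {key}")
        if not 0.0 <= roi[key] <= 1.0:
            raise ValueError(f"{key} must be in [0, 1], got {roi[key]}")
    if roi["x1"] >= roi["x2"] or roi["y1"] >= roi["y2"]:
        raise ValueError("ROI must satisfy x1 < x2 and y1 < y2")
    return roi


def roi_to_pixels(roi: dict, width: int, height: int) -> tuple:
    """Convert a normalized ROI to a (left, top, right, bottom) pixel box."""
    validate_roi(roi)
    return (
        round(roi["x1"] * width),
        round(roi["y1"] * height),
        round(roi["x2"] * width),
        round(roi["y2"] * height),
    )


# The centered ROI from the command above, applied to a 1000x600 image.
print(roi_to_pixels({"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}, 1000, 600))
# → (250, 150, 750, 450)
```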

3. Field Validation Test

python test_api_endpoint.py --expected-fields document_number surname given_names nationality

Tests that specific fields are extracted correctly.
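
One way such a check could work is sketched below. It assumes the extracted fields are available as a mapping from field name to a value and a confidence score; that shape, the function check_expected_fields, and the min_confidence parameter are illustrative assumptions, not the documented response format.

```python
def check_expected_fields(extracted: dict, expected: list, min_confidence: float = 0.0) -> dict:
    """Split expected field names into found, missing, and low-confidence groups."""
    report = {"found": [], "missing": [], "low_confidence": []}
    for name in expected:
        field = extracted.get(name)
        if field is None or not field.get("value"):
            report["missing"].append(name)
        elif field.get("confidence", 0.0) < min_confidence:
            report["low_confidence"].append(name)
        else:
            report["found"].append(name)
    return report


# Hypothetical extraction result in which nationality was not extracted.
extracted = {
    "document_number": {"value": "NLD123456789", "confidence": 0.90},
    "surname": {"value": "MULDER", "confidence": 0.90},
    "given_names": {"value": "THOMAS JAN", "confidence": 0.90},
}
report = check_expected_fields(
    extracted, ["document_number", "surname", "given_names", "nationality"]
)
print(report)
```

Separating missing fields from low-confidence ones keeps the failure message specific: the first points at extraction patterns, the second at image quality or OCR accuracy.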

4. Performance Test

python test_api_endpoint.py --timeout 60 --verbose

Runs the tests with a 60-second timeout and verbose logging so that per-image processing times can be inspected.

Expected Results

Successful Test Output

πŸ” Checking API health...
βœ… API is healthy: {'status': 'healthy', 'version': '1.0.0', 'model_loaded': True}
πŸš€ Starting API tests with 2 images...
βœ… tom_id_card_front.jpg: 2.45s
βœ… tom_id_card_back.jpg: 1.23s
πŸ“Š Test Results:
   Total images: 2
   Successful: 2
   Failed: 0
   Success rate: 100.0%
   Average processing time: 1.84s
πŸŽ‰ All tests completed successfully!
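
The summary figures in the sample output above follow directly from the per-image timings. The sketch below reproduces them; summarize is a hypothetical helper, and treating a timing of None as a failed image is an assumption made for illustration.

```python
def summarize(times_by_image: dict) -> dict:
    """Summarize per-image results; a value of None marks a failed image."""
    total = len(times_by_image)
    ok_times = [t for t in times_by_image.values() if t is not None]
    return {
        "total": total,
        "successful": len(ok_times),
        "failed": total - len(ok_times),
        "success_rate": 100.0 * len(ok_times) / total if total else 0.0,
        "avg_time": sum(ok_times) / len(ok_times) if ok_times else 0.0,
    }


summary = summarize({"tom_id_card_front.jpg": 2.45, "tom_id_card_back.jpg": 1.23})
print(f"Success rate: {summary['success_rate']:.1f}%")             # → Success rate: 100.0%
print(f"Average processing time: {summary['avg_time']:.2f}s")      # → Average processing time: 1.84s
```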

Field Extraction Example

Page 1: 11 fields extracted
  document_number: NLD123456789 (confidence: 0.90)
  surname: MULDER (confidence: 0.90)
  given_names: THOMAS JAN (confidence: 0.90)
  nationality: NLD (confidence: 0.95)
  date_of_birth: 15-03-1990 (confidence: 0.90)
  gender: M (confidence: 0.95)

Troubleshooting

Common Issues

  1. Connection Refused

    • Check if the API is running
    • Verify the correct URL and port
    • Check firewall settings
  2. Timeout Errors

    • Increase timeout with --timeout parameter
    • Check API performance and resource usage
  3. Missing Fields

    • Verify test images contain the expected text
    • Check field extraction patterns in the code
    • Review API logs for processing errors
  4. Validation Errors

    • Check API response format
    • Verify model is loaded correctly
    • Review error logs for details
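
The first two categories above can often be told apart programmatically from the exception a request raises. The sketch below assumes the client uses the standard library's urllib; the test scripts may use a different HTTP client with its own exception types, and diagnose is a hypothetical helper.

```python
import socket
import urllib.error


def diagnose(exc: Exception) -> str:
    """Map a request exception to one of the troubleshooting hints."""
    # urllib wraps low-level socket errors in URLError.reason.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ConnectionRefusedError):
        return "Connection refused: check that the API is running and the URL and port are correct"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "Timeout: retry with a larger --timeout and check server load"
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}: review the API logs and the response format"
    return f"Unclassified error: {exc!r}"


print(diagnose(urllib.error.URLError(ConnectionRefusedError(111, "Connection refused"))))
print(diagnose(socket.timeout("timed out")))
```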

Debug Mode

Enable verbose logging for detailed debugging:

python test_api_endpoint.py --verbose

Integration with CI/CD

The test scripts can be integrated into CI/CD pipelines:

# Example GitHub Actions step
- name: Test API Endpoint
  run: |
    python scripts/test_api_endpoint.py --url ${{ env.API_URL }} --timeout 60

Performance Monitoring

The scripts provide performance metrics that can be used for monitoring:

  • Processing time per image
  • Success rate
  • Field extraction accuracy
  • Response validation results

These metrics can be integrated with monitoring systems like Prometheus or DataDog.

🚀 Production API Testing

Current Production Endpoint

Quick Production Test

# Test production API
./run_tests.sh -e production

# Quick test with curl (no Python dependencies)
./test_production_curl.sh

Staging Environment

Environment-Specific Testing

# Test different environments
./run_tests.sh -e local      # Local development
./run_tests.sh -e staging    # Staging environment
./run_tests.sh -e production # Production environment

5. test_debug_ocr.sh - Per-request debug logging via curl

Use this for quick, dependency-light testing of the server-side debug mode that prints OCR snippets, extracted fields, and MRZ details to logs.

Usage:

# Local server (per-request debug on)
./test_debug_ocr.sh -u http://localhost:7860 -f tom_id_card_front.jpg -d

# Hugging Face Space (replace with your Space URL)
./test_debug_ocr.sh -u https://<your-space>.hf.space -f tom_id_card_front.jpg -d \
  -r '{"x1":0,"y1":0,"x2":1,"y2":0.5}'

You can also enable debug globally on the server with DOTS_OCR_DEBUG=1. The script only toggles the request-level flag via -d.
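
How the two switches combine on the server is not shown here. A plausible reading is that debug output is produced when either one is set, which the following sketch illustrates. The function debug_active is hypothetical and is not taken from the server code.

```python
import os


def debug_active(request_debug: bool, env=None) -> bool:
    """Return True when the request flag or the global DOTS_OCR_DEBUG flag is set."""
    env = os.environ if env is None else env
    return bool(request_debug) or env.get("DOTS_OCR_DEBUG") == "1"


print(debug_active(False, {"DOTS_OCR_DEBUG": "1"}))  # global flag only
print(debug_active(True, {}))                         # request flag only
print(debug_active(False, {}))                        # neither flag set
```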