# granite-docling ONNX Conversion Guide
## Technical Reproduction Instructions
This document provides complete instructions for reproducing the granite-docling ONNX conversion.
### Prerequisites
- Python 3.10+
- ~4GB available RAM
- ~2GB disk space for the conversion environment
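Before starting, an optional sanity check can confirm the interpreter version and available disk space. This is an illustrative snippet, not part of the conversion itself:

```python
# Optional sanity check for the prerequisites above
import shutil
import sys

assert sys.version_info >= (3, 10), "Python 3.10+ is required"
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Python {sys.version.split()[0]}, {free_gb:.1f} GB free on this volume")
```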
### Step 1: Environment Setup
```bash
# Create an isolated environment
python3 -m venv onnx_converter
source onnx_converter/bin/activate   # Linux/Mac
# onnx_converter\Scripts\activate    # Windows

# Install dependencies (quotes keep the extras spec intact in zsh)
pip install torch torchvision transformers "optimum[onnxruntime]" safetensors
```
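To confirm the dependencies installed correctly before proceeding, a minimal import check can be run inside the activated environment:

```python
# Verify that the core conversion dependencies import cleanly
import optimum
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("optimum", optimum.__version__)
```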
### Step 2: Download Original Model
```bash
# Download the granite-docling SafeTensors model and its configs
mkdir granite-docling-258m
cd granite-docling-258m
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/model.safetensors" -o model.safetensors
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/config.json" -o config.json
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/tokenizer.json" -o tokenizer.json
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/preprocessor_config.json" -o preprocessor_config.json
cd ..  # return to the working directory; Steps 3-4 run from here
```
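If `huggingface_hub` is available (it is pulled in by `transformers`), the same files can be fetched programmatically instead of with `curl`. A minimal, equivalent sketch:

```python
# Alternative download via huggingface_hub (equivalent to the curl commands above)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ibm-granite/granite-docling-258M",
    local_dir="granite-docling-258m",
    allow_patterns=[
        "model.safetensors",
        "config.json",
        "tokenizer.json",
        "preprocessor_config.json",
    ],
)
```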
### Step 3: Install IBM Experimental Fork
```bash
# Clone the IBM experimental optimum-onnx fork
git clone https://github.com/gabe-l-hart/optimum-onnx.git
cd optimum-onnx
git checkout Idefics3Support

# Install the experimental fork over the stock package
pip install -e . --force-reinstall
cd ..  # back to the working directory so Step 4's relative paths resolve
```
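Because `--force-reinstall` replaces the stock `optimum` exporter, it is worth confirming that the fork's Idefics3 support is actually on the import path. This check assumes the fork exposes `Idefics3OnnxConfig` as used in Step 4:

```python
# Confirm the experimental fork (not stock optimum) is importable
from optimum.exporters.onnx.model_configs import Idefics3OnnxConfig

print("Idefics3OnnxConfig found - experimental fork is active")
```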
### Step 4: Convert to ONNX
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # Force CPU; set before any CUDA initialization

import torch
from pathlib import Path
from transformers import Idefics3ForConditionalGeneration
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import Idefics3OnnxConfig

# Load the model in float32 on CPU
model = Idefics3ForConditionalGeneration.from_pretrained(
    './granite-docling-258m',
    trust_remote_code=True,
    torch_dtype=torch.float32
).to('cpu')

# Create the ONNX export config provided by the experimental fork
onnx_config = Idefics3OnnxConfig(model.config, task='image-to-text')

# Export to ONNX with opset 17
output_path = Path('./granite_docling.onnx')
export(model, onnx_config, output_path, opset=17)
print(f"ONNX conversion complete: {output_path}")
```
### Expected Output
```
Initializing Idefics3ModelPatcher
Entering Idefics3ModelPatcher context
Patching Idefics3 model
Using patched position embedding forward
Exiting Idefics3ModelPatcher context
ONNX conversion complete: granite_docling.onnx (1.2GB)
```
### Validation
```python
import onnxruntime as ort

# Test ONNX model loading (explicit provider avoids ambiguity on GPU builds)
session = ort.InferenceSession('granite_docling.onnx', providers=['CPUExecutionProvider'])
print("✅ ONNX model loads successfully")

# Check input/output specifications
for inp in session.get_inputs():
    print(f"Input: {inp.name} - {inp.shape}")
for out in session.get_outputs():
    print(f"Output: {out.name} - {out.shape}")
```
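As a further, optional smoke test, the script above can be extended to feed zero-filled tensors shaped from the declared inputs. Symbolic dimensions are replaced with 1 here, which may violate real shape constraints for this VLM's vision inputs, so treat a failure as inconclusive; actual inference requires proper preprocessing:

```python
import numpy as np

# Illustrative smoke test only: zero tensors with symbolic dims set to 1.
# This may fail for inputs whose symbolic dims have real constraints
# (e.g., image height/width); proper inference needs the preprocessor.
feeds = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dtype = np.int64 if "int64" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

outputs = session.run(None, feeds)
print(f"Smoke test ran: {len(outputs)} output tensor(s)")
```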
## Troubleshooting
### Common Issues
1. **"Custom architecture" error**: Ensure the IBM experimental fork is installed (Step 3), not the stock `optimum` package
2. **Memory errors**: Use CPU-only conversion (`CUDA_VISIBLE_DEVICES=''`)
3. **Import errors**: Verify the experimental fork was installed editable with `pip install -e .`
### Technical Notes
- **Conversion time**: 5-10 minutes on a typical CPU
- **Memory usage**: ~4GB RAM during conversion
- **Warnings**: TracerWarnings are expected when tracing a complex VLM
- **File size**: ONNX (~1.2GB) vs SafeTensors (~492MB), since the ONNX export stores float32 weights alongside the serialized graph
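For a rough size check: 258M parameters × 4 bytes (float32) ≈ 1.03 GB, consistent with the ~1.2 GB ONNX file once the serialized graph is added; at 2 bytes per parameter (16-bit storage, which the ~492 MB SafeTensors size suggests) the same weights take ≈ 0.52 GB.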
## Attribution
Original model: IBM Research granite-docling-258M
Conversion method: IBM experimental Idefics3Support optimum-onnx fork
Documentation: lamco-development