Commit 7ee6acf (verified) by glamberson · Parent: 94ce360

Add technical conversion reproduction guide

Files changed (1): CONVERSION_GUIDE.md (+125 lines, new file)
# granite-docling ONNX Conversion Guide

## Technical Reproduction Instructions

This document provides complete instructions for reproducing the granite-docling ONNX conversion.

### Prerequisites

- Python 3.10+
- ~4GB available RAM
- ~2GB disk space for the conversion environment

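As a quick sanity check, the prerequisites above can be probed from Python with the standard library alone. This is a minimal sketch; the function name and the way the thresholds are encoded are illustrative, not part of the conversion tooling (RAM is not checked portably here, so only the Python version and disk space are covered).

```python
import shutil
import sys

def check_prereqs(path=".", min_disk_gb=2.0):
    """Return a dict of prerequisite checks; names and thresholds are illustrative."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),  # Python 3.10+
        "disk_ok": free_gb >= min_disk_gb,                # ~2GB free disk
    }

print(check_prereqs())
```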
### Step 1: Environment Setup

```bash
# Create isolated environment
python3 -m venv onnx_converter
source onnx_converter/bin/activate   # Linux/macOS
# onnx_converter\Scripts\activate    # Windows

# Install dependencies (quote the extras so shells like zsh don't glob the brackets)
pip install torch torchvision transformers "optimum[onnxruntime]" safetensors
```

### Step 2: Download Original Model

```bash
# Download granite-docling SafeTensors model and configs
mkdir granite-docling-258m
cd granite-docling-258m

curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/model.safetensors" -o model.safetensors
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/config.json" -o config.json
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/tokenizer.json" -o tokenizer.json
curl -L "https://huggingface.co/ibm-granite/granite-docling-258M/resolve/main/preprocessor_config.json" -o preprocessor_config.json
```

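An interrupted `curl` leaves a partial download that only fails later, during conversion. A small sketch like the following can confirm all four files landed first; the helper name is illustrative and not part of any tooling used here.

```python
from pathlib import Path

# The four files fetched in Step 2
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "preprocessor_config.json",
]

def missing_files(model_dir):
    """Return the expected files absent from model_dir (empty list = complete)."""
    d = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (d / name).is_file()]

print(missing_files("granite-docling-258m"))
```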
### Step 3: Install IBM Experimental Fork

```bash
# Clone IBM experimental optimum-onnx fork
git clone https://github.com/gabe-l-hart/optimum-onnx.git
cd optimum-onnx
git checkout Idefics3Support

# Install experimental fork in editable mode, replacing stock optimum
pip install -e . --force-reinstall
```

### Step 4: Convert to ONNX

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # Force CPU; must be set before torch is imported

import torch
from pathlib import Path
from transformers import Idefics3ForConditionalGeneration
from optimum.exporters.onnx import export
from optimum.exporters.onnx.model_configs import Idefics3OnnxConfig

# Load model in float32 on CPU
model = Idefics3ForConditionalGeneration.from_pretrained(
    './granite-docling-258m',
    trust_remote_code=True,
    torch_dtype=torch.float32,
).to('cpu')

# Create ONNX export config
onnx_config = Idefics3OnnxConfig(model.config, task='image-to-text')

# Export to ONNX at opset 17
output_path = Path('./granite_docling.onnx')
export(model, onnx_config, output_path, opset=17)

print(f"ONNX conversion complete: {output_path}")
```

### Expected Output

```
Initializing Idefics3ModelPatcher
Entering Idefics3ModelPatcher context
Patching Idefics3 model
Using patched position embedding forward
Exiting Idefics3ModelPatcher context
ONNX conversion complete: granite_docling.onnx (1.2GB)
```

### Validation

```python
import onnxruntime as ort

# Test that the exported model loads
session = ort.InferenceSession('granite_docling.onnx')
print("✅ ONNX model loads successfully")

# Inspect input/output specifications
for inp in session.get_inputs():
    print(f"Input: {inp.name} - {inp.shape}")
for out in session.get_outputs():
    print(f"Output: {out.name} - {out.shape}")
```

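Beyond checking that the model loads, a smoke-test inference can be run by feeding zeros for each input, substituting 1 for any dynamic dimension reported in the input shapes. This helper is a sketch, not part of the conversion tooling; the dtype map covers only the common tensor types and may need extending for this model's actual inputs.

```python
import numpy as np

# Partial map from onnxruntime type strings to numpy dtypes (extend as needed)
_DTYPES = {
    "tensor(float)": np.float32,
    "tensor(int64)": np.int64,
}

def dummy_feeds(session):
    """Zero-filled input dict; dynamic (non-int) dims are replaced with 1."""
    feeds = {}
    for inp in session.get_inputs():
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.zeros(shape, dtype=_DTYPES.get(inp.type, np.float32))
    return feeds

# Usage, assuming `session` from the validation snippet above:
# outputs = session.run(None, dummy_feeds(session))
```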
## Troubleshooting

### Common Issues

1. **"Custom architecture" error**: Ensure the IBM experimental fork (Step 3) is installed, not stock optimum
2. **Memory errors**: Force CPU-only conversion (`CUDA_VISIBLE_DEVICES=''`)
3. **Import errors**: Verify the fork was installed in editable mode (`pip install -e .`)

### Technical Notes

- **Conversion time**: 5-10 minutes on a typical CPU
- **Memory usage**: ~4GB RAM during conversion
- **Warnings**: TracerWarnings are expected when tracing a complex VLM like this one
- **File size**: ONNX (~1.2GB) is larger than SafeTensors (~492MB) because the exported file embeds the computation graph alongside the weights

## Attribution

- Original model: IBM Research granite-docling-258M
- Conversion method: IBM experimental Idefics3Support optimum-onnx fork
- Documentation: lamco-development