sunflowerting78 committed on
Commit a431606 · verified · 1 Parent(s): 4760b0e

add transformers usage of PaddleOCR-VL-0.9B

Files changed (1):
1. README.md +45 -0
README.md CHANGED
@@ -140,6 +140,51 @@ for res in output:
 ```
 
 **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html).**
+
+## PaddleOCR-VL-0.9B Usage with transformers
+
+Currently, inference with the PaddleOCR-VL-0.9B model is supported through the `transformers` library for recognizing text, formulas, tables, and charts; full document parsing with `transformers` is planned for a future release. The script below demonstrates this usage. For now, we recommend the official PaddleOCR pipeline for inference, as it is faster and supports page-level document parsing.
+
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
+PROMPTS = {
+    "ocr": "OCR:",
+    "table": "Table Recognition:",
+    "formula": "Formula Recognition:",
+    "chart": "Chart Recognition:",
+}
+
+model_path = "PaddlePaddle/PaddleOCR-VL"
+image_path = "test.png"
+image = Image.open(image_path).convert("RGB")
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
+).to(DEVICE).eval()
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+messages = [{"role": "user", "content": PROMPTS[CHOSEN_TASK]}]
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+inputs = processor(text=[text], images=[image], return_tensors="pt")
+inputs = {k: (v.to(DEVICE) if isinstance(v, torch.Tensor) else v) for k, v in inputs.items()}
+
+with torch.inference_mode():
+    generated = model.generate(**inputs, max_new_tokens=1024, do_sample=False, use_cache=True)
+
+# Strip the prompt portion from the decoded output.
+resp = processor.batch_decode(generated, skip_special_tokens=True)[0]
+answer = resp.split(text)[-1].strip()
+print(answer)
+```
+
 ## Performance
 
 ### Page-Level Document Parsing
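
The task-to-prompt dispatch in the added script can be factored into a small helper that validates the task name before building the chat message. This is only a sketch of the logic above; the helper name `build_messages` is ours and is not part of the official PaddleOCR API:

```python
# Task prompts as defined in the script above.
PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

def build_messages(task: str) -> list[dict]:
    """Return the single-turn chat message for a recognition task,
    rejecting unknown task names early."""
    if task not in PROMPTS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return [{"role": "user", "content": PROMPTS[task]}]
```

The resulting list can be passed directly to `processor.apply_chat_template(...)` as in the script above, which makes it easy to loop over all four tasks on the same image.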