ChengCui committed on
Commit b5355e1 · verified · Parent: f9d60ba

chat_template (#49)


- update chat_template (59a09102ed3bd9eddcb6f0cf1152227c65651edd)

Files changed (2)
  1. README.md +52 -0
  2. chat_template.jinja +28 -4
README.md CHANGED
@@ -141,6 +141,58 @@ for res in output:
 
 **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html).**
 
+## PaddleOCR-VL-0.9B Usage with transformers
+
+We currently support element-level inference with the PaddleOCR-VL-0.9B model through the `transformers` library; it can recognize text, formulas, tables, and charts. Full document parsing with `transformers` is planned for a future release. Below is a simple script for running inference with `transformers`.
+
+> [!NOTE]
+> We currently recommend the official method for inference, as it is faster and supports page-level document parsing. The example code below only supports element-level recognition.
+
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
+PROMPTS = {
+    "ocr": "OCR:",
+    "table": "Table Recognition:",
+    "formula": "Formula Recognition:",
+    "chart": "Chart Recognition:",
+}
+
+model_path = "PaddlePaddle/PaddleOCR-VL"
+image_path = "test.png"
+image = Image.open(image_path).convert("RGB")
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
+).to(DEVICE).eval()
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": image},
+            {"type": "text", "text": PROMPTS[CHOSEN_TASK]},
+        ],
+    }
+]
+inputs = processor.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt",
+).to(DEVICE)
+
+outputs = model.generate(**inputs, max_new_tokens=1024)
+outputs = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+print(outputs)
+```
+
 ## Performance
 
 ### Page-Level Document Parsing
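As a quick sanity check of the message format, the payload that the README script passes to `processor.apply_chat_template` can be built and inspected without loading the model. The `build_messages` helper below is hypothetical (not part of the repository) and simply reproduces the structure used above:

```python
# Hypothetical helper (not part of the PaddleOCR repo) that builds the
# single-turn chat payload used by the transformers example in the README.
PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

def build_messages(image, task):
    """Return the message list for one image and one recognition task."""
    if task not in PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},  # a PIL image in the real script
                {"type": "text", "text": PROMPTS[task]},
            ],
        }
    ]

msgs = build_messages("dummy-image", "table")
print(msgs[0]["content"][1]["text"])  # Table Recognition:
```

Validating the task name up front makes a typo fail loudly rather than producing an empty or wrong prompt.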
chat_template.jinja CHANGED
@@ -7,16 +7,40 @@
 {%- if not sep_token is defined -%}
 {%- set sep_token = "<|end_of_sentence|>" -%}
 {%- endif -%}
+{%- if not image_token is defined -%}
+{%- set image_token = "<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" -%}
+{%- endif -%}
 {{- cls_token -}}
 {%- for message in messages -%}
 {%- if message["role"] == "user" -%}
-{{- "User: <|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" + message["content"] + "\n" -}}
+{{- "User: " -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "image" -%}
+{{ image_token }}
+{%- endif -%}
+{%- endfor -%}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] }}
+{%- endif -%}
+{%- endfor -%}
+{{ "\n" -}}
 {%- elif message["role"] == "assistant" -%}
-{{- "Assistant: " + message["content"] + sep_token -}}
+{{- "Assistant: " -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] + "\n" }}
+{%- endif -%}
+{%- endfor -%}
+{{ sep_token -}}
 {%- elif message["role"] == "system" -%}
-{{- message["content"] -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] + "\n" }}
+{%- endif -%}
+{%- endfor -%}
 {%- endif -%}
 {%- endfor -%}
 {%- if add_generation_prompt -%}
 {{- "Assistant: " -}}
-{%- endif -%}
+{%- endif -%}
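The effect of the new template can be checked with `jinja2` directly, independent of `transformers`. The string below is a condensed reproduction of the updated user-turn logic from the hunk above; `cls_token` is defined outside the hunk, so a `"<cls>"` placeholder is assumed here:

```python
# Condensed reproduction of the updated user-turn logic in chat_template.jinja.
# cls_token is defined outside the diff hunk, so "<cls>" is a placeholder.
from jinja2 import Template

TEMPLATE = (
    '{%- if not image_token is defined -%}'
    '{%- set image_token = "<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" -%}'
    '{%- endif -%}'
    '{{- cls_token -}}'
    '{%- for message in messages -%}'
    '{%- if message["role"] == "user" -%}'
    '{{- "User: " -}}'
    # All image parts are emitted first, then all text parts, as in the new template.
    '{%- for content in message["content"] -%}'
    '{%- if content["type"] == "image" -%}{{ image_token }}{%- endif -%}'
    '{%- endfor -%}'
    '{%- for content in message["content"] -%}'
    '{%- if content["type"] == "text" -%}{{ content["text"] }}{%- endif -%}'
    '{%- endfor -%}'
    '{{ "\\n" -}}'
    '{%- endif -%}'
    '{%- endfor -%}'
    '{%- if add_generation_prompt -%}{{- "Assistant: " -}}{%- endif -%}'
)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": None},  # stands in for a PIL image
        {"type": "text", "text": "OCR:"},
    ]}
]

rendered = Template(TEMPLATE).render(
    messages=messages, cls_token="<cls>", add_generation_prompt=True
)
print(repr(rendered))
# '<cls>User: <|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>OCR:\nAssistant: '
```

This is why the commit pairs the template change with the structured-content message format in the README example: the old template concatenated `message["content"]` as a plain string, while the new one iterates over a list of typed parts.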