opensporks committed on
Commit
cc65656
·
verified ·
1 Parent(s): 7629c87

Update README.md

Files changed (1)
  1. README.md +40 -0
README.md CHANGED
@@ -33,6 +33,46 @@ We're releasing these models in two different sizes:
  - **Input**: Cleaned or raw HTML and a JSON Schema
  - **Output**: Strict JSON that conforms to the provided schema
 
+ ## Benchmarks
+
+ ### HTML-to-JSON Extraction Quality
+
+ We evaluated extraction quality using Gemini 2.5 Pro as a judge, scoring each extraction on a 1-5 scale, where 5 represents a perfect extraction.
+
+ | Model | LLM-as-Judge Score (1-5) |
+ |-------|--------------------------|
+ | GPT-4.1 | 4.74 |
+ | **Schematron-8B** | **4.64** |
+ | **Schematron-3B** | **4.41** |
+ | Gemini-3B-Base | 2.24 |
+
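The README does not say how per-example judge ratings are combined into the single score reported for each model. A minimal sketch, assuming a plain arithmetic mean over 1-5 ratings; the function name `judge_score` and the sample ratings are hypothetical and are not benchmark data:

```python
from statistics import mean


def judge_score(ratings: list[int]) -> float:
    """Average per-example judge ratings, each expected to be on the 1-5 scale."""
    if not ratings:
        raise ValueError("no ratings to aggregate")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
    # Two decimals, matching the precision used in the table.
    return round(mean(ratings), 2)


# Hypothetical ratings for illustration only.
print(judge_score([5, 5, 4, 5, 4]))  # 4.6
```
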
+ ### Web-Augmented Factuality on SimpleQA
+
+ We evaluated Schematron's real-world impact on LLM factuality using SimpleQA.
+
+ **Test Pipeline:**
+ 1. **Query Generation**: The primary LLM (GPT-5 Nano or GPT-4.1) generates search queries and defines the extraction schema
+ 2. **Web Search**: The search provider (SERP or Exa) retrieves relevant pages
+ 3. **Structured Extraction**: Schematron extracts JSON data from the retrieved pages using the schema
+ 4. **Answer Synthesis**: The primary LLM produces the final answer from the structured data
+
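The four pipeline steps can be sketched as a single function that wires together four pluggable stages. This is an illustration under stated assumptions, not the evaluation harness: `answer_with_web` and the `fake_*` stand-ins are hypothetical names, and each real stage would call an LLM, a search provider, or Schematron.

```python
from typing import Callable

# Hypothetical stage signatures; none of these is a real client API.
PlanFn = Callable[[str], tuple[list[str], dict]]  # question -> (queries, JSON Schema)
SearchFn = Callable[[str], list[str]]             # query -> HTML pages
ExtractFn = Callable[[str, dict], dict]           # (html, schema) -> JSON object
AnswerFn = Callable[[str, list[dict]], str]       # (question, records) -> answer


def answer_with_web(question: str, plan: PlanFn, search: SearchFn,
                    extract: ExtractFn, synthesize: AnswerFn) -> str:
    queries, schema = plan(question)                   # 1. query generation
    records: list[dict] = []
    for query in queries:
        for html in search(query):                     # 2. web search
            records.append(extract(html, schema))      # 3. structured extraction
    return synthesize(question, records)               # 4. answer synthesis


# Stand-in stages so the sketch runs without any network access.
def fake_plan(question: str) -> tuple[list[str], dict]:
    schema = {"type": "object", "properties": {"capital": {"type": "string"}}}
    return ["capital of France"], schema


def fake_search(query: str) -> list[str]:
    return ["<html><body><p>Paris is the capital of France.</p></body></html>"]


def fake_extract(html: str, schema: dict) -> dict:
    return {"capital": "Paris"} if "Paris" in html else {"capital": ""}


def fake_synthesize(question: str, records: list[dict]) -> str:
    return records[0]["capital"]


print(answer_with_web("What is the capital of France?",
                      fake_plan, fake_search, fake_extract, fake_synthesize))
```

Passing the stages in as callables mirrors how the benchmark swaps the search provider and the extraction model between configurations while the rest of the pipeline stays fixed.
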
+ | Base Model | Configuration | SimpleQA Accuracy |
+ |:-----------|:--------------|------------------:|
+ | GPT-5 Nano | Solo | 8.54% |
+ | GPT-5 Nano | + SERP + Schematron-8B | 64.15% |
+ | GPT-5 Nano | + Exa + **Schematron-3B** | **75.47%** |
+ | GPT-5 Nano | + Exa + Gemini 2.5 Flash | 80.61% |
+ | GPT-5 Nano | + Exa + **Schematron-8B** | **82.87%** |
+ | GPT-4.1 | Solo | 41.60% |
+ | GPT-4.1 | + Exa + **Schematron-8B** | **85.58%** |
+
+ **Key findings:**
+ - Web search paired with JSON extraction improves factuality: adding Schematron with web retrieval raises GPT-5 Nano's accuracy from 8.54% to 82.87%, a roughly 9.7x improvement
+ - Search provider matters: with Schematron-8B, Exa (82.87%) outperforms SERP (64.15%) by about 19 percentage points on factual retrieval, while also being more cost-effective
+ - Structured extraction beats raw HTML: processing raw HTML would require 100k+ tokens for 10 searches; Schematron's JSON extraction reduces this by orders of magnitude
+ - Small specialized models win: Schematron-8B (82.87%) outperforms the much larger Gemini 2.5 Flash (80.61%) on this task, suggesting that fine-tuning for a well-defined task can beat general-purpose models
+ - Performance scales with model quality: when paired with GPT-4.1, Schematron-8B achieves 85.58% accuracy, showing the approach benefits from stronger base models
+
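The headline comparisons can be rechecked directly from the SimpleQA table. A small sketch; the accuracies are copied from the table, and the helper name `gain` is hypothetical:

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, multiplicative factor) between two accuracies."""
    return round(after - before, 2), round(after / before, 2)


# Accuracies in percent, copied from the SimpleQA table.
nano_solo, nano_serp_8b, nano_exa_8b = 8.54, 64.15, 82.87
gpt41_solo, gpt41_exa_8b = 41.60, 85.58

print(gain(nano_solo, nano_exa_8b))          # (74.33, 9.7)
print(gain(gpt41_solo, gpt41_exa_8b))        # (43.98, 2.06)
print(round(nano_exa_8b - nano_serp_8b, 2))  # 18.72 points, Exa over SERP
```

The relative gain is far larger for GPT-5 Nano than for GPT-4.1 because its solo baseline is much lower, while the final accuracies of the two configurations differ by under three percentage points.
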
  ## Minimal Quickstart
  Use these local snippets to prepare HTML and compose a schema‑guided prompt. The model returns strictly valid JSON; validate it against your schema downstream.