Update README.md

README.md
CHANGED
|
@@ -7,7 +7,51 @@ sdk: static
|
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
Hello, we are a team of researchers based at KAIST AI working on accessible visualization.
|
| 11 |
Specifically, we compiled a diagram description dataset for blind and low-vision (BLV) individuals.
|
| 12 |
We worked in close cooperation with two schools for the blind, as well as over 30 sighted annotators, and we are grateful for their contribution.
|
| 13 |
-
Check out our preprint [coming soon], and feel free to contact us at soarhigh@kaist.ac.kr.
|
| 7 |
pinned: false
|
| 8 |
---
|
| 9 |
|
| 10 |
+
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
|
| 11 |
+
==============================================================================================================
|
| 12 |
+
Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne
|
| 13 |
+
|
| 14 |
+
## 📄 [Paper](URL) 💻 [Code](URL)
|
| 15 |
+
|
| 16 |
Hello, we are a team of researchers based at KAIST AI working on accessible visualization.
|
| 17 |
Specifically, we compiled a diagram description dataset for blind and low-vision (BLV) individuals.
|
| 18 |
We worked in close cooperation with two schools for the blind, as well as over 30 sighted annotators, and we are grateful for their contribution.
|
| 19 |
+
Check out our preprint [coming soon], and feel free to contact us at soarhigh@kaist.ac.kr.
|
| 20 |
+
|
| 21 |
+
---------------------------------------
|
| 22 |
+
|
| 23 |
+
## Abstract
|
| 24 |
+
> Often, the needs and visual abilities differ between the annotator group and the end user
|
| 25 |
+
group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain.
|
| 26 |
+
Sighted annotators could describe visuals with ease, but existing studies have shown that direct generations by them are costly, bias-prone, and somewhat
|
| 27 |
+
lacking by BLV standards. In this study, we ask sighted individuals to assess—rather than produce—diagram descriptions generated by vision-language models (VLM) that have been
|
| 28 |
+
guided with latent supervision via a multi-pass inference. The sighted assessments prove effective and useful to professional educators
|
| 29 |
+
who are themselves BLV and teach visually impaired learners. We release SIGHTATION, a collection of diagram description datasets
|
| 30 |
+
spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes and
|
| 31 |
+
demonstrate their fine-tuning potential in various downstream tasks.
|
| 32 |
+
|
| 33 |
+
## Sightation Collection
|
| 34 |
+
- SightationCompletions
|
| 35 |
+
- SightationPreference
|
| 36 |
+
- SightationRetrieval
|
| 37 |
+
- SightationVQA
|
| 38 |
+
- SightationReasoning
|
| 39 |
+
|
| 40 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/cNshK4QAdiNMqk7x6J6j7.png" width="70%" height="70%" title="visual_abstract" alt="visual_abstract"></img>
|
| 41 |
+
The key benefit of sighted user feedback is that its assessments are based on solid visual
|
| 42 |
+
grounding. The compiled assessments prove to be effective training material for steering VLMs toward more
|
| 43 |
+
accessible descriptions.
|
| 44 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/8oYvtq7dtv_Ck-U6OlcAE.png" width="50%" height="50%" title="dimensions_assignment" alt="dimensions_assignment"></img>
|
| 45 |
+
The description qualities assessed by their respective evaluator groups.
|
| 46 |
+
|
| 47 |
+
## Results
|
| 48 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a86f66c6f66e2fa5888b41/094e9Hw7lauvT1tshg1Wj.png" width="60%" height="60%" title="spider_chart" alt="spider_chart"></img>
|
| 49 |
+
Tuning VLMs on Sightation improved several qualities of the diagram descriptions, as evaluated by BLV educators. The chart shows normalized ratings averaged within each aspect.
|
| 50 |
+
The effect of the dataset is most pronounced with the Qwen2-VL-2B model, shown above.
|
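The per-aspect aggregation mentioned in the Results text can be sketched roughly as follows. This is only an illustration, not the authors' evaluation code: the 1 to 5 rating scale, the min-max normalization, the aspect names, and the helper `normalized_aspect_means` are all assumptions introduced here.

```python
from collections import defaultdict
from statistics import mean


def normalize(rating, low=1.0, high=5.0):
    """Min-max scale a rating from [low, high] to [0, 1] (the scale is assumed)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (rating - low) / (high - low)


def normalized_aspect_means(ratings, low=1.0, high=5.0):
    """Average the normalized ratings within each aspect.

    `ratings` is an iterable of (aspect, rating) pairs, one per evaluator judgment.
    """
    by_aspect = defaultdict(list)
    for aspect, rating in ratings:
        by_aspect[aspect].append(normalize(rating, low, high))
    return {aspect: mean(values) for aspect, values in by_aspect.items()}


# Hypothetical ratings; the aspect names are placeholders, not the paper's.
example = [("clarity", 5), ("clarity", 3), ("succinctness", 2), ("succinctness", 4)]
scores = normalized_aspect_means(example)
```

Each value in `scores` would correspond to one axis of a spider chart like the one above, under the stated assumptions.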
| 51 |
+
|
| 52 |
+
## BibTeX
|
| 53 |
+
If you find this work useful for your research, please cite:
|
| 54 |
+
```bibtex
|
| 55 |
+
@inproceedings{
|
| 56 |
+
}
|
| 57 |
+
```
|