CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 18
Article PP-OCRv5 on Hugging Face: A Specialized Approach to OCR By baidu and 5 others • Sep 10 • 108
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 48
Article Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub By nvidia and 11 others • Jun 27 • 29
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 166
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 54
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48
AGUVIS: Unified Pure Vision GUI Agents Collection https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 7
MiniCPM-o & MiniCPM-V Collection Multimodal models with leading performance. • 28 items • Updated Sep 1 • 56