Dongxu Li

dxli1

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 54

upvoted a paper about 1 month ago

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26 • 127

authored a paper about 1 month ago

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 54

upvoted a paper about 1 month ago

AToken: A Unified Tokenizer for Vision

Paper • 2509.14476 • Published Sep 17 • 36

liked a model 6 months ago

reducto/RolmOCR

Image-to-Text • 8B • Updated Apr 2 • 37.6k • 560

upvoted a paper 8 months ago

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

Paper • 2503.06885 • Published Mar 10 • 4

commented on a paper 8 months ago

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

Paper • 2503.06885 • Published Mar 10 • 4

liked a Space 11 months ago

Vision Papers

💻 • All paper summaries read by Merve • 109

upvoted a paper 11 months ago

OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published Nov 22, 2024 • 61

upvoted a paper 12 months ago

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Paper • 2411.13281 • Published Nov 20, 2024 • 21

liked 2 models about 1 year ago

rhymes-ai/Aria-torchao-int8wo

Updated Oct 25, 2024 • 1 • 12

leon-se/Aria-sequential_mlp-FP8-dynamic

Image-Text-to-Text • 25B • Updated Dec 29, 2024 • 2 • 6

upvoted an article about 1 year ago

Allegro: Advanced Video Generation Model

Article • Oct 22, 2024 • 60

liked a model about 1 year ago

rhymes-ai/Allegro

Text-to-Video • Updated Oct 31, 2024 • 102 • 264

upvoted a paper about 1 year ago

Allegro: Open the Black Box of Commercial-Level Video Generation Model

Paper • 2410.15458 • Published Oct 20, 2024 • 40

liked a model about 1 year ago

rhymes-ai/Aria

Image-Text-to-Text • 25B • Updated Apr 23 • 39k • 636

upvoted a paper about 1 year ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 111

authored 3 papers about 1 year ago

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Paper • 1910.11006 • Published Oct 24, 2019

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Paper • 2311.18799 • Published Nov 30, 2023 • 1

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

Paper • 2401.01827 • Published Jan 3, 2024 • 18
