FG-CLIP 2 Collection • FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese. • 10 items • Updated 19 days ago • 4
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model • Paper • 2510.10921 • Published 21 days ago • 9
FG-CLIP Collection • New generation of CLIP with strong fine-grained discrimination capability • 6 items • Updated 19 days ago • 4
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task • Paper • 2409.04005 • Published Sep 6, 2024 • 19