Li Dong

unilm

AI & ML interests

Language Model Pre-Training

Recent Activity

upvoted 3 papers 5 days ago

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Paper • 2510.19338 • Published 7 days ago • 98

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Paper • 2510.19808 • Published 6 days ago • 24

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Paper • 2510.19779 • Published 6 days ago • 58

upvoted 3 papers 7 days ago

BitNet Distillation

Paper • 2510.13998 • Published 13 days ago • 51

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published 9 days ago • 59

QueST: Incentivizing LLMs to Generate Difficult Problems

Paper • 2510.17715 • Published 8 days ago • 31

upvoted 2 papers 14 days ago

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Paper • 2510.06499 • Published 21 days ago • 31

DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published 15 days ago • 26

upvoted 3 papers 18 days ago

Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published 28 days ago • 49

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published 22 days ago • 454

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published 25 days ago • 45

upvoted a paper 26 days ago

RLP: Reinforcement as a Pretraining Objective

Paper • 2510.01265 • Published Sep 26 • 39

upvoted 3 papers 30 days ago

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24 • 75

Thinking Augmented Pre-training

Paper • 2509.20186 • Published Sep 24 • 22

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 176

upvoted a paper about 2 months ago

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published Sep 2 • 12

upvoted a paper 2 months ago

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

upvoted a collection 2 months ago

VibeVoice

Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 5 items • Updated Sep 1 • 129

upvoted 2 papers 2 months ago

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20 • 36

DINOv3

Paper • 2508.10104 • Published Aug 13 • 274