---
license: mit
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- Real-Time
- Long-Video
- Video-Diffusion-Model
- Autoregressive
---

# Rolling Forcing

**Autoregressive Long Video Diffusion in Real Time**

Kunhao Liu¹ · Wenbo Hu² · Jiale Xu² · Ying Shan² · Shijian Lu¹

¹Nanyang Technological University &nbsp; ²ARC Lab, Tencent PCG

## 💡 TL;DR: REAL-TIME streaming generation of MULTI-MINUTE videos

- 🚀 **Real-Time at 16 FPS:** Stream high-quality video directly from text on a single GPU.
- 🎬 **Minute-Long Videos:** Generate coherent, multi-minute sequences with dramatically reduced drift.
- ⚙️ **Rolling-Window Strategy:** Denoise frames together in a rolling window for mutual refinement, breaking the chain of error accumulation.
- 🧠 **Long-Term Memory:** The novel attention sink anchors the video, preserving global context over thousands of frames.
- 🥇 **State-of-the-Art Performance:** Outperforms all comparable open-source models in quality and consistency.

## 📚 Citation

If you find this codebase useful for your research, please cite our paper and consider giving the repo a ⭐️ on GitHub: https://github.com/TencentARC/RollingForcing

```bibtex
@article{liu2025rolling,
  title={Rolling Forcing: Autoregressive Long Video Diffusion in Real Time},
  author={Liu, Kunhao and Hu, Wenbo and Xu, Jiale and Shan, Ying and Lu, Shijian},
  journal={arXiv preprint arXiv:2509.25161},
  year={2025}
}
```
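To illustrate the rolling-window idea described above, here is a minimal, self-contained sketch of a staggered denoising schedule. This is an illustrative toy, not the actual Rolling Forcing implementation: the function name, window size, and linear noise ramp are all assumptions made for clarity. Each frame enters the window at full noise and is refined jointly with its neighbors at progressively lower noise levels until it exits fully denoised, which is what lets frames correct one another instead of accumulating errors frame by frame.

```python
def rolling_window_schedule(num_frames: int, window: int = 4):
    """Toy rolling-window denoising schedule (illustrative only).

    Returns one list per step of (frame_idx, noise_level) pairs for the
    frames active in the window at that step. A frame enters at noise 1.0
    and leaves at 0.0 after `window` steps, so frames in the same window
    are denoised together at staggered noise levels.
    """
    steps = []
    for step in range(num_frames + window - 1):
        active = []
        for offset in range(window):
            frame = step - offset
            if 0 <= frame < num_frames:
                # Frames that entered earlier (larger offset) are cleaner.
                active.append((frame, 1.0 - offset / (window - 1)))
        steps.append(active)
    return steps


schedule = rolling_window_schedule(num_frames=6, window=4)
# Step 0: only frame 0, at full noise; by step 3 the window holds
# frames 3..0 at staggered noise levels, and frame 0 exits fully denoised.
```

In the real model the window would slide over latent frames and a learned denoiser would update all active frames jointly, with the attention sink providing long-range context beyond the window.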