---
license: mit
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- Real-Time
- Long-Video
- Video-Diffusion-Model
- Autoregressive
---
# Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Kunhao Liu<sup>1</sup> · Wenbo Hu<sup>2</sup> · Jiale Xu<sup>2</sup> · Ying Shan<sup>2</sup> · Shijian Lu<sup>1</sup>

<sup>1</sup>Nanyang Technological University · <sup>2</sup>ARC Lab, Tencent PCG

## 💡 TL;DR: REAL-TIME streaming generation of MULTI-MINUTE videos
- 🚀 Real-Time at 16 FPS: Stream high-quality video directly from text on a single GPU.
- 🎬 Minute-Long Videos: Generate coherent, multi-minute sequences with dramatically reduced drift.
- ⚙️ Rolling-Window Strategy: Denoise frames together in a rolling window so they refine each other, breaking the chain of error accumulation (see the sketch below).
- 🧠 Long-Term Memory: The novel Attention Sink anchors your video, preserving global context over thousands of frames.
- 🥇 State-of-the-Art Performance: Outperforms all comparable open-source models in quality and consistency.
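## 🔍 How it works (illustrative sketch)

The rolling-window schedule and the attention sink can be pictured with a short, self-contained sketch. Everything below (the `toy_denoiser` stand-in, the window size, the sink and context sizes) is an assumption made for illustration, not the released implementation; see the GitHub repo for the real code.

```python
import torch

WINDOW = 4  # number of frames denoised jointly per iteration (assumed)

def toy_denoiser(window, context, remaining):
    """Stand-in for the video diffusion model: one joint denoising step.

    A real model would attend from every window frame to `context`
    (attention-sink frames plus recent clean frames) and to the other
    window frames, so neighbouring frames refine each other. Here each
    frame is simply damped toward zero according to its remaining noise
    level, which is enough to show the scheduling.
    """
    scale = torch.tensor([(r - 1) / r for r in remaining]).view(-1, 1, 1, 1)
    return window * scale

def generate(num_frames=12, c=3, h=8, w=8, sink_size=2, context_size=4):
    out = []
    # Window slots sit at staggered noise levels: slot 0 (the head) needs
    # one more step, slot WINDOW-1 (the tail) is pure noise.
    window = torch.randn(WINDOW, c, h, w)
    remaining = list(range(1, WINDOW + 1))
    while len(out) < num_frames:
        # Long-term memory: the first `sink_size` frames stay in the
        # context forever (the attention sink); the rest of the context
        # is the most recent clean frames.
        sink = out[:sink_size]
        recent = out[max(sink_size, len(out) - context_size):]
        window = toy_denoiser(window, sink + recent, remaining)
        remaining = [r - 1 for r in remaining]
        # The head frame is now fully denoised: emit it, shift the
        # window, and enqueue a fresh pure-noise frame at the tail.
        out.append(window[0])
        window = torch.cat([window[1:], torch.randn(1, c, h, w)])
        remaining = remaining[1:] + [WINDOW]
    return torch.stack(out)

frames = generate()
print(frames.shape)  # torch.Size([12, 3, 8, 8])
```

The design intuition: because every frame in the window is denoised together with its neighbours, an error on one frame can be corrected by the others before that frame is emitted, while the sink frames keep the earliest global context visible even after thousands of frames.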
## 📚 Citation
If you find this codebase useful for your research, please cite our paper and consider giving the repo a ⭐️ on GitHub: https://github.com/TencentARC/RollingForcing
```bibtex
@article{liu2025rolling,
  title={Rolling Forcing: Autoregressive Long Video Diffusion in Real Time},
  author={Liu, Kunhao and Hu, Wenbo and Xu, Jiale and Shan, Ying and Lu, Shijian},
  journal={arXiv preprint arXiv:2509.25161},
  year={2025}
}
```