---
license: mit
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- Real-Time
- Long-Video
- Video-Diffusion-Model
- Autoregressive
---

<p align="center">
<h1 align="center">Rolling Forcing</h1>
<h3 align="center">Autoregressive Long Video Diffusion in Real Time</h3>
</p>
<p align="center">
<p align="center">
<a href="https://kunhao-liu.github.io/">Kunhao Liu</a><sup>1</sup>
·
<a href="https://wbhu.github.io/">Wenbo Hu</a><sup>2</sup>
·
<a href="https://bluestyle97.github.io/">Jiale Xu</a><sup>2</sup>
·
<a href="http://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>2</sup>
·
<a href="https://personal.ntu.edu.sg/shijian.lu/">Shijian Lu</a><sup>1</sup><br>
<sup>1</sup>Nanyang Technological University <sup>2</sup>ARC Lab, Tencent PCG
</p>
<h3 align="center"><a href="https://arxiv.org/abs/2509.25161"><img src="https://img.shields.io/badge/ArXiv-Paper-brown"></a> <a href="https://kunhao-liu.github.io/Rolling_Forcing_Webpage/"><img src="https://img.shields.io/badge/Project-Webpage-bron"></a> <a href="https://github.com/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/GitHub-Code-blue"></a> <a href="https://huggingface.co/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow"></a></h3>
</p>

## 💡 TL;DR: REAL-TIME streaming generation of MULTI-MINUTE videos

<img src="https://github.com/user-attachments/assets/194bd647-508c-4dba-9ee9-979b54a0e230" />

- 🚀 Real-Time at 16 FPS: Stream high-quality video directly from text on a single GPU.
- 🎬 Minute-Long Videos: Generate coherent, multi-minute sequences with dramatically reduced drift.
- ⚙️ Rolling-Window Strategy: Denoise frames together in a rolling window for mutual refinement, breaking the chain of error accumulation.
- Long-Term Memory: The novel Attention Sink anchors your video, preserving global context over thousands of frames.
- State-of-the-Art Performance: Outperforms all comparable open-source models in quality and consistency.
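
The rolling-window and attention-sink ideas in the list above can be pictured with a toy schedule. The sketch below is only an illustration and not the released implementation: it involves no model and tracks only integer noise levels. The function name `simulate_rolling_window`, the integer noise levels, and the parameters `window_size` and `num_sink_frames` are all assumptions made for this example.

```python
from collections import deque


def simulate_rolling_window(num_frames, window_size=3, num_sink_frames=1):
    """Toy schedule only: tracks which frames are denoised together.

    Noise level `window_size` means pure noise, 0 means clean.
    All names and the integer noise levels are hypothetical.
    """
    active = deque()   # [frame_index, noise_level], oldest frame first
    finished = []      # frame indices emitted as clean, in order
    steps = []         # one record per joint denoising pass
    next_frame = 0

    while len(finished) < num_frames:
        # Admit one new pure-noise frame while frames remain.
        if next_frame < num_frames:
            active.append([next_frame, window_size])
            next_frame += 1

        # Hypothetical attention sink: the earliest clean frames stay
        # visible as a fixed anchor for every later pass.
        sink = finished[:num_sink_frames]

        # One joint pass: every frame in the window drops one noise level.
        for slot in active:
            slot[1] -= 1
        steps.append({"sink": list(sink), "window": [tuple(s) for s in active]})

        # The oldest frame is clean once it reaches level 0, so emit it.
        if active[0][1] == 0:
            finished.append(active.popleft()[0])

    return finished, steps


if __name__ == "__main__":
    finished, steps = simulate_rolling_window(num_frames=5)
    for i, step in enumerate(steps):
        print(i, step)
```

In this sketch each frame is refined alongside its neighbours over several passes instead of being finalized in a single isolated step, and later passes still see the first frame through the sink list. How the actual model schedules noise levels and stores its anchor frames is not specified here.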

## 📚 Citation

If you find this codebase useful for your research, please cite our paper and consider giving the repo a ⭐️ on GitHub: https://github.com/TencentARC/RollingForcing

```bibtex
@article{liu2025rolling,
  title={Rolling Forcing: Autoregressive Long Video Diffusion in Real Time},
  author={Liu, Kunhao and Hu, Wenbo and Xu, Jiale and Shan, Ying and Lu, Shijian},
  journal={arXiv preprint arXiv:2509.25161},
  year={2025}
}
```