---
license: mit
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- Real-Time
- Long-Video
- Video-Diffusion-Model
- Autoregressive
---
<p align="center">
<h1 align="center">Rolling Forcing</h1>
<h3 align="center">Autoregressive Long Video Diffusion in Real Time</h3>
</p>
<p align="center">
  <p align="center">
    <a href="https://kunhao-liu.github.io/">Kunhao Liu</a><sup>1</sup>

    <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>2</sup>

    <a href="https://bluestyle97.github.io/">Jiale Xu</a><sup>2</sup>

    <a href="http://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>2</sup>

    <a href="https://personal.ntu.edu.sg/shijian.lu/">Shijian Lu</a><sup>1</sup><br>
    <sup>1</sup>Nanyang Technological University <sup>2</sup>ARC Lab, Tencent PCG
  </p>
  <h3 align="center"><a href="https://arxiv.org/abs/2509.25161"><img src="https://img.shields.io/badge/ArXiv-Paper-brown"></a> <a href="https://kunhao-liu.github.io/Rolling_Forcing_Webpage/"><img src="https://img.shields.io/badge/Project-Webpage-bron"></a> <a href="https://github.com/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/GitHub-Code-blue"></a> <a href="https://huggingface.co/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow"></a></h3>
</p>


## 💡 TL;DR: REAL-TIME streaming generation of MULTI-MINUTE videos
<img src="https://github.com/user-attachments/assets/194bd647-508c-4dba-9ee9-979b54a0e230" />

- 🚀 Real-Time at 16 FPS: Stream high-quality video directly from text on a single GPU.
- 🎬 Minute-Long Videos: Generate coherent, multi-minute sequences with dramatically reduced drift.
- ⚙️ Rolling-Window Strategy: Denoise frames together in a rolling window for mutual refinement, breaking the chain of error accumulation.
- Long-Term Memory: The novel Attention Sink anchors your video, preserving global context over thousands of frames.
- State-of-the-Art Performance: Outperforms all comparable open-source models in quality and consistency.


## 📚 Citation

If you find this codebase useful for your research, please cite our paper and consider giving the repo a ⭐️ on GitHub: https://github.com/TencentARC/RollingForcing

```bibtex
@article{liu2025rolling,
  title={Rolling Forcing: Autoregressive Long Video Diffusion in Real Time},
  author={Liu, Kunhao and Hu, Wenbo and Xu, Jiale and Shan, Ying and Lu, Shijian},
  journal={arXiv preprint arXiv:2509.25161},
  year={2025}
}
```