4DGT Model Card
Model Details
4DGT (4D Gaussian Transformer) is a neural network model that learns dynamic 3D Gaussian representations from monocular videos. It uses a transformer-based architecture to predict 4D Gaussians for dynamic scenes observed in an egocentric video.
- Paper: 4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos
- Project Page: https://4dgt.github.io/
- GitHub: GitHub repository
Please refer to the project page and GitHub repository for more details about the model.
Citation
@article{xu20254dgt,
title = {4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos},
author = {Xu, Zhen and Li, Zhengqin and Dong, Zhao and Zhou, Xiaowei and Newcombe, Richard and Lv, Zhaoyang},
journal = {arXiv preprint arXiv:2506.08015},
year = {2025}
}
Model Files
Checkpoint: 4dgt_full.pth
- Size: ~14.5 GB
- Format: PyTorch state dict
- Contents:
- The full model trained as described in the paper.
- Encoder weights (DINOv2 backbone)
- Level-of-Detail Transformer
- 4D Gaussian Decoder
Checkpoint: 4dgt_1st_stage.pth
- Size: ~4.85 GB
- Format: PyTorch state dict
- Contents:
- The first-stage model, trained only on the Ego-Exo4D dataset as described in the paper.
- Encoder weights (DINOv2 backbone)
- Vanilla Transformer (no level of detail)
- 4D Gaussian Decoder
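Both checkpoints are standard PyTorch state dicts. Below is a minimal sketch of how one might load and inspect a checkpoint with plain PyTorch; the actual model classes, configuration, and inference entry points live in the 4DGT GitHub repository, and the "model" container key used for unwrapping is an assumption, not a documented layout.

```python
import torch

# Minimal sketch: load the checkpoint on CPU and inspect its contents.
# Assumes 4dgt_full.pth is in the current working directory.
checkpoint = torch.load("4dgt_full.pth", map_location="cpu")

# Some checkpoints wrap the weights in a container dict (e.g. under a
# "model" key); this key name is an assumption, not a documented layout.
state_dict = checkpoint.get("model", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# Print a few parameter names and shapes (encoder, transformer, and
# decoder weights as listed above).
for name, value in list(state_dict.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```

Restoring the weights into the model itself requires the architecture definitions from the repository (e.g. `model.load_state_dict(state_dict)` once the model is constructed).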
Quick Start
Please refer to the 4DGT GitHub repository for the full setup instructions.
Contact
For questions and issues, please open an issue on the GitHub repository.