---
title: HunyuanWorld Demo
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: other
models:
  - black-forest-labs/FLUX.1-dev
  - tencent/HunyuanWorld-1
hardware: nvidia-t4-small
---
# HunyuanWorld-1.0 Demo Space

This is a Gradio demo for [Tencent-Hunyuan/HunyuanWorld-1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), a one-stop solution for text-driven 3D scene generation.
## How to Use

1. **Panorama Generation**:
   - **Text-to-Panorama**: Enter a text prompt and generate a 360° panorama image.
   - **Image-to-Panorama**: Upload an image and provide a prompt to extend it into a panorama.
2. **Scene Generation**:
   - After generating a panorama, click "Send to Scene Generation".
   - Provide labels for foreground objects to be separated into layers.
   - Click "Generate 3D Scene" to create a 3D mesh from the panorama.

You can also drive the Space programmatically; a hedged sketch follows this list.
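The sketch below uses `gradio_client`. The Space id and `api_name` value are hypothetical placeholders, not documented in this README; check the Space's "Use via API" panel for the actual endpoint names.

```python
# A hedged sketch using gradio_client; the Space id and api_name are hypothetical
# placeholders -- look up the real ones in the Space's "Use via API" panel.
from gradio_client import Client

client = Client("your-username/HunyuanWorld-Demo")  # hypothetical Space id
panorama = client.predict(
    "a quiet alpine lake at sunrise",  # text prompt
    api_name="/text_to_panorama",      # hypothetical endpoint name
)
print(panorama)  # local path to the generated panorama file
```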
## Technical Details

This Space combines two core functionalities of the HunyuanWorld-1.0 model:
- **Panorama Generation**: Creates immersive 360° images from text or existing images.
- **3D Scene Reconstruction**: Decomposes a panorama into layers and reconstructs a 3D mesh.

This demo is running on an NVIDIA T4 GPU. Due to the size of the models, the initial startup may take a few minutes.

<p align="left">
  <img src="assets/arch.jpg">
</p>
### Performance

We evaluated HunyuanWorld 1.0 against other open-source panorama generation and 3D world generation methods. The numerical results indicate that HunyuanWorld 1.0 surpasses the baselines in visual quality and geometric consistency. A sketch of the CLIP-based scoring used in these tables follows the last table below.
<p align="center">
  Text-to-panorama generation
</p>

| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360 | 69.5 | 7.5 | 1.8 | 20.9 |
| MVDiffusion | 47.9 | 7.1 | 2.4 | 21.5 |
| PanFusion | 56.6 | 7.6 | 2.2 | 21.0 |
| LayerPano3D | 49.6 | 6.5 | 3.7 | 21.5 |
| HunyuanWorld 1.0 | 40.8 | 5.8 | 4.4 | 24.3 |
<p align="center">
  Image-to-panorama generation
</p>

| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360 | 71.4 | 7.8 | 1.9 | 73.9 |
| MVDiffusion | 47.7 | 7.0 | 2.7 | 80.8 |
| HunyuanWorld 1.0 | 45.2 | 5.8 | 4.3 | 85.1 |
<p align="center">
  Text-to-world generation
</p>

| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Director3D | 49.8 | 7.5 | 3.2 | 23.5 |
| LayerPano3D | 35.3 | 4.8 | 3.9 | 22.0 |
| HunyuanWorld 1.0 | 34.6 | 4.3 | 4.2 | 24.0 |
<p align="center">
  Image-to-world generation
</p>

| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| WonderJourney | 51.8 | 7.3 | 3.2 | 81.5 |
| DimensionX | 45.2 | 6.3 | 3.5 | 83.3 |
| HunyuanWorld 1.0 | 36.2 | 4.6 | 3.9 | 84.5 |
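As a rough illustration of the CLIP-T column, the sketch below scores prompt-image agreement as the cosine similarity between CLIP text and image embeddings, scaled by 100. This is an assumption about the metric's general form; the exact model and evaluation protocol behind the tables above may differ.

```python
# A minimal CLIP-T-style score: cosine similarity between the prompt and the
# generated panorama in CLIP embedding space, scaled by 100.
# Assumption: the paper's exact protocol (CLIP variant, cropping, averaging) may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a quiet alpine lake at sunrise"
image = Image.open("panorama.png")  # path to a generated panorama

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

score = torch.nn.functional.cosine_similarity(out.text_embeds, out.image_embeds).item() * 100
print(f"CLIP-T score: {score:.1f}")
```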
#### 360° immersive and explorable 3D worlds generated by HunyuanWorld 1.0:

<p align="left">
  <img src="assets/panorama1.gif">
</p>

<p align="left">
  <img src="assets/panorama2.gif">
</p>

<p align="left">
  <img src="assets/roaming_world.gif">
</p>
## Models Zoo

The open-source version of HunyuanWorld 1.0 is based on Flux, and the method can easily be adapted to other image generation models such as Hunyuan Image, Kontext, and Stable Diffusion. A download sketch follows the table below.
| Model | Description | Date | Size | Huggingface |
|--------------------------------|-----------------------------|------------|-------|-------------|
| HunyuanWorld-PanoDiT-Text | Text to Panorama Model | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Text) |
| HunyuanWorld-PanoDiT-Image | Image to Panorama Model | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Image) |
| HunyuanWorld-PanoInpaint-Scene | PanoInpaint Model for scene | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Scene) |
| HunyuanWorld-PanoInpaint-Sky | PanoInpaint Model for sky | 2025-07-26 | 120MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Sky) |
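If you want to fetch one of these sub-models ahead of time rather than at first inference, `huggingface_hub` can download just that folder from the `tencent/HunyuanWorld-1` repository. This is a convenience sketch; the demo scripts may also resolve the weights on their own.

```python
# A convenience sketch: download only the text-to-panorama weights from the
# tencent/HunyuanWorld-1 repository. The demo scripts may also fetch weights themselves.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="tencent/HunyuanWorld-1",
    allow_patterns=["HunyuanWorld-PanoDiT-Text/*"],  # limit the download to one sub-model
)
print("Weights stored under:", local_dir)
```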
## Get Started with HunyuanWorld 1.0

Follow the steps below to set up and run HunyuanWorld 1.0.

### Environment Construction

We test our model with Python 3.10 and PyTorch 2.5.0+cu124. A quick verification sketch follows the setup commands below.
```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0.git
cd HunyuanWorld-1.0
conda env create -f docker/HunyuanWorld.yaml
conda activate HunyuanWorld  # env name assumed from docker/HunyuanWorld.yaml; adjust if it differs

# Install Real-ESRGAN
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
pip install basicsr-fixed
pip install facexlib
pip install gfpgan
pip install -r requirements.txt
python setup.py develop

# Install ZIM Anything & download the checkpoint from the ZIM project page
cd ..
git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .
mkdir zim_vit_l_2092
cd zim_vit_l_2092
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/encoder.onnx
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/decoder.onnx

# To export meshes in the Draco format, install Draco first
cd ../..
git clone https://github.com/google/draco.git
cd draco
mkdir build
cd build
cmake ..
make
sudo make install

# Log in to your Hugging Face account
cd ../..
huggingface-cli login --token $HUGGINGFACE_TOKEN
```
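After the environment is built, a quick sanity check (a sketch, not part of the official repo) can confirm that the tested PyTorch build and a CUDA device are actually visible before running the demos.

```python
# Quick sanity check after setup (not part of the official repo):
# confirm the tested PyTorch build and a visible CUDA device.
import torch

print("torch:", torch.__version__)           # the authors test with 2.5.0+cu124
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```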
### Code Usage

For Image-to-World generation, you can use the following commands:

```bash
# First, generate a panorama image from an input image.
python3 demo_panogen.py --prompt "" --image_path examples/case2/input.png --output_path test_results/case2
# Second, use this panorama image to create a world scene with HunyuanWorld 1.0.
# You can indicate the foreground object labels you want to layer out with the
# parameters --labels_fg1 and --labels_fg2,
# e.g. --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case2/panorama.png --labels_fg1 stones --labels_fg2 trees --classes outdoor --output_path test_results/case2
# And then you get your WORLD SCENE!!
```
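To take a quick look at the exported geometry outside the viewer, a mesh library such as `trimesh` works. The exact filenames written under `test_results/case2` depend on the repo's export settings, so the path below is a hypothetical placeholder.

```python
# A hedged sketch for inspecting an exported scene layer with trimesh.
# The filename is a hypothetical placeholder; check test_results/case2 for the actual outputs.
import trimesh

mesh = trimesh.load("test_results/case2/scene_layer0.ply")
print("vertices:", len(mesh.vertices), "faces:", len(mesh.faces))
```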
For Text-to-World generation, you can use the following commands:

```bash
# First, generate a panorama image from a prompt.
python3 demo_panogen.py --prompt "At the moment of glacier collapse, giant ice walls collapse and create waves, with no wildlife, captured in a disaster documentary" --output_path test_results/case7
# Second, use this panorama image to create a world scene with HunyuanWorld 1.0.
# You can indicate the foreground object labels you want to layer out with the
# parameters --labels_fg1 and --labels_fg2,
# e.g. --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case7/panorama.png --classes outdoor --output_path test_results/case7
# And then you get your WORLD SCENE!!
```
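Before sending a panorama on to scene generation, it can help to sanity-check the intermediate image. Equirectangular panoramas are normally twice as wide as they are tall, so a quick check like the sketch below (using the output path from the commands above) catches truncated runs early.

```python
# Sanity-check the intermediate panorama before scene generation.
# Equirectangular panoramas are normally twice as wide as they are tall.
from PIL import Image

pano = Image.open("test_results/case7/panorama.png")
w, h = pano.size
print(f"{w}x{h}, aspect ratio {w / h:.2f} (expected ~2.0 for an equirectangular panorama)")
```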
### Quick Start

We provide more examples in ```examples```; for a quick start, simply run:

```bash
bash scripts/test.sh
```
### 3D World Viewer

We provide a ModelViewer tool for quick visualization of your generated 3D worlds in a web browser.

Just open ```modelviewer.html``` in your browser, upload the generated 3D scene files, and enjoy the real-time viewing experience. If your browser blocks loading local files, serve the directory over HTTP as in the sketch below.
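A minimal way to do that, assuming you run it from the repository root, is Python's built-in HTTP server:

```python
# Serve the repository root so modelviewer.html can fetch local mesh files over HTTP.
# Run from the repo root, then open http://localhost:8000/modelviewer.html
from http.server import HTTPServer, SimpleHTTPRequestHandler

HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()
```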
<p align="left">
  <img src="assets/quick_look.gif">
</p>

Due to hardware limitations, certain scenes may fail to load.
## Open-Source Plan

- [x] Inference Code
- [x] Model Checkpoints
- [x] Technical Report
- [ ] TensorRT Version
- [ ] RGBD Video Diffusion
## BibTeX

```bibtex
@misc{hunyuanworld2025tencent,
    title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
    author={Tencent Hunyuan3D Team},
    year={2025},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## Acknowledgements

We would like to thank the contributors to the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co), [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN), [ZIM](https://github.com/naver-ai/ZIM), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), [MoGe](https://github.com/microsoft/moge), [Worldsheet](https://worldsheet.github.io/), and [WorldGen](https://github.com/ZiYang-xie/WorldGen) repositories for their open research.