Qwen-Image Precise Region Control Model

Model Introduction

This model is a LoRA trained on top of Qwen-Image for precise region control. For each entity, it takes a textual description and a regional condition (a mask map) as input, and uses them to control that entity's position and shape. It was trained with the DiffSynth-Studio framework on the DiffSynth-Studio/EliGenTrainSet dataset.
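
To make the "regional condition" input concrete, the sketch below builds one mask map per entity from a bounding box with Pillow. It is a minimal illustration, not the authors' tooling: `box_mask`, the example prompts, and the boxes are hypothetical, and the convention that white marks the entity region is an assumption. The 1328x1328 size and RGB mode mirror what the inference code in this card passes to the pipeline.

```python
from PIL import Image, ImageDraw

def box_mask(box, size=(1328, 1328)):
    """Return an RGB mask that is white inside `box` and black elsewhere.

    `box` is (left, top, right, bottom) in pixels. Treating white as the
    entity region is an assumption, not something this card states.
    """
    mask = Image.new("RGB", size, (0, 0, 0))
    ImageDraw.Draw(mask).rectangle(box, fill=(255, 255, 255))
    return mask

# Hypothetical layout for two entities; prompts and boxes are illustrative only.
entity_prompts = ["A cup of coffee on the left", "A cup of coffee on the right"]
entity_boxes = [(100, 400, 600, 1100), (728, 400, 1228, 1100)]

# One mask per entity prompt, in the same order.
masks = [box_mask(box) for box in entity_boxes]
```

Masks need not be rectangles; any image that marks the region an entity should occupy can play the same role.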

Results Demonstration

[Table of six example image pairs. Columns: Entity Control Condition, Generated Image.]

Inference Code

Install DiffSynth-Studio from source:

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Then run the following Python script:

import torch
from PIL import Image
from modelscope import dataset_snapshot_download, snapshot_download
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the base Qwen-Image pipeline: DiT, text encoder, VAE, and tokenizer.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download the EliGen LoRA weights and load them into the DiT.
snapshot_download("DiffSynth-Studio/Qwen-Image-EliGen", local_dir="models/DiffSynth-Studio/Qwen-Image-EliGen", allow_file_pattern="model.safetensors")
pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-EliGen/model.safetensors")

global_prompt = "Poster for the Qwen-Image-EliGen Magic Café, featuring two magical coffees—one emitting flames and the other emitting ice spikes—against a light blue misty background, with text reading 'Qwen-Image-EliGen Magic Café' and 'New Arrival'"
# One prompt per entity; each is paired with the mask at the same index.
entity_prompts = ["A red magic coffee with flames rising from the cup",
                  "A red magic coffee surrounded by ice spikes",
                  "Text: 'New Arrival'",
                  "Text: 'Qwen-Image-EliGen Magic Café'"]

# Download the example masks (0.png, 1.png, ...) and load them as 1328x1328 RGB images.
dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/eligen/qwen-image/example_6/*.png")
masks = [
    Image.open(f"./data/examples/eligen/qwen-image/example_6/{i}.png").convert("RGB").resize((1328, 1328))
    for i in range(len(entity_prompts))
]

# Generate the image from the global prompt plus the per-entity prompts and masks.
image = pipe(
    prompt=global_prompt,
    seed=0,
    eligen_entity_prompts=entity_prompts,
    eligen_entity_masks=masks,
)
image.save("image.jpg")
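
One way to check that each entity landed in its region is to tint the mask areas on top of the generated image. The sketch below is a minimal, hypothetical helper (`overlay_masks` is not part of DiffSynth-Studio), and it uses a synthetic image and mask so that it runs on its own. The colors and the `alpha` value are arbitrary choices, and it assumes white marks the entity region in each mask.

```python
from PIL import Image, ImageDraw

def overlay_masks(image, masks, colors, alpha=0.4):
    """Tint each white mask region of `image` with its color; returns a new RGB image."""
    result = image.convert("RGB")
    for mask, color in zip(masks, colors):
        # Grayscale mask at the image size, scaled so white maps to `alpha` opacity.
        weight = mask.convert("L").resize(result.size).point(lambda v: int(v * alpha))
        tint = Image.new("RGB", result.size, color)
        result = Image.composite(tint, result, weight)
    return result

# Synthetic stand-ins for a generated image and one entity mask (illustrative only).
demo_image = Image.new("RGB", (64, 64), (128, 128, 128))
demo_mask = Image.new("RGB", (64, 64), (0, 0, 0))
ImageDraw.Draw(demo_mask).rectangle((0, 0, 31, 63), fill=(255, 255, 255))

preview = overlay_masks(demo_image, [demo_mask], colors=[(255, 0, 0)])
```

Assuming the pipeline returns a PIL image, as the `image.save` call suggests, the same helper can be called as `overlay_masks(image, masks, colors)` with one color per entity, and the result saved with `.save(...)`.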

Citation

If you find our work helpful, please consider citing our research:

@article{zhang2025eligen,
  title={EliGen: Entity-Level Controlled Image Generation with Regional Attention},
  author={Zhang, Hong and Duan, Zhongjie and Wang, Xingjun and Chen, Yingda and Zhang, Yu},
  journal={arXiv preprint arXiv:2501.01097},
  year={2025}
}

Model size: 0.2B params (Safetensors, BF16)