Qwen-Image-Edit

HAL1993 committed on Oct 6

Commit

7829c6a

verified ·

1 Parent(s): 9a8f5a7

Update app.py

app.py +271 -273

@@ -1,167 +1,40 @@
 import gradio as gr
 import numpy as np
 import random
 import torch
 import spaces
 from PIL import Image
 from diffusers import FlowMatchEulerDiscreteScheduler
 from optimization import optimize_pipeline_
 from qwenimage.pipeline_qwenimage_edit_plus import QwenImageEditPlusPipeline
 from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
 from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
-from huggingface_hub import InferenceClient
-import math
-import os
-import base64
-import json
-SYSTEM_PROMPT = '''
-# Edit Instruction Rewriter
-You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
-Please strictly follow the rewriting rules below:
-## 1. General Principles
-- Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
-- If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
-- Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
-- All added objects or modifications must align with the logic and style of the scene in the input images.
-- If multiple sub-images are to be generated, describe the content of each sub-image individually.
-## 2. Task-Type Handling Rules
-### 1. Add, Delete, Replace Tasks
-- If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
-- If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
-    > Original: "Add an animal"
-    > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
-- Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
-- For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
-### 2. Text Editing Tasks
-- All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
-- Both adding new text and replacing existing text are text replacement tasks, For example:
-    - Replace "xx" to "yy"
-    - Replace the mask / bounding box to "yy"
-    - Replace the visual object to "yy"
-- Specify text position, color, and layout only if user has required.
-- If font is specified, keep the original language of the font.
-### 3. Human Editing Tasks
-- Make the smallest changes to the given user's prompt.
-- If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
-- **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject’s identity consistency.**
-    > Original: "Add eyebrows to the face"
-    > Rewritten: "Slightly thicken the person’s eyebrows with little change, look natural."
-### 4. Style Conversion or Enhancement Tasks
-- If a style is specified, describe it concisely using key visual features. For example:
-    > Original: "Disco style"
-    > Rewritten: "1970s disco style: flashing lights, disco ball, mirrored walls, vibrant colors"
-- For style reference, analyze the original image and extract key characteristics (color, composition, texture, lighting, artistic style, etc.), integrating them into the instruction.
-- **Colorization tasks (including old photo restoration) must use the fixed template:**
-  "Restore and colorize the old photo."
-- Clearly specify the object to be modified. For example:
-    > Original: Modify the subject in Picture 1 to match the style of Picture 2.
-    > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
-### 5. Material Replacement
-- Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
-- For text material replacement, use the fixed template:
-    "Change the material of text "xxxx" to laser style"
-### 6. Logo/Pattern Editing
-- Material replacement should preserve the original shape and structure as much as possible. For example:
-   > Original: "Convert to sapphire material"
-   > Rewritten: "Convert the main subject in the image to sapphire material, preserving similar shape and structure"
-- When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
-   > Original: "Migrate the logo in the image to a new scene"
-   > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
-### 7. Multi-Image Tasks
-- Rewritten prompts must clearly point out which image’s element is being modified. For example:
-    > Original: "Replace the subject of picture 1 with the subject of picture 2"
-    > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2’s background unchanged"
-- For stylization tasks, describe the reference image’s style in the rewritten prompt, while preserving the visual content of the source image.
-## 3. Rationale and Logic Check
-- Resolve contradictory instructions: e.g., “Remove all trees but keep all trees” requires logical correction.
-- Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
-# Output Format Example
-```json
-{
-   "Rewritten": "..."
-}
-'''
-# --- Prompt Enhancement using Hugging Face InferenceClient ---
-def polish_prompt_hf(prompt, img_list):
-    """
-    Rewrites the prompt using a Hugging Face InferenceClient.
-    """
-    # Ensure HF_TOKEN is set
-    api_key = os.environ.get("HF_TOKEN")
-    if not api_key:
-        print("Warning: HF_TOKEN not set. Falling back to original prompt.")
-        return prompt
-    try:
-        # Initialize the client
-        prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
-            # Initialize the client
-        client = InferenceClient(
-            provider="novita",
-            api_key=api_key,
-        )
-        # Format the messages for the chat completions API
-        sys_promot = "you are a helpful assistant, you should provide useful answers to users."
-        messages = [
-            {"role": "system", "content": sys_promot},
-            {"role": "user", "content": []}]
-        for img in img_list:
-            messages[1]["content"].append(
-                {"image": f"data:image/png;base64,{encode_image(img)}"})
-        messages[1]["content"].append({"text": f"{prompt}"})
-        completion = client.chat.completions.create(
-            model="Qwen/Qwen3-Next-80B-A3B-Instruct",
-            messages=messages,
-        )
-        # Parse the response
-        result = completion.choices[0].message.content
-        # Try to extract JSON if present
-        if '{"Rewritten"' in result:
-            try:
-                # Clean up the response
-                result = result.replace('```json', '').replace('```', '')
-                result_json = json.loads(result)
-                polished_prompt = result_json.get('Rewritten', result)
-            except:
-                polished_prompt = result
-        else:
-            polished_prompt = result
-        polished_prompt = polished_prompt.strip().replace("\n", " ")
-        return polished_prompt
-    except Exception as e:
-        print(f"Error during API call to Hugging Face: {e}")
-        # Fallback to original prompt if enhancement fails
-        return prompt
-def encode_image(pil_image):
-    import io
-    buffered = io.BytesIO()
-    pil_image.save(buffered, format="PNG")
-    return base64.b64encode(buffered.getvalue()).decode("utf-8")
 # --- Model Loading ---
 dtype = torch.bfloat16
@@ -207,8 +80,9 @@ optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB",
 # --- UI Constants and Helpers ---
 MAX_SEED = np.iinfo(np.int32).max
-# --- Main Inference Function (with hardcoded negative prompt) ---
 @spaces.GPU(duration=40)
 def infer(
     images,
@@ -219,19 +93,21 @@ def infer(
     num_inference_steps=4,
     height=None,
     width=None,
-    rewrite_prompt=True,
     num_images_per_prompt=1,
     progress=gr.Progress(track_tqdm=True),
 ):
     """
     Generates an image using the local Qwen-Image diffusers pipeline.
     """
-    # Hardcode the negative prompt as requested
-    negative_prompt = " "
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
     # Set up the generator for reproducibility
     generator = torch.Generator(device=device).manual_seed(seed)
@@ -249,20 +125,16 @@ def infer(
             except Exception:
                 continue
-    if height==256 and width==256:
         height, width = None, None
-    print(f"Calling pipeline with prompt: '{prompt}'")
     print(f"Negative Prompt: '{negative_prompt}'")
     print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
-    if rewrite_prompt and len(pil_images) > 0:
-        prompt = polish_prompt_hf(prompt, pil_images)
-        print(f"Rewritten Prompt: {prompt}")
     # Generate the image
     image = pipe(
         image=pil_images if len(pil_images) > 0 else None,
-        prompt=prompt,
         height=height,
         width=width,
         negative_prompt=negative_prompt,
@@ -274,122 +146,248 @@ def infer(
     return image, seed
-# --- Examples and UI Layout ---
-examples = []
-css = """
-#col-container {
-    margin: 0 auto;
-    max-width: 1024px;
-}
-#logo-title {
-    text-align: center;
-}
-#logo-title img {
-    width: 400px;
-}
-#edit_text{margin-top: -62px !important}
-"""
-with gr.Blocks(css=css) as demo:
-    with gr.Column(elem_id="col-container"):
         gr.HTML("""
-        <div id="logo-title">
-            <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" alt="Qwen-Image Edit Logo" width="400" style="display: block; margin: 0 auto;">
-            <h2 style="font-style: italic;color: #5b47d1;margin-top: -27px !important;margin-left: 96px">[Plus] Fast, 8-steps with Lightning LoRA</h2>
-        </div>
-        """)
-        gr.Markdown("""
-        [Learn more](https://github.com/QwenLM/Qwen-Image) about the Qwen-Image series.
-        This demo uses the new [Qwen-Image-Edit-2509](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) with the [Qwen-Image-Lightning v2](https://huggingface.co/lightx2v/Qwen-Image-Lightning) LoRA + [AoT compilation & FA3](https://huggingface.co/blog/zerogpu-aoti) for accelerated inference.
-        Try on [Qwen Chat](https://chat.qwen.ai/), or [download model](https://huggingface.co/Qwen/Qwen-Image-Edit-2509) to run locally with ComfyUI or diffusers.
         """)
-        with gr.Row():
-            with gr.Column():
-                input_images = gr.Gallery(label="Input Images",
-                                          show_label=False,
-                                          type="pil",
-                                          interactive=True)
-            # result = gr.Image(label="Result", show_label=False, type="pil")
-            result = gr.Gallery(label="Result", show_label=False, type="pil")
-        with gr.Row():
-            prompt = gr.Text(
-                    label="Prompt",
-                    show_label=False,
-                    placeholder="describe the edit instruction",
-                    container=False,
-            )
-            run_button = gr.Button("Edit!", variant="primary")
-        with gr.Accordion("Advanced Settings", open=False):
-            # Negative prompt UI element is removed here
-            seed = gr.Slider(
-                label="Seed",
-                minimum=0,
-                maximum=MAX_SEED,
-                step=1,
-                value=0,
-            )
-            randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
-            with gr.Row():
-                true_guidance_scale = gr.Slider(
-                    label="True guidance scale",
-                    minimum=1.0,
-                    maximum=10.0,
-                    step=0.1,
-                    value=1.0
                 )
-                num_inference_steps = gr.Slider(
-                    label="Number of inference steps",
-                    minimum=1,
-                    maximum=40,
-                    step=1,
-                    value=4,
                 )
-                height = gr.Slider(
-                    label="Height",
-                    minimum=256,
-                    maximum=2048,
-                    step=8,
-                    value=None,
                 )
-                width = gr.Slider(
-                    label="Width",
-                    minimum=256,
-                    maximum=2048,
-                    step=8,
-                    value=None,
                 )
-                rewrite_prompt = gr.Checkbox(label="Rewrite prompt (being fixed)", value=False)
-        # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
-    gr.on(
-        triggers=[run_button.click, prompt.submit],
-        fn=infer,
-        inputs=[
-            input_images,
-            prompt,
-            seed,
-            randomize_seed,
-            true_guidance_scale,
-            num_inference_steps,
-            height,
-            width,
-            rewrite_prompt,
-        ],
-        outputs=[result, seed],
-    )
 if __name__ == "__main__":
-    demo.launch()

+import os
 import gradio as gr
 import numpy as np
 import random
 import torch
 import spaces
 from PIL import Image
 from diffusers import FlowMatchEulerDiscreteScheduler
 from optimization import optimize_pipeline_
 from qwenimage.pipeline_qwenimage_edit_plus import QwenImageEditPlusPipeline
 from qwenimage.transformer_qwenimage import QwenImageTransformer2DModel
 from qwenimage.qwen_fa3_processor import QwenDoubleStreamAttnProcessorFA3
+import requests  # For translation API
+# --- Translation Function ---
+@spaces.GPU
+def translate_albanian_to_english(text):
+    """Translate from Albanian to English using the sepioo-facebook-translation API."""
+    if not text.strip():
+        raise gr.Error("Please enter a description.")
+    for attempt in range(2):
+        try:
+            response = requests.post(
+                "https://hal1993-mdftranslation1234567890abcdef1234567890-fc073a6.hf.space/v1/translate",
+                json={"from_language": "sq", "to_language": "en", "input_text": text},
+                headers={"accept": "application/json", "Content-Type": "application/json"},
+                timeout=5
+            )
+            response.raise_for_status()
+            translated = response.json().get("translate", "")
+            print(f"Translation response: {translated}")
+            return translated
+        except Exception as e:
+            print(f"Translation error (attempt {attempt + 1}): {e}")
+            if attempt == 1:
+                raise gr.Error("Translation failed. Please try again.")
+    raise gr.Error("Translation failed. Please try again.")
 # --- Model Loading ---
 dtype = torch.bfloat16
 # --- UI Constants and Helpers ---
 MAX_SEED = np.iinfo(np.int32).max
+QUALITY_PROMPT = ", high quality, detailed, vibrant, professional lighting"
+# --- Main Inference Function ---
 @spaces.GPU(duration=40)
 def infer(
     images,
     num_inference_steps=4,
     height=None,
     width=None,
+    rewrite_prompt=False,
     num_images_per_prompt=1,
     progress=gr.Progress(track_tqdm=True),
 ):
     """
     Generates an image using the local Qwen-Image diffusers pipeline.
     """
+    negative_prompt = ""  # Empty as in original
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
+    # Translate prompt from Albanian to English
+    prompt_final = translate_albanian_to_english(prompt.strip()) + QUALITY_PROMPT
     # Set up the generator for reproducibility
     generator = torch.Generator(device=device).manual_seed(seed)
             except Exception:
                 continue
+    if height == 256 and width == 256:
         height, width = None, None
+    print(f"Calling pipeline with prompt: '{prompt_final}'")
     print(f"Negative Prompt: '{negative_prompt}'")
     print(f"Seed: {seed}, Steps: {num_inference_steps}, Guidance: {true_guidance_scale}, Size: {width}x{height}")
     # Generate the image
     image = pipe(
         image=pil_images if len(pil_images) > 0 else None,
+        prompt=prompt_final,
         height=height,
         width=width,
         negative_prompt=negative_prompt,
     return image, seed
+# --- Gradio User Interface ---
+def create_demo():
+    with gr.Blocks(css="", title="Qwen Image Editor") as demo:
         gr.HTML("""
+        <style>
+        @import url('https://fonts.googleapis.com/css2?family=Orbitron:wght@400;600;700&display=swap');
+        body {
+            background: #000000;
+            color: #FFFFFF;
+            font-family: 'Orbitron', sans-serif;
+            min-height: 100vh;
+            margin: 0;
+            padding: 0;
+            display: flex;
+            justify-content: center;
+            align-items: center;
+            flex-direction: column;
+        }
+        body::before {
+            content: "";
+            display: block;
+            height: 600px;
+            background: #000000;
+        }
+        #general_items {
+            width: 100%;
+            margin: 2rem 0;
+            display: flex;
+            flex-direction: column;
+            align-items: center;
+        }
+        #input_column {
+            background: rgba(0, 0, 0, 0.5);
+            border: 1px solid #FFFFFF;
+            border-radius: 8px;
+            padding: 1rem;
+            box-shadow: 0 0 8px rgba(255, 255, 255, 0.2);
+            width: 100%;
+        }
+        h1 {
+            font-size: 5rem;
+            font-weight: 700;
+            text-align: center;
+            color: #FFFFFF;
+            text-shadow: 0 0 8px rgba(255, 255, 255, 0.3);
+            margin-bottom: 0.5rem;
+        }
+        #subtitle {
+            font-size: 1rem;
+            text-align: center;
+            color: #FFFFFF;
+            opacity: 0.8;
+            margin-bottom: 1rem;
+        }
+        .gradio-component {
+            background: transparent;
+            border: none;
+            margin: 0.75rem 0;
+            width: 100%;
+        }
+        .gr-gallery {
+            width: 100%;
+            border: 1px solid #FFFFFF;
+            border-radius: 4px;
+        }
+        input, textarea, .gr-slider {
+            background: #000000;
+            color: #FFFFFF;
+            border: 1px solid #FFFFFF;
+            border-radius: 4px;
+            padding: 0.5rem;
+            width: 100%;
+            box-sizing: border-box;
+        }
+        input:hover, textarea:hover, .gr-slider:hover {
+            box-shadow: 0 0 8px rgba(255, 255, 255, 0.3);
+            transition: box-shadow 0.3s;
+        }
+        .gr-button-primary {
+            background: #000000 !important;
+            color: #FFFFFF !important;
+            border: 1px solid #FFFFFF !important;
+            border-radius: 6px;
+            padding: 0.75rem 1.5rem;
+            font-size: 1.1rem;
+            font-weight: 600;
+            box-shadow: 0 0 8px rgba(255, 255, 255, 0.3);
+            transition: box-shadow 0.3s, transform 0.3s;
+            width: 100%;
+            min-height: 48px;
+            cursor: pointer;
+        }
+        .gr-button-primary:hover {
+            box-shadow: 0 0 12px rgba(255, 255, 255, 0.5);
+            transform: scale(1.05);
+        }
+        button[aria-label="Download"] {
+            transform: scale(3);
+            transform-origin: top right;
+            background: #000000 !important;
+            color: #FFFFFF !important;
+            border: 1px solid #FFFFFF !important;
+            border-radius: 4px;
+            padding: 0.4rem !important;
+            margin: 0.5rem !important;
+            box-shadow: 0 0 8px rgba(255, 255, 255, 0.3);
+            transition: box-shadow 0.3s;
+        }
+        button[aria-label="Download"]:hover {
+            box-shadow: 0 0 12px rgba(255, 255, 255, 0.5);
+        }
+        button[aria-label="Fullscreen"], button[aria-label="Fullscreen"]:hover,
+        button[aria-label="Share"], button[aria-label="Share"]:hover {
+            display: none !important;
+        }
+        .progress-text {
+            color: #FFFFFF !important;
+        }
+        footer, .gr-button-secondary {
+            display: none;
+        }
+        .gr-accordion {
+            background: rgba(0, 0, 0, 0.5);
+            border: 1px solid #FFFFFF;
+            border-radius: 4px;
+            width: 100%;
+        }
+        @media (max-width: 768px) {
+            h1 {
+                font-size: 4rem;
+            }
+            #subtitle {
+                font-size: 0.9rem;
+            }
+            .gr-button-primary {
+                padding: 0.6rem 1rem;
+                font-size: 1rem;
+            }
+        }
+        </style>
         """)
+        with gr.Row(elem_id="general_items"):
+            gr.Markdown("# Qwen Image Editor")
+            gr.Markdown("Edit your images with precise instructions", elem_id="subtitle")
+            with gr.Column(elem_id="input_column"):
+                input_images = gr.Gallery(
+                    label="Input Images",
+                    show_label=True,
+                    type="pil",
+                    interactive=True,
+                    elem_classes=["gradio-component", "gr-gallery"]
                 )
+                result = gr.Gallery(
+                    label="Result",
+                    show_label=True,
+                    type="pil",
+                    elem_classes=["gradio-component", "gr-gallery"]
                 )
+                prompt = gr.Textbox(
+                    label="Prompt",
+                    placeholder="Describe the edit instruction",
+                    lines=3,
+                    elem_classes="gradio-component"
                 )
+                run_button = gr.Button(
+                    "Edit!",
+                    variant="primary",
+                    elem_classes="gradio-component"
                 )
+                with gr.Accordion("Advanced Settings", open=False):
+                    seed = gr.Slider(
+                        label="Seed",
+                        minimum=0,
+                        maximum=MAX_SEED,
+                        step=1,
+                        value=0,
+                        elem_classes="gradio-component"
+                    )
+                    randomize_seed = gr.Checkbox(
+                        label="Randomize seed",
+                        value=True,
+                        elem_classes="gradio-component"
+                    )
+                    true_guidance_scale = gr.Slider(
+                        label="True guidance scale",
+                        minimum=1.0,
+                        maximum=10.0,
+                        step=0.1,
+                        value=1.0,
+                        elem_classes="gradio-component"
+                    )
+                    num_inference_steps = gr.Slider(
+                        label="Number of inference steps",
+                        minimum=1,
+                        maximum=40,
+                        step=1,
+                        value=4,
+                        elem_classes="gradio-component"
+                    )
+                    height = gr.Slider(
+                        label="Height",
+                        minimum=256,
+                        maximum=2048,
+                        step=8,
+                        value=None,
+                        elem_classes="gradio-component"
+                    )
+                    width = gr.Slider(
+                        label="Width",
+                        minimum=256,
+                        maximum=2048,
+                        step=8,
+                        value=None,
+                        elem_classes="gradio-component"
+                    )
+                    rewrite_prompt = gr.Checkbox(
+                        label="Rewrite prompt (being fixed)",
+                        value=False,
+                        elem_classes="gradio-component"
+                    )
+        gr.on(
+            triggers=[run_button.click, prompt.submit],
+            fn=infer,
+            inputs=[
+                input_images,
+                prompt,
+                seed,
+                randomize_seed,
+                true_guidance_scale,
+                num_inference_steps,
+                height,
+                width,
+                rewrite_prompt,
+            ],
+            outputs=[result, seed],
+        )
+    return demo
 if __name__ == "__main__":
+    print(f"Gradio version: {gr.__version__}")
+    demo = create_demo()
+    demo.queue().launch(share=True)
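
The translation helper added in this commit posts the prompt to a translation endpoint, retries once on failure, and then raises an error. A minimal sketch of that retry pattern, with no network, Gradio, or GPU dependency, might look like the following. The names `translate_with_retry`, `send`, and `TranslationError` are hypothetical and do not appear in app.py: `send` stands in for the `requests.post(...).json()` call, and `TranslationError` stands in for `gr.Error`. The payload keys and the `"translate"` response key are taken from the diff.

```python
from typing import Callable, Dict, Optional


class TranslationError(RuntimeError):
    """Hypothetical error type; app.py raises gr.Error instead."""


def translate_with_retry(
    text: str,
    send: Callable[[Dict[str, str]], Dict[str, str]],
    attempts: int = 2,
) -> str:
    """Translate Albanian text to English, retrying on failure.

    `send` is an assumed stand-in for the HTTP call: it receives the JSON
    payload and returns the decoded JSON response as a dict.
    """
    if not text.strip():
        raise TranslationError("Please enter a description.")

    # Same payload shape as the request body in the diff.
    payload = {"from_language": "sq", "to_language": "en", "input_text": text}

    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            # The diff reads the translated text from the "translate" key.
            return send(payload).get("translate", "")
        except Exception as exc:  # broad on purpose, mirroring the diff
            last_error = exc
            print(f"Translation error (attempt {attempt + 1}): {exc}")

    # Reached only when every attempt failed.
    raise TranslationError("Translation failed. Please try again.") from last_error
```

Passing the transport in as a callable keeps the retry logic testable without a live endpoint. In the diff the helper also carries `@spaces.GPU` even though it only issues an HTTP request and is called from inside `infer`, which is already GPU-decorated, so it may be worth checking whether that decorator is needed.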