yuhangzang committed
Commit fd68401 · 1 Parent(s): babd02b
app.py CHANGED
@@ -144,7 +144,30 @@ def generate(image, prompt, max_new_tokens, temperature, top_p, top_k):
 
 def build_ui():
     with gr.Blocks() as demo:
-        gr.Markdown("# Spark-VL ZeroGPU Demo\nUpload an image or choose from the example gallery (image + prompt), then enter a prompt.")
+        gr.Markdown(
+            """
+            # Spark: Synergistic Policy And Reward Co-Evolving Framework
+
+            <h3 align="center">
+            📖<a href="https://arxiv.org/abs/2509.22624">Paper</a>
+            | 🤗<a href="https://huggingface.co/internlm/Spark-VL-7B">Models</a>
+            | 🤗<a href="https://huggingface.co/datasets/internlm/Spark-Data">Datasets</a>
+            | 🤗<a href="https://huggingface.co/papers/2509.22624">Daily Paper</a>
+            </h3>
+
+            **🌈 Introduction:** We propose SPARK, <strong>a unified framework that integrates policy and reward into a single model for joint and synchronous training</strong>. SPARK can automatically derive reward and reflection data from verifiable reward, enabling <strong>self-learning and self-evolution</strong>.
+
+            **🤗 Models:** We release the checkpoints at [internlm/Spark-VL-7B](https://huggingface.co/internlm/Spark-VL-7B).
+
+            **🤗 Datasets:** Training data is available at [internlm/Spark-Data](https://huggingface.co/datasets/internlm/Spark-Data).
+
+            **💻 Training Code:** The training code and implementation details can be found at [InternLM/Spark](https://github.com/InternLM/Spark).
+
+            ---
+
+            📸 **Upload an image and enter a prompt** or 🖼️ **choose the input from the example gallery** (image + prompt).
+            """
+        )
 
         # Build an image+prompt gallery from ./examples
         # Each example is an image file with an optional sidecar .txt containing the prompt.
@@ -178,42 +201,6 @@ def build_ui():
         with gr.Row():
             with gr.Column(scale=1):
                 image = gr.Image(type="pil", label="Image", value=default_image)
-                # Prepare gallery items as (image, caption) so users can see
-                # that a prompt is associated with each example.
-                def _gallery_items():
-                    items = []
-                    for img_path, prompt_text in example_pairs:
-                        caption = (prompt_text or "").strip()
-                        # Keep captions compact to avoid tall tiles
-                        if len(caption) > 120:
-                            caption = caption[:117] + "..."
-                        items.append((img_path, caption))
-                    return items
-
-                gallery = gr.Gallery(
-                    value=_gallery_items(),
-                    label="Examples (Image + Prompt)",
-                    show_label=True,
-                    columns=4,
-                    height=260,
-                    allow_preview=True,
-                )
-
-                # When a thumbnail is clicked, load it into the image input
-                def _on_gallery_select(evt: gr.SelectData, cur_prompt: str = ""):
-                    # Load both the example image and its paired prompt
-                    idx = evt.index
-                    if 0 <= idx < len(example_pairs):
-                        img_path, prompt_text = example_pairs[idx]
-                        try:
-                            img_val = Image.open(img_path)
-                        except Exception:
-                            img_val = None
-                        # If no prompt sidecar, preserve the user's current prompt
-                        return img_val, (prompt_text if prompt_text is not None else cur_prompt)
-                    return None, cur_prompt
-
-                # Defer wiring the select handler until after the prompt component is created
 
             with gr.Column(scale=1):
                 prompt = gr.Textbox(
@@ -231,13 +218,120 @@ def build_ui():
                 top_k = gr.Slider(1, 200, value=50, step=1, label="top_k")
                 run = gr.Button("Generate")
 
-        # Now that both components exist, wire the gallery->(image,prompt) binding
-        try:
-            gallery.select(fn=_on_gallery_select, inputs=[prompt], outputs=[image, prompt])
-        except Exception:
-            # If the event cannot be bound (e.g., running in a limited environment),
-            # just skip wiring without breaking the app.
-            pass
+        # Clear prompt when image is removed
+        image.clear(fn=lambda: "", outputs=prompt)
+
+        # Examples section: table-like layout with image and prompt columns
+        gr.Markdown("## Examples")
+
+        # Handler for clicking on example images
+        def _on_example_click(img_path, prompt_text):
+            try:
+                img_val = Image.open(img_path)
+            except Exception:
+                img_val = None
+            return img_val, prompt_text
+
+        # Categorize examples by type
+        math_examples = []
+        reward_examples = []
+        other_examples = []
+
+        for img_path, prompt_text in example_pairs:
+            basename = os.path.basename(img_path)
+            if basename.startswith("example_0"):
+                math_examples.append((img_path, prompt_text))
+            elif basename.startswith("example_1"):
+                reward_examples.append((img_path, prompt_text))
+            else:
+                other_examples.append((img_path, prompt_text))
+
+        # Display math reasoning examples
+        if math_examples:
+            gr.Markdown("### 📐 Math Reasoning Examples")
+            for idx, (img_path, prompt_text) in enumerate(math_examples):
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        ex_img = gr.Image(
+                            value=img_path,
+                            type="filepath",
+                            label=f"Math Example {idx}",
+                            interactive=False,
+                            show_label=True,
+                            height=200,
+                        )
+                        # Wire click event to load the example
+                        ex_img.select(
+                            fn=lambda ip=img_path, pt=prompt_text: _on_example_click(ip, pt),
+                            outputs=[image, prompt],
+                        )
+                    with gr.Column(scale=3):
+                        ex_text = gr.Textbox(
+                            value=prompt_text or "",
+                            label="Prompt",
+                            lines=8,
+                            max_lines=8,
+                            interactive=False,
+                            show_label=True,
+                        )
+
+        # Display reward model examples
+        if reward_examples:
+            gr.Markdown("### 🎯 Reward Model Examples")
+            for idx, (img_path, prompt_text) in enumerate(reward_examples):
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        ex_img = gr.Image(
+                            value=img_path,
+                            type="filepath",
+                            label=f"Reward Example {idx}",
+                            interactive=False,
+                            show_label=True,
+                            height=200,
+                        )
+                        # Wire click event to load the example
+                        ex_img.select(
+                            fn=lambda ip=img_path, pt=prompt_text: _on_example_click(ip, pt),
+                            outputs=[image, prompt],
+                        )
+                    with gr.Column(scale=3):
+                        ex_text = gr.Textbox(
+                            value=prompt_text or "",
+                            label="Prompt",
+                            lines=8,
+                            max_lines=8,
+                            interactive=False,
+                            show_label=True,
+                        )
+
+        # Display other examples if any
+        if other_examples:
+            gr.Markdown("### 📋 Other Examples")
+            for idx, (img_path, prompt_text) in enumerate(other_examples):
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        ex_img = gr.Image(
+                            value=img_path,
+                            type="filepath",
+                            label=f"Example {idx}",
+                            interactive=False,
+                            show_label=True,
+                            height=200,
+                        )
+                        # Wire click event to load the example
+                        ex_img.select(
+                            fn=lambda ip=img_path, pt=prompt_text: _on_example_click(ip, pt),
+                            outputs=[image, prompt],
+                        )
+                    with gr.Column(scale=3):
+                        ex_text = gr.Textbox(
+                            value=prompt_text or "",
+                            label="Prompt",
+                            lines=8,
+                            max_lines=8,
+                            interactive=False,
+                            show_label=True,
+                        )
 
         output = gr.Textbox(label="Model Output", lines=8)
 
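Note: the hunks above consume an `example_pairs` list that is built outside the changed lines ("Build an image+prompt gallery from ./examples", with an optional sidecar .txt per image). The loader itself is not part of this diff; the following is only a minimal sketch of what such a loader might look like, where the function name `load_example_pairs`, the `IMAGE_EXTS` set, and the `examples` directory default are assumptions rather than code from this commit.

import os

# Hypothetical sketch (not from this commit): pair each image in ./examples
# with an optional sidecar .txt prompt, matching how example_pairs is used above.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def load_example_pairs(examples_dir="examples"):
    pairs = []
    if not os.path.isdir(examples_dir):
        return pairs
    for name in sorted(os.listdir(examples_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue
        img_path = os.path.join(examples_dir, name)
        txt_path = os.path.join(examples_dir, stem + ".txt")
        prompt_text = None
        if os.path.exists(txt_path):
            with open(txt_path, "r", encoding="utf-8") as f:
                prompt_text = f.read().strip()
        pairs.append((img_path, prompt_text))  # (image path, prompt or None)
    return pairs

example_pairs = load_example_pairs()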
 
examples/example_0.txt ADDED
@@ -0,0 +1 @@
+As seen in the diagram, three darts are thrown at nine fixed balloons. If a balloon is hit it will burst and the dart continues in the same direction it had beforehand. How many balloons will not be hit by a dart?
examples/example_1.png ADDED
Git LFS Details
  • SHA256: 8c9d09de84a0680c43759c4359d45697937271879c92b65f9cfd83d98f54723d
  • Pointer size: 131 Bytes
  • Size of remote file: 587 kB
examples/example_1.txt ADDED
@@ -0,0 +1,21 @@
+You are a highly capable multimodal AI assistant tasked with evaluating answers to visual questions. Please analyze the following image and question, then determine which of the two provided answers is better.
+
+Question:
+Create a detailed description describing the elements of the given worldwide landmark image. Identify the name of the landmark and describe the objects, colors, perspectives, etc.
+Answer 1:
+"The image showcases a breathtaking view of a mountainous region with a deep, green lake nestled amidst the rocky terrain. The perspective is from a high vantage point, providing a panoramic view of the landscape. The landmark, identified as \"sa_1563\", is a testament to the natural beauty and diversity of our world. The combination of the rugged mountains, the clear blue sky, and the vibrant green lake creates a serene and awe-inspiring scene."
+
+Answer 2:
+"The image showcases a serene mountain lake, nestled amidst rocky terrains. The deep green color of the water contrasts with the surrounding landscape, creating a tranquil scene. This landmark is a natural beauty, located in a hilly or mountainous region. The perspective of the image gives a sense of being high above the lake, looking down upon it. However, without specific identifiable features, it's challenging to determine the exact landmark from the description."
+
+Please evaluate both answers based on the following criteria:
+1. Accuracy: How well does the answer align with the visual information in the image?
+2. Completeness: Does the answer fully address all aspects of the question?
+3. Clarity: Is the answer easy to understand and well-articulated?
+4. Relevance: Does the answer directly relate to the question and the image?
+
+After your evaluation, please:
+1. Explain your reasoning for each criterion.
+2. Provide an overall judgment on which answer is better (Answer 1 or Answer 2). For example: Overall Judgment: Answer X is better.
+
+Your response should be structured and detailed, demonstrating your understanding of both the visual and textual elements of the task.