Spaces:

AhmedHAnwar
/

Gradio_image_code

Runtime error

App Files Files Community

Gradio_image_code / README.md

AhmedHAnwar

Update README.md

be4f957 verified 5 months ago

preview code

raw

history blame contribute delete

3.32 kB

	---
	title: Gradio Image Code
	emoji: 🌖
	colorFrom: pink
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.32.1
	app_file: app.py
	pinned: false


	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


	# 🧠 Qwen + DeepSeek Gradio App

	A Gradio web app that demonstrates:
	- Image Captioning using [Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)
	- Code Generation using [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

	This app is tested and runs efficiently on Kaggle notebooks with T4 x2 GPU accelerators.

	> ⚠️ Note: Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.

	---
	## 🚀 Features

	- 🖼️ Vision-Language tab: Upload an image + custom prompt → generate short description
	- 💻 Code Generator tab: Write a prompt → get streaming code output
	- Adjustable decoding parameters: temperature, top-p, max_new_tokens

	---

	## 🧩 Installation
	```bash
	pip install transformers
	pip install gradio
	pip install transformers_stream_generator optimum auto-gptq
	```

	Ensure your runtime supports GPU (e.g., Colab or local CUDA environment).

	---

	## 📦 Model Details

	### 1. Qwen-VL-Chat-Int4 (Image-to-Text)

	- Used for concise image descriptions.
	- Streaming output with `TextIteratorStreamer`.
	- Prompt format:

	```
	<\|system\|>
	You are a helpful assistant that describes images very concisely...
	<\|end\|>
	<\|user\|>
	Describe the image...
	<\|end\|>
	<\|assistant\|>
	```

	#### 🔧 Prompt Engineering Insight

	- Without `<\|assistant\|>` tag, the model sometimes overwrites or fails to complete properly.
	- Adding `<\|assistant\|>` clearly indicates the model’s turn, reducing hallucinations.
	- Temperature capped to ~1.0 because higher values (e.g., 1.2+) lead to creative but false outputs.

	### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)

	- Generates Python or other code from natural language prompts.
	- Uses chat-based prompting with:
	- `<think>...</think>` block for reasoning.
	- Final answer separated to improve clarity.

	#### 🔧 Prompt Engineering Insight

	- Initially used no system prompt → vague reasoning.
	- Adding a system prompt improved guidance.
	- Separating "thinking" and "final answer" boosted relevance.
	- Future improvement: split thinking and answer into separate UI tabs.


	## 🖼️ Usage: Image Description Tab

	- Upload an image.
	- Write a natural prompt (e.g., "What is in this picture?")
	- Adjust:
	- `Temperature`: Higher = more creativity, but limit for stability.
	- `Top-p`: Controls sampling diversity.
	- `Max new tokens`: Max length of generated sentence.
	- Click Generate → streaming description appears.


	## 💻 Usage: Code Generation Tab

	- Write a programming task (e.g., "Write Python code to reverse a string.")
	- Adjust generation settings as above.
	- Streaming output displays generated code.
	- Stops early if vague prompt → clarify prompt to improve results.


	## 🧠 Future Work

	- Add a separate tab for model “thinking” (`<think>...</think>`) versus final code.
	- Optional logging for input-output pairs to track hallucinations or failures.
	- Add Markdown rendering for image descriptions.