Space: MalcomNavarro/hf-gaia-agents-course-MN

Mahynlo committed 12 days ago

Commit f611149 (1 parent: 949d073)

Upgrade to Gemini 2.5 Flash (most advanced model) with 20s delays for 10 RPM limit

Files changed (3)

RATE_LIMIT_FIX.md +41 -24
app.py +5 -4
tools/tools.py +1 -1

RATE_LIMIT_FIX.md

@@ -1,57 +1,74 @@
 # Rate Limit Solution
 ## Problem
-Gemini 2.0 Flash Experimental has a **10 requests/minute** limit on the free tier, which is exhausted quickly by the agent (each question makes 10-15 API calls).
 ## Solution Applied
-Added **10-second delays** between questions in `app.py` to respect rate limits. This means:
-- 20 questions × 10 seconds = ~3-4 minutes extra runtime
 - Prevents rate limit errors
 - Ensures all questions are processed
-## Alternative: Higher Quota Models
-### Option 1: Gemini 1.5 Flash (Recommended)
 ```python
-MODEL_ID = "gemini/gemini-1.5-flash-latest"
 ```
-- **60 requests/minute** (6x higher)
-- Stable production model
-- Better for GAIA benchmark
-### Option 2: Gemini 1.5 Pro
 ```python
-MODEL_ID = "gemini/gemini-1.5-pro-latest"
 ```
-- **2 requests/minute** (too slow)
-- More capable but lower quota
-### Option 3: Paid API Key
 If you have a paid Gemini API key:
-- gemini-2.0-flash-exp: 1000 requests/minute
-- gemini-1.5-flash: 2000 requests/minute
 ## How to Switch Models
 1. Edit `app.py` line 15:
    ```python
-   MODEL_ID = "gemini/gemini-1.5-flash-latest"  # Change this
    ```
 2. Edit `tools/tools.py` line 19:
    ```python
-   MODEL_ID = "gemini/gemini-1.5-flash-latest"  # Change this
    ```
 3. Commit and push:
    ```bash
    git add app.py tools/tools.py
-   git commit -m "Switch to Gemini 1.5 Flash for higher rate limits"
    git push
    ```
-## Current Configuration
-- Model: `gemini-2.0-flash-exp`
-- Rate Limit: 10 requests/minute
-- Delay: 10 seconds between questions
-- Expected runtime: ~15-20 minutes for 20 questions

 # Rate Limit Solution
 ## Problem
+Gemini models on the free tier are limited in requests per minute (RPM), and the agent exhausts that quota quickly (each question makes 10-15 API calls).
+## Current Configuration
+**Using Gemini 2.5 Flash** - The most advanced model available
+- **10 requests/minute** (Free Tier)
+- Most capable reasoning and agentic capabilities
+- Best for GAIA benchmark tasks
 ## Solution Applied
+Added **20-second delays** between questions in `app.py` to respect rate limits. This means:
+- 19 gaps × 20 seconds = 380 seconds (~6 minutes) of extra runtime for 20 questions, since the delay is skipped after the last question
 - Prevents rate limit errors
 - Ensures all questions are processed
+## Alternative Models Available
+### Option 1: Gemini 2.5 Flash (CURRENT - RECOMMENDED)
+```python
+MODEL_ID = "gemini/gemini-2.5-flash"
+```
+- **10 requests/minute** (Free Tier)
+- **Most advanced model** - Best reasoning capabilities
+- Optimized for agentic use cases
+- Best for GAIA benchmark
+### Option 2: Gemini 2.0 Flash
 ```python
+MODEL_ID = "gemini/gemini-2.0-flash"
 ```
+- **15 requests/minute** (Free Tier)
+- 50% higher rate limit than 2.5 Flash
+- Slightly less capable but faster
+### Option 3: Gemini 1.5 Flash
 ```python
+MODEL_ID = "gemini/gemini-1.5-flash"
 ```
+- **15 requests/minute** (Free Tier)
+- Older model but proven
+- Similar speed to 2.0 Flash
+### Option 4: Paid API Key (Tier 1+)
 If you have a paid Gemini API key:
+- **Gemini 2.5 Flash**: 250 requests/minute
+- **Gemini 2.0 Flash**: 200 requests/minute
+- **Gemini 1.5 Flash**: 50 requests/minute
 ## How to Switch Models
 1. Edit `app.py` line 15:
    ```python
+   MODEL_ID = "gemini/gemini-2.5-flash"  # Change this
    ```
 2. Edit `tools/tools.py` line 19:
    ```python
+   MODEL_ID = "gemini/gemini-2.5-flash"  # Change this
    ```
 3. Commit and push:
    ```bash
    git add app.py tools/tools.py
+   git commit -m "Update Gemini model"
    git push
    ```
+## Final Configuration
+- **Model**: `gemini/gemini-2.5-flash` (Most advanced)
+- **Rate Limit**: 10 requests/minute (Free Tier)
+- **Delay**: 20 seconds between questions
+- **Expected runtime**: ~8-10 minutes for 20 questions
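
The runtime figures in RATE_LIMIT_FIX.md can be checked with a little arithmetic. The sketch below is not part of the commit: the helper names are hypothetical, it assumes the delay is skipped after the last question (as in `app.py`), and it takes the 10-15 calls per question quoted in the README at face value.

```python
def extra_delay_seconds(num_questions: int, delay_s: float) -> float:
    """Total sleep time when the delay is skipped after the last question."""
    return max(num_questions - 1, 0) * delay_s


def min_seconds_per_question(calls_per_question: int, rpm: int) -> float:
    """Shortest span a question's calls can occupy without exceeding `rpm` on average."""
    return calls_per_question / rpm * 60.0


# 20 questions with a 20-second pause between them (19 gaps).
print(extra_delay_seconds(20, 20))        # 380
# Quota needed per question at 10 RPM for 10 and 15 calls.
print(min_seconds_per_question(10, 10))   # 60.0
print(min_seconds_per_question(15, 10))   # 90.0
```

On these assumptions, a question that makes 10-15 calls consumes 60-90 seconds of quota at 10 RPM, so a fixed 20-second pause between questions lowers the average request rate but may not be enough on its own to stay under the limit.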

app.py

@@ -12,7 +12,7 @@ from model import get_model
 # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
-MODEL_ID = "gemini/gemini-1.5-flash-latest"
 # --- Async Question Processing ---
 async def process_question(agent, question: str, task_id: str, file_name: str = None) -> Dict:
@@ -45,11 +45,12 @@ async def run_questions_async(agent, questions_data: List[Dict]) -> tuple:
         submissions.append(result["submission"])
         logs.append(result["log"])
-        # Add 10 second delay between questions to respect Gemini free tier rate limits (10 requests/minute)
         # Skip delay after last question
         if idx < len(questions_data) - 1:
-            print(f"⏳ Waiting 10 seconds before next question to respect API rate limits...")
-            await asyncio.sleep(10)
     return submissions, logs

 # (Keep Constants as is)
 # --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+MODEL_ID = "gemini/gemini-2.5-flash"
 # --- Async Question Processing ---
 async def process_question(agent, question: str, task_id: str, file_name: str = None) -> Dict:
         submissions.append(result["submission"])
         logs.append(result["log"])
+        # Add 20 second delay between questions to respect Gemini 2.5 Flash free tier (10 requests/minute)
+        # Each question makes ~10-15 API calls, so we need longer delays
         # Skip delay after last question
         if idx < len(questions_data) - 1:
+            print("⏳ Waiting 20 seconds before next question to respect API rate limits (10 req/min)...")
+            await asyncio.sleep(20)
     return submissions, logs
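
The change above throttles per question, while the limit it targets is per request. One alternative, shown here only as a sketch and not as what the commit does, is a sliding-window limiter awaited before each model call. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and where `acquire()` would be called depends on how the agent invokes the model, which this diff does not show.

```python
import asyncio
import time
from collections import deque
from typing import Awaitable, Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()
        # Create the limiter inside a running event loop so the lock binds to it.
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Wait until one more call fits in the current window, then record it."""
        async with self._lock:
            while True:
                now = self._clock()
                # Forget calls that have already left the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    break
                # Sleep until the oldest recorded call leaves the window.
                await self._sleep(self.period - (now - self._stamps[0]))
            self._stamps.append(now)
```

With `SlidingWindowLimiter(max_calls=10, period=60.0)` and an `await limiter.acquire()` before every request, the pause adapts to how many calls a question actually makes instead of being a fixed 20 seconds. The `clock` and `sleep` parameters exist only so the behaviour can be tested without real waiting.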

tools/tools.py

@@ -16,7 +16,7 @@ from pytesseract import image_to_string
 load_dotenv()
-MODEL_ID = "gemini/gemini-1.5-flash-latest"
 #  Vision Tool
 @tool

 load_dotenv()
+MODEL_ID = "gemini/gemini-2.5-flash"
 #  Vision Tool
 @tool
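
Because `MODEL_ID` is defined separately in `app.py` and `tools/tools.py`, switching models means editing both files in step. A minimal sketch of one way to keep them in sync is to read the value from a single place; the `MODEL_ID` environment variable and the `get_model_id` helper are assumptions for illustration and do not exist in the repository.

```python
import os
from typing import Mapping

# Default taken from this commit; override by setting the (hypothetical) MODEL_ID variable.
DEFAULT_MODEL_ID = "gemini/gemini-2.5-flash"


def get_model_id(env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID from the environment, falling back to the default."""
    return env.get("MODEL_ID", DEFAULT_MODEL_ID)
```

Both modules could then use `MODEL_ID = get_model_id()`, so a model switch becomes a one-line change or a Space variable rather than two matching edits.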