Mahynlo committed
Commit · f611149
1 Parent(s): 949d073

Upgrade to Gemini 2.5 Flash (most advanced model) with 20s delays for 10 RPM limit
- RATE_LIMIT_FIX.md +41 -24
- app.py +5 -4
- tools/tools.py +1 -1
    	
        RATE_LIMIT_FIX.md
    CHANGED
    
    | @@ -1,57 +1,74 @@ | |
| 1 | 
             
            # Rate Limit Solution
         | 
| 2 |  | 
| 3 | 
             
            ## Problem
         | 
| 4 | 
            -
            Gemini  | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 5 |  | 
| 6 | 
             
            ## Solution Applied
         | 
| 7 | 
            -
            Added ** | 
| 8 | 
            -
            - 20 questions ×  | 
| 9 | 
             
            - Prevents rate limit errors
         | 
| 10 | 
             
            - Ensures all questions are processed
         | 
| 11 |  | 
| 12 | 
            -
            ## Alternative | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| 13 |  | 
| 14 | 
            -
            ### Option  | 
| 15 | 
             
            ```python
         | 
| 16 | 
            -
            MODEL_ID = "gemini/gemini- | 
| 17 | 
             
            ```
         | 
| 18 | 
            -
            - ** | 
| 19 | 
            -
            -  | 
| 20 | 
            -
            -  | 
| 21 |  | 
| 22 | 
            -
            ### Option  | 
| 23 | 
             
            ```python
         | 
| 24 | 
            -
            MODEL_ID = "gemini/gemini-1.5- | 
| 25 | 
             
            ```
         | 
| 26 | 
            -
            - ** | 
| 27 | 
            -
            -  | 
|  | |
| 28 |  | 
| 29 | 
            -
            ### Option  | 
| 30 | 
             
            If you have a paid Gemini API key:
         | 
| 31 | 
            -
            -  | 
| 32 | 
            -
            -  | 
|  | |
| 33 |  | 
| 34 | 
             
            ## How to Switch Models
         | 
| 35 |  | 
| 36 | 
             
            1. Edit `app.py` line 15:
         | 
| 37 | 
             
               ```python
         | 
| 38 | 
            -
               MODEL_ID = "gemini/gemini- | 
| 39 | 
             
               ```
         | 
| 40 |  | 
| 41 | 
             
            2. Edit `tools/tools.py` line 19:
         | 
| 42 | 
             
               ```python
         | 
| 43 | 
            -
               MODEL_ID = "gemini/gemini- | 
| 44 | 
             
               ```
         | 
| 45 |  | 
| 46 | 
             
            3. Commit and push:
         | 
| 47 | 
             
               ```bash
         | 
| 48 | 
             
               git add app.py tools/tools.py
         | 
| 49 | 
            -
               git commit -m " | 
| 50 | 
             
               git push
         | 
| 51 | 
             
               ```
         | 
| 52 |  | 
| 53 | 
            -
            ##  | 
| 54 | 
            -
            - Model | 
| 55 | 
            -
            - Rate Limit | 
| 56 | 
            -
            - Delay | 
| 57 | 
            -
            - Expected runtime | 
|  | |
| 1 | 
             
            # Rate Limit Solution
         | 
| 2 |  | 
| 3 | 
             
            ## Problem
         | 
| 4 | 
            +
            Gemini models on the free tier allow only a limited number of requests per minute (RPM), and the agent exhausts that quota quickly because each question makes 10-15 API calls.
         | 
| 5 | 
            +
             | 
| 6 | 
            +
            ## Current Configuration
         | 
| 7 | 
            +
            **Using Gemini 2.5 Flash** - The most advanced model available
         | 
| 8 | 
            +
            - **10 requests/minute** (Free Tier)
         | 
| 9 | 
            +
            - Most capable reasoning and agentic capabilities
         | 
| 10 | 
            +
            - Best for GAIA benchmark tasks
         | 
| 11 |  | 
| 12 | 
             
            ## Solution Applied
         | 
| 13 | 
            +
            Added **20-second delays** between questions in `app.py` to respect rate limits. This means:
         | 
| 14 | 
            +
            - 20 questions × 20 seconds ≈ 6-7 minutes of extra runtime (the delay is skipped after the last question, so 19 delays ≈ 6.3 minutes)
         | 
| 15 | 
             
            - Prevents rate limit errors
         | 
| 16 | 
             
            - Ensures all questions are processed
         | 
| 17 |  | 
| 18 | 
            +
            ## Alternative Models Available
         | 
| 19 | 
            +
             | 
| 20 | 
            +
            ### Option 1: Gemini 2.5 Flash (CURRENT - RECOMMENDED)
         | 
| 21 | 
            +
            ```python
         | 
| 22 | 
            +
            MODEL_ID = "gemini/gemini-2.5-flash"
         | 
| 23 | 
            +
            ```
         | 
| 24 | 
            +
            - **10 requests/minute** (Free Tier)
         | 
| 25 | 
            +
            - **Most advanced model** - Best reasoning capabilities
         | 
| 26 | 
            +
            - Optimized for agentic use cases
         | 
| 27 | 
            +
            - Best for GAIA benchmark
         | 
| 28 |  | 
| 29 | 
            +
            ### Option 2: Gemini 2.0 Flash
         | 
| 30 | 
             
            ```python
         | 
| 31 | 
            +
            MODEL_ID = "gemini/gemini-2.0-flash"
         | 
| 32 | 
             
            ```
         | 
| 33 | 
            +
            - **15 requests/minute** (Free Tier)
         | 
| 34 | 
            +
            - 50% higher rate limit than 2.5 Flash
         | 
| 35 | 
            +
            - Slightly less capable but faster
         | 
| 36 |  | 
| 37 | 
            +
            ### Option 3: Gemini 1.5 Flash
         | 
| 38 | 
             
            ```python
         | 
| 39 | 
            +
            MODEL_ID = "gemini/gemini-1.5-flash"
         | 
| 40 | 
             
            ```
         | 
| 41 | 
            +
            - **15 requests/minute** (Free Tier)
         | 
| 42 | 
            +
            - Older model but proven
         | 
| 43 | 
            +
            - Similar speed to 2.0 Flash
         | 
| 44 |  | 
| 45 | 
            +
            ### Option 4: Paid API Key (Tier 1+)
         | 
| 46 | 
             
            If you have a paid Gemini API key:
         | 
| 47 | 
            +
            - **Gemini 2.5 Flash**: 250 requests/minute
         | 
| 48 | 
            +
            - **Gemini 2.0 Flash**: 200 requests/minute
         | 
| 49 | 
            +
            - **Gemini 1.5 Flash**: 50 requests/minute
         | 
| 50 |  | 
| 51 | 
             
            ## How to Switch Models
         | 
| 52 |  | 
| 53 | 
             
            1. Edit `app.py` line 15:
         | 
| 54 | 
             
               ```python
         | 
| 55 | 
            +
               MODEL_ID = "gemini/gemini-2.5-flash"  # Change this
         | 
| 56 | 
             
               ```
         | 
| 57 |  | 
| 58 | 
             
            2. Edit `tools/tools.py` line 19:
         | 
| 59 | 
             
               ```python
         | 
| 60 | 
            +
               MODEL_ID = "gemini/gemini-2.5-flash"  # Change this
         | 
| 61 | 
             
               ```
         | 
| 62 |  | 
| 63 | 
             
            3. Commit and push:
         | 
| 64 | 
             
               ```bash
         | 
| 65 | 
             
               git add app.py tools/tools.py
         | 
| 66 | 
            +
               git commit -m "Update Gemini model"
         | 
| 67 | 
             
               git push
         | 
| 68 | 
             
               ```
         | 
| 69 |  | 
| 70 | 
            +
            ## Final Configuration
         | 
| 71 | 
            +
            - **Model**: `gemini/gemini-2.5-flash` (Most advanced)
         | 
| 72 | 
            +
            - **Rate Limit**: 10 requests/minute (Free Tier)
         | 
| 73 | 
            +
            - **Delay**: 20 seconds between questions
         | 
| 74 | 
            +
            - **Expected runtime**: ~8-10 minutes for 20 questions
         | 
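
The extra-runtime figure in `RATE_LIMIT_FIX.md` can be checked with a small calculation. This is a minimal sketch, assuming a fixed delay follows every question except the last, as the `app.py` change in this commit does; the function name `total_delay_seconds` is hypothetical and does not appear in the repository.

```python
def total_delay_seconds(num_questions: int, delay_seconds: float) -> float:
    """Total time spent sleeping when a fixed delay follows every question except the last."""
    if num_questions <= 0:
        return 0.0
    # One delay between each pair of consecutive questions, none after the last.
    return (num_questions - 1) * delay_seconds


if __name__ == "__main__":
    extra = total_delay_seconds(20, 20)
    print(f"Extra runtime from delays: {extra:.0f} s ({extra / 60:.1f} min)")
```

For 20 questions and a 20-second delay this gives 380 seconds, about 6.3 minutes, which matches the "~6-7 minutes" estimate in the document.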
    	
        app.py
    CHANGED
    
    | @@ -12,7 +12,7 @@ from model import get_model | |
| 12 | 
             
            # (Keep Constants as is)
         | 
| 13 | 
             
            # --- Constants ---
         | 
| 14 | 
             
            DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
         | 
| 15 | 
            -
            MODEL_ID = "gemini/gemini- | 
| 16 |  | 
| 17 | 
             
            # --- Async Question Processing ---
         | 
| 18 | 
             
            async def process_question(agent, question: str, task_id: str, file_name: str = None) -> Dict:
         | 
| @@ -45,11 +45,12 @@ async def run_questions_async(agent, questions_data: List[Dict]) -> tuple: | |
| 45 | 
             
                    submissions.append(result["submission"])
         | 
| 46 | 
             
                    logs.append(result["log"])
         | 
| 47 |  | 
| 48 | 
            -
                    # Add  | 
|  | |
| 49 | 
             
                    # Skip delay after last question
         | 
| 50 | 
             
                    if idx < len(questions_data) - 1:
         | 
| 51 | 
            -
                        print(f"⏳ Waiting  | 
| 52 | 
            -
                        await asyncio.sleep( | 
| 53 |  | 
| 54 | 
             
                return submissions, logs
         | 
| 55 |  | 
|  | |
| 12 | 
             
            # (Keep Constants as is)
         | 
| 13 | 
             
            # --- Constants ---
         | 
| 14 | 
             
            DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
         | 
| 15 | 
            +
            MODEL_ID = "gemini/gemini-2.5-flash"
         | 
| 16 |  | 
| 17 | 
             
            # --- Async Question Processing ---
         | 
| 18 | 
             
            async def process_question(agent, question: str, task_id: str, file_name: str = None) -> Dict:
         | 
|  | |
| 45 | 
             
                    submissions.append(result["submission"])
         | 
| 46 | 
             
                    logs.append(result["log"])
         | 
| 47 |  | 
| 48 | 
            +
                    # Add 20 second delay between questions to respect Gemini 2.5 Flash free tier (10 requests/minute)
         | 
| 49 | 
            +
                    # Each question makes ~10-15 API calls, so we need longer delays
         | 
| 50 | 
             
                    # Skip delay after last question
         | 
| 51 | 
             
                    if idx < len(questions_data) - 1:
         | 
| 52 | 
            +
                        print(f"⏳ Waiting 20 seconds before next question to respect API rate limits (10 req/min)...")
         | 
| 53 | 
            +
                        await asyncio.sleep(20)
         | 
| 54 |  | 
| 55 | 
             
                return submissions, logs
         | 
| 56 |  | 
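
The fixed `asyncio.sleep(20)` spaces out questions, but it does not bound how many API calls a single question makes within one minute. One possible alternative is a limiter applied per model call. The following is a sketch only, not part of this commit: the class name `SlidingWindowLimiter` and its `clock` and `sleep` parameters are hypothetical, and it assumes there is a place in the agent code where `acquire()` can be awaited before each request.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock=time.monotonic, sleep=asyncio.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing (assumption)
        self._sleep = sleep      # injectable for testing (assumption)
        self._stamps = deque()   # timestamps of recent acquisitions
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        """Return once another call fits inside the rate limit."""
        async with self._lock:
            while True:
                now = self._clock()
                # Drop timestamps that have left the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Wait until the oldest recorded call leaves the window.
                await self._sleep(self.period - (now - self._stamps[0]))


# Hypothetical usage inside a coroutine, with the free-tier figure from this commit:
#     limiter = SlidingWindowLimiter(max_calls=10, period=60.0)
#     await limiter.acquire()   # before each model request
```

Compared with a fixed per-question delay, this waits only when the window is actually full, at the cost of needing a hook around every request.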
    	
        tools/tools.py
    CHANGED
    
    | @@ -16,7 +16,7 @@ from pytesseract import image_to_string | |
| 16 |  | 
| 17 | 
             
            load_dotenv()
         | 
| 18 |  | 
| 19 | 
            -
            MODEL_ID = "gemini/gemini- | 
| 20 |  | 
| 21 | 
             
            #  Vision Tool 
         | 
| 22 | 
             
            @tool
         | 
|  | |
| 16 |  | 
| 17 | 
             
            load_dotenv()
         | 
| 18 |  | 
| 19 | 
            +
            MODEL_ID = "gemini/gemini-2.5-flash"
         | 
| 20 |  | 
| 21 | 
             
            #  Vision Tool 
         | 
| 22 | 
             
            @tool
         | 
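
This commit changes `MODEL_ID` in both `app.py` and `tools/tools.py`, and the two values have to be kept in sync by hand. A possible way to avoid that is to read the model ID from the environment, since `tools/tools.py` already calls `load_dotenv()`. The sketch below is an assumption, not the repository's code: the variable name `GEMINI_MODEL_ID` and the helper `resolve_model_id` are hypothetical.

```python
import os

# Default taken from this commit.
DEFAULT_MODEL_ID = "gemini/gemini-2.5-flash"


def resolve_model_id(env=None) -> str:
    """Return the model ID from GEMINI_MODEL_ID (hypothetical), else the default."""
    env = os.environ if env is None else env
    value = env.get("GEMINI_MODEL_ID", "").strip()
    # Treat an unset or blank variable as "use the default".
    return value or DEFAULT_MODEL_ID


MODEL_ID = resolve_model_id()
```

With this in a shared module, switching models would mean changing one environment variable (or one `.env` entry) rather than editing two files.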
