---
title: Web Search MCP
emoji: π
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---
# Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.

## Features

- **Dual search modes**:
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)
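The 200-requests/hour limit behaves like a moving-window counter. The app itself uses the `limits` package (listed under Installation); the standard-library sketch below only illustrates the idea, and the class name is hypothetical:

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Illustrative moving-window limiter: at most max_requests per window."""

    def __init__(self, max_requests: int = 200, window_seconds: float = 3600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()  # times of accepted requests

    def allow(self) -> bool:
        """Record and permit the request if it fits in the current window."""
        now = time.monotonic()
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False
```

Once the window holds 200 accepted requests, further calls return `False` until the oldest ones age out.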
## Prerequisites

1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.8+**: Ensure you have Python installed
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

1. Clone or download this repository
2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:

   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```

3. Set your Serper API key:

   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```
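To confirm the variable is actually visible to Python before launching, a check like the following mirrors the error the server reports (the helper name is hypothetical, not part of the app):

```python
import os


def get_serper_api_key() -> str:
    """Read SERPER_API_KEY from the environment, failing loudly if unset."""
    key = os.environ.get("SERPER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SERPER_API_KEY is not set. Export it first:\n"
            '  export SERPER_API_KEY="your-api-key-here"'
        )
    return key
```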
## Usage

### Starting the MCP Server

```bash
python app_mcp.py
```

The server will start on `http://localhost:7860` with the MCP endpoint at:

```
http://localhost:7860/gradio_api/mcp/sse
```

### Connecting to LLM Clients

#### Claude Desktop

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

#### Direct URL Connection

For clients that support URL-based MCP servers:

1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`
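The exact configuration format varies by client. As a sketch, a URL-based entry often looks something like this (the `url` key follows common MCP client conventions and may differ in your client; check its documentation):

```json
{
  "mcpServers": {
    "web-search": {
      "url": "http://localhost:7860/gradio_api/mcp/sse"
    }
  }
}
```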
## Tool Documentation

### `search_web` Function

**Purpose**: Search the web for information or fresh news and extract content.

**Parameters**:

- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer
- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, and tutorials
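As a sketch of how these parameters might map onto a Serper request (the endpoint URLs follow Serper's public API; the helper and its clamping behavior are illustrative, not necessarily what `app_mcp.py` does):

```python
SERPER_ENDPOINTS = {
    "search": "https://google.serper.dev/search",
    "news": "https://google.serper.dev/news",
}


def build_serper_request(query: str, num_results: int = 4, search_type: str = "search"):
    """Map search_web's parameters to a (url, payload) pair for Serper."""
    if search_type not in SERPER_ENDPOINTS:
        raise ValueError('search_type must be "search" or "news"')
    num = max(1, min(20, num_results))  # enforce the documented 1-20 range
    return SERPER_ENDPOINTS[search_type], {"q": query, "num": num}
```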
**Returns**: Formatted text containing:

- A summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content
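The per-article layout can be pictured like this (the exact field labels and ordering are assumptions, not the server's verbatim output):

```python
def format_article(title: str, source: str, date: str, url: str, content: str) -> str:
    """Render one extracted article as metadata followed by its main content."""
    return (
        f"Title: {title}\n"
        f"Source: {source} | Date: {date}\n"
        f"URL: {url}\n\n"
        f"{content}\n"
    )
```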
**When to use each search type**:

- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements
- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example Usage in LLM**:

```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```
## Error Handling

The tool handles various error scenarios:

- **Missing API key**: Clear error message with setup instructions
- **Rate limiting**: Informs you when the limit is exceeded
- **Failed extractions**: Reports which articles couldn't be extracted
- **Network errors**: Graceful error messages

## Testing

You can test the server manually:

1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust the result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
2. **Rate limit errors**: Wait before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings