Gemini Integration: GDELT Query Builder & Summarization

This project uses Google’s Gemini for:

  • Generating exactly 10 language-preserving GDELT query variations.
  • Analyzing outlet bias and returning categories (one category must be exactly “unbiased”).
  • Producing a multi‑perspective factual summary grouped by bias categories.

Gemini runs in the cloud and is not cached. The local embedding model is cached and shared across requests.

Setup

  • Get an API key from https://ai.google.dev/
  • Create a .env file with:
    • GEMINI_API_KEY=your_key
    • GEMINI_MODEL=gemini-1.5-pro (or gemini-2.5-pro / flash variants)
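The two .env entries above can be read at startup with a small helper. This is a minimal sketch, assuming the variable names from the setup list; the helper name and the fallback model are illustrative, not the project's actual API:

```python
import os

def load_gemini_config() -> dict:
    """Read Gemini settings from the environment (see .env entries above)."""
    api_key = os.getenv("GEMINI_API_KEY", "")
    # Defaulting to gemini-1.5-pro when GEMINI_MODEL is unset is an assumption.
    model = os.getenv("GEMINI_MODEL", "gemini-1.5-pro")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return {"api_key": api_key, "model": model}
```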

Query Builder (gdelt_query_builder.py)

  • Generates EXACTLY 10 variations separated by |||.
  • Preserves user language.
  • Uses AND-only operators between terms.
  • Adds sourcecountry/sourceregion and datetimes when implied.
  • Sensitive-query guard:
    • The system prompt instructs Gemini to return the literal token INAPPROPRIATE_QUERY_DETECTED for sensitive topics (e.g., pornography, explicit adult content, certain religious questions flagged by policy).
    • The backend detects this token and immediately returns the summary “I cannot respond to this query.” with status=blocked.
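The parsing and guard logic described above can be sketched as follows. The return shape (status/summary/variations keys) is an assumption for illustration; only the token, the separator, and the blocked message come from the behavior described here:

```python
BLOCKED_TOKEN = "INAPPROPRIATE_QUERY_DETECTED"
BLOCKED_SUMMARY = "I cannot respond to this query."

def parse_query_variations(raw: str) -> dict:
    """Split the query builder's raw output on '|||' into variations,
    short-circuiting when the sensitive-query token is present."""
    text = raw.strip()
    if BLOCKED_TOKEN in text:
        return {"status": "blocked", "summary": BLOCKED_SUMMARY, "variations": []}
    variations = [v.strip() for v in text.split("|||") if v.strip()]
    return {"status": "ok", "summary": "", "variations": variations}
```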

Example request body to backend:

{"query": "news about war in Ukraine"}

Bias Analysis

  • Gemini returns bias categories and counts; one category is normalized to exactly “unbiased”.
  • Reasoning text explains the categorization logic; this is appended after sources in the final summary.
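Normalizing the neutral bucket to the literal spelling “unbiased” might look like the sketch below. The synonym set is an assumption about what labels the model could return; only the target spelling comes from the requirement above:

```python
def normalize_bias_categories(categories: dict) -> dict:
    """Fold category labels so the neutral bucket is spelled exactly 'unbiased'."""
    normalized: dict = {}
    for name, count in categories.items():
        key = name.strip().lower()
        # Treating these labels as synonyms for 'unbiased' is an assumption.
        if key in {"unbiased", "neutral", "no bias", "non-biased"}:
            key = "unbiased"
        normalized[key] = normalized.get(key, 0) + count
    return normalized
```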

Summarization

  • Backend sends top URLs from all categories (including unbiased) to Gemini, labeled by category.
  • Gemini instruction highlights:
    • Produce a concise factual answer first.
    • Then list SOURCES BY CATEGORY with up to 5 URLs per category.
    • Numbering restarts at 1 per category (1–5 for each).
    • After sources, append “REASONING:” with the bias-analysis reasoning string.
  • The backend returns only the final formatted summary string; UI renders it verbatim.
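The formatting rules above (answer first, per-category source lists with numbering restarting at 1 and capped at 5 URLs, then REASONING) can be sketched as one assembly function. Exact labels and spacing are assumptions about the backend's output:

```python
def format_summary(answer: str, sources_by_category: dict, reasoning: str) -> str:
    """Assemble the final summary string: factual answer, then sources
    grouped by category (1-5 per category), then the reasoning line."""
    lines = [answer, "", "SOURCES BY CATEGORY:"]
    for category, urls in sources_by_category.items():
        lines.append(f"{category}:")
        # Numbering restarts at 1 for each category; at most 5 URLs kept.
        for i, url in enumerate(urls[:5], start=1):
            lines.append(f"  {i}. {url}")
    lines += ["", f"REASONING: {reasoning}"]
    return "\n".join(lines)
```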

Notes

  • No Gemini model caching (cloud API).
  • Local embedding model (SentenceTransformers) is cached once and reused.
  • Optional whitelist filtering toggled via USE_WHITELIST_ONLY in .env.
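Reading the USE_WHITELIST_ONLY toggle might be as simple as the sketch below; which strings count as truthy is an assumption, and the project may parse the flag differently:

```python
import os

def use_whitelist_only() -> bool:
    """Read the optional USE_WHITELIST_ONLY flag from the environment."""
    # Truthy spellings here ('1', 'true', 'yes') are an assumption.
    return os.getenv("USE_WHITELIST_ONLY", "false").strip().lower() in {"1", "true", "yes"}
```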