# Gemini Integration: GDELT Query Builder & Summarization

This project uses Google's Gemini for:

- Building 10 language-preserving GDELT query variations.
- Analyzing outlet bias and returning bias categories (one category must be exactly "unbiased").
- Producing a multi-perspective factual summary grouped by bias categories.

Gemini runs in the cloud and is not cached. The local embedding model is cached and shared across requests.
## Setup

- Get an API key from https://ai.google.dev/
- In `.env` (loaded as sketched below):
  - `GEMINI_API_KEY=your_key`
  - `GEMINI_MODEL=gemini-1.5-pro` (or `gemini-2.5-pro` / flash variants)
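A minimal sketch of loading these settings in Python, assuming the project uses `python-dotenv` and the `google-generativeai` client; the client library choice and the fallback model name are assumptions, not confirmed by this README:

```python
import os

from dotenv import load_dotenv        # pip install python-dotenv
import google.generativeai as genai   # pip install google-generativeai

# Pull GEMINI_API_KEY and GEMINI_MODEL from the project's .env file.
load_dotenv()

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Fall back to gemini-1.5-pro when GEMINI_MODEL is not set (assumed default).
model = genai.GenerativeModel(os.getenv("GEMINI_MODEL", "gemini-1.5-pro"))
```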
## Query Builder (gdelt_query_builder.py)

- Generates EXACTLY 10 variations separated by `|||`.
- Preserves the user's language.
- Uses AND-only operators between terms.
- Adds `sourcecountry`/`sourceregion` and datetimes when the query implies them.
- Sensitive-query guard:
  - The system prompt instructs Gemini to return the literal token `INAPPROPRIATE_QUERY_DETECTED` for sensitive topics (e.g., pornography, explicit adult content, certain religious questions flagged by policy).
  - The backend detects this token and immediately returns the summary "I cannot respond to this query." with status=blocked; see the sketch after this list.
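A minimal sketch of handling the query-builder response, assuming the raw Gemini text is already in hand; the function name and return shape are illustrative, while `|||`, `INAPPROPRIATE_QUERY_DETECTED`, and the blocked summary text come from the contract above:

```python
BLOCKED_TOKEN = "INAPPROPRIATE_QUERY_DETECTED"
BLOCKED_SUMMARY = "I cannot respond to this query."


def parse_query_variations(gemini_text: str) -> dict:
    """Split Gemini's reply into GDELT query variations, honoring the guard token."""
    if BLOCKED_TOKEN in gemini_text:
        # Sensitive query: the backend short-circuits with the blocked summary.
        return {"status": "blocked", "summary": BLOCKED_SUMMARY, "variations": []}

    variations = [v.strip() for v in gemini_text.split("|||") if v.strip()]
    if len(variations) != 10:
        # The prompt demands exactly 10 variations; anything else is treated as an error.
        raise ValueError(f"Expected 10 query variations, got {len(variations)}")

    return {"status": "ok", "variations": variations}
```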
Example request body to the backend:

```json
{"query": "news about war in Ukraine"}
```
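The body above could be posted with `requests`, for example; the base URL and route here are placeholders, not the project's actual endpoint:

```python
import requests  # pip install requests

# Placeholder URL/route; substitute the backend's real address and path.
resp = requests.post(
    "http://localhost:8000/search",
    json={"query": "news about war in Ukraine"},
    timeout=60,
)
print(resp.json())
```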
## Bias Analysis

- Gemini returns bias categories with counts; one category is normalized to exactly "unbiased" (see the sketch below).
- The reasoning text explains the categorization logic; it is appended after the sources in the final summary.
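A minimal sketch of that normalization step, assuming Gemini's categories arrive as a name-to-count mapping; the function name and the set of spelling variants it collapses are assumptions:

```python
def normalize_bias_categories(categories: dict[str, int]) -> dict[str, int]:
    """Ensure exactly one category is spelled 'unbiased' (lowercase, no variants)."""
    normalized: dict[str, int] = {}
    for name, count in categories.items():
        key = name.strip().lower()
        # Collapse assumed near-synonyms onto the canonical "unbiased" label.
        if key in {"unbiased", "neutral", "no bias", "non-biased"}:
            key = "unbiased"
        normalized[key] = normalized.get(key, 0) + count
    # Keep the category present even if Gemini omitted it entirely.
    normalized.setdefault("unbiased", 0)
    return normalized
```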
## Summarization

- The backend sends the top URLs from all categories (including unbiased) to Gemini, labeled by category.
- Gemini instruction highlights:
  - Produce a concise factual answer first.
  - Then list SOURCES BY CATEGORY with up to 5 URLs per category.
  - Numbering restarts at 1 for each category (1–5 per category).
  - After the sources, append "REASONING:" followed by the bias-analysis reasoning string.
- The backend returns only the final formatted summary string; the UI renders it verbatim. A formatting sketch follows this list.
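A minimal sketch of assembling that final string, assuming the concise answer, a category-to-URL mapping, and the reasoning text are already available (all names here are illustrative):

```python
def format_summary(answer: str, sources_by_category: dict[str, list[str]], reasoning: str) -> str:
    """Build the final summary: answer, then up to 5 numbered URLs per category, then reasoning."""
    lines = [answer, "", "SOURCES BY CATEGORY:"]
    for category, urls in sources_by_category.items():
        lines.append(f"\n{category.upper()}:")
        # Numbering restarts at 1 within each category; at most 5 URLs are listed.
        for i, url in enumerate(urls[:5], start=1):
            lines.append(f"{i}. {url}")
    lines.append(f"\nREASONING: {reasoning}")
    return "\n".join(lines)
```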
## Notes

- No Gemini model caching (cloud API).
- The local embedding model (SentenceTransformers) is cached once and reused across requests.
- Optional whitelist filtering is toggled via `USE_WHITELIST_ONLY` in `.env`; see the sketch below.
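A minimal sketch of the caching and the whitelist toggle, assuming SentenceTransformers and a truthy-string env flag; the embedding model name and the helper names are assumptions:

```python
import os
from functools import lru_cache

from sentence_transformers import SentenceTransformer  # pip install sentence-transformers


@lru_cache(maxsize=1)
def get_embedding_model() -> SentenceTransformer:
    """Load the local embedding model once and reuse it across requests."""
    # Placeholder model name; use whichever model the project actually configures.
    return SentenceTransformer("all-MiniLM-L6-v2")


def use_whitelist_only() -> bool:
    """Read the optional USE_WHITELIST_ONLY toggle from the environment / .env."""
    return os.getenv("USE_WHITELIST_ONLY", "false").strip().lower() in {"1", "true", "yes"}
```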