# Supabase Setup for Optimal GAIA Agent Performance

## Required Supabase Configuration

### 1. Create the `match_documents_langchain` Function

This SQL function enables efficient vector similarity search:

```sql
-- Create the similarity search function for LangChain integration
create or replace function match_documents_langchain (
query_embedding vector(1536), -- Adjust dimension based on embedding model
match_threshold float default 0.75,
match_count int default 3
)
returns table (
id uuid,
page_content text,
embedding vector,
metadata jsonb,
similarity float
)
language plpgsql
as $$
begin
return query
select
documents.id,
documents.page_content,
documents.embedding,
documents.metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where 1 - (documents.embedding <=> query_embedding) > match_threshold
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
```
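The `<=>` operator used above is pgvector's cosine-distance operator, so `1 - (embedding <=> query_embedding)` is cosine similarity. As a sanity check, here is a pure-Python sketch of the same scoring, threshold, and limit logic (illustrative only, no database involved):

```python
import math

def cosine_similarity(a, b):
    # Mirrors 1 - (embedding <=> query_embedding): <=> is cosine distance.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def match_documents(docs, query_embedding, match_threshold=0.75, match_count=3):
    # docs: list of (id, page_content, embedding) tuples.
    scored = [
        (doc_id, content, cosine_similarity(emb, query_embedding))
        for doc_id, content, emb in docs
    ]
    # WHERE similarity > match_threshold
    matches = [row for row in scored if row[2] > match_threshold]
    # ORDER BY distance asc == similarity desc, LIMIT match_count
    matches.sort(key=lambda row: row[2], reverse=True)
    return matches[:match_count]
```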
### 2. Alternative for HuggingFace Embeddings (384 dimensions)
If using a 384-dimensional model such as `sentence-transformers/all-MiniLM-L6-v2` (note that `all-mpnet-base-v2` produces 768-dimensional vectors and would require `vector(768)` instead):
```sql
-- For HuggingFace embeddings (384 dimensions)
create or replace function match_documents_langchain_hf (
query_embedding vector(384),
match_threshold float default 0.75,
match_count int default 3
)
returns table (
id uuid,
page_content text,
embedding vector,
metadata jsonb,
similarity float
)
language plpgsql
as $$
begin
return query
select
documents.id,
documents.page_content,
documents.embedding,
documents.metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where 1 - (documents.embedding <=> query_embedding) > match_threshold
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
```
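LangChain's `SupabaseVectorStore` selects which of these SQL functions to call via its `query_name` argument. A small helper (hypothetical, not part of the project's code) that picks the function matching the embedding dimension in use:

```python
def pick_query_name(embedding_dim: int) -> str:
    # Map an embedding dimension to the matching SQL function defined above.
    if embedding_dim == 1536:  # e.g. OpenAI text-embedding-ada-002
        return "match_documents_langchain"
    if embedding_dim == 384:   # e.g. HuggingFace MiniLM-family models
        return "match_documents_langchain_hf"
    raise ValueError(f"No match function defined for dimension {embedding_dim}")
```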
### 3. Update Database Table Structure

Ensure the `documents` table has the right structure:

```sql
-- Check/create the documents table structure
CREATE TABLE IF NOT EXISTS documents (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    page_content TEXT NOT NULL,
    embedding VECTOR(1536), -- Or 384 for HuggingFace
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT TIMEZONE('utc'::text, NOW())
);

-- Create an index for fast similarity search
CREATE INDEX IF NOT EXISTS documents_embedding_idx
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
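`lists = 100` is a reasonable starting point; pgvector's documentation suggests roughly `rows / 1000` lists for tables up to about one million rows and `sqrt(rows)` beyond that. A sketch of that rule of thumb:

```python
import math

def suggested_ivfflat_lists(row_count: int) -> int:
    # Rule of thumb from pgvector's docs:
    # rows / 1000 up to ~1M rows, sqrt(rows) above that.
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))
```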
### 4. Environment Variables

Update the `.env` file:

```bash
# Required for both approaches
SUPABASE_URL=supabase_project_url
SUPABASE_SERVICE_KEY=supabase_service_key

# Alternative key name (some setups use this)
SUPABASE_KEY=supabase_service_key

# Optional: for the OpenAI fallback
OPENAI_API_KEY=openai_api_key
```
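Because some setups read `SUPABASE_SERVICE_KEY` and others `SUPABASE_KEY`, the loading code can accept either. A sketch of that lookup (the function name is illustrative):

```python
import os

def get_supabase_credentials(env=None):
    # Accept either SUPABASE_SERVICE_KEY or SUPABASE_KEY, in that order.
    env = os.environ if env is None else env
    url = env.get("SUPABASE_URL")
    key = env.get("SUPABASE_SERVICE_KEY") or env.get("SUPABASE_KEY")
    if not url or not key:
        raise RuntimeError("SUPABASE_URL and a Supabase key must be set")
    return url, key
```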
## Performance Comparison

### HuggingFace Approach (Recommended)

- ✅ Free embedding model
- ✅ Often better semantic understanding
- ✅ 384-dimensional vectors (smaller storage)
- ✅ No API rate limits

### OpenAI Approach (Fallback)

- ✅ Very reliable and consistent
- ✅ 1536-dimensional vectors (more detailed)
- ❌ Costs money per embedding
- ❌ API rate limits
## Testing the Setup

1. Test that the function exists:

```sql
SELECT * FROM match_documents_langchain(
    '[0.1, 0.2, ...]'::vector, -- Sample embedding
    0.7,                       -- Threshold
    5                          -- Count
);
```
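The sample embedding above is elided; a query vector must have the same dimension as the table's `embedding` column. A small helper (hypothetical) to format a Python list as a pgvector literal:

```python
def to_vector_literal(values) -> str:
    # pgvector accepts literals like '[0.1, 0.2, 0.3]'::vector
    return "[" + ", ".join(str(float(v)) for v in values) + "]"
```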
2. Test with Python:

```python
from tools.database_tools import retriever

# Test the efficient search path
results = retriever.search_similar_questions_efficient(
    "What is the capital of France?",
    top_k=3
)
print(results)
```
## Migration from Manual to Efficient Search
If you're currently using manual similarity search, the new hybrid approach will:
- Try efficient LangChain search first
- Fall back to manual search if needed
- Automatically detect which approach works
This ensures compatibility while optimizing for performance!
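The hybrid behaviour described above can be sketched as a simple try/fall-back wrapper (the names are illustrative, not the actual `tools.database_tools` API):

```python
def hybrid_search(query, efficient_search, manual_search, top_k=3):
    # Try the efficient LangChain/RPC path first;
    # fall back to manual similarity search if it fails.
    try:
        return efficient_search(query, top_k=top_k)
    except Exception:
        return manual_search(query, top_k=top_k)
```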