# Supabase Setup for Optimal GAIA Agent Performance

## Required Supabase Configuration

### 1. Create the `match_documents_langchain` Function

This SQL function enables efficient vector similarity search:

```sql
-- Create the similarity search function for LangChain integration
create or replace function match_documents_langchain (
  query_embedding vector(1536),  -- Adjust dimension based on embedding model
  match_threshold float default 0.75,
  match_count int default 3
)
returns table (
  id uuid,
  page_content text,
  embedding vector,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.page_content,
    documents.embedding,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
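
For reference, here is a minimal sketch of calling this function directly from Python through the Supabase RPC interface. This is not the project's actual code; it assumes the `supabase` and `openai` packages and a 1536-dimensional embedding model to match `vector(1536)`:

```python
# Sketch only: call match_documents_langchain via Supabase RPC.
# Assumes SUPABASE_URL / SUPABASE_SERVICE_KEY / OPENAI_API_KEY are set.
import os

from openai import OpenAI
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
openai_client = OpenAI()

# Embed the query text (text-embedding-ada-002 returns 1536-dim vectors)
query = "What is the capital of France?"
embedding = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

# Call the SQL function defined above
response = supabase.rpc(
    "match_documents_langchain",
    {
        "query_embedding": embedding,
        "match_threshold": 0.75,
        "match_count": 3,
    },
).execute()

for row in response.data:
    print(row["similarity"], row["page_content"][:80])
```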

### 2. Alternative for HuggingFace Embeddings (384 dimensions)

If using a 384-dimensional HuggingFace model such as `sentence-transformers/all-MiniLM-L6-v2` (note that `sentence-transformers/all-mpnet-base-v2` produces 768-dimensional embeddings, so `vector(384)` below would need to become `vector(768)` for that model):

```sql
-- For HuggingFace embeddings (384 dimensions)
create or replace function match_documents_langchain_hf (
  query_embedding vector(384),
  match_threshold float default 0.75,
  match_count int default 3
)
returns table (
  id uuid,
  page_content text,
  embedding vector,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.page_content,
    documents.embedding,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
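
As a sketch, this is roughly how the function name could be plugged into LangChain's `SupabaseVectorStore`. It is illustrative only: the 384-dimensional model shown is an assumption, and depending on your LangChain version the embeddings class may live in `langchain_huggingface` or `langchain_community.embeddings`:

```python
# Sketch: point LangChain's Supabase vector store at the 384-dim function above.
import os

from langchain_community.vectorstores import SupabaseVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # or langchain_community.embeddings
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # 384-dim

vector_store = SupabaseVectorStore(
    client=supabase,
    embedding=embeddings,
    table_name="documents",
    query_name="match_documents_langchain_hf",  # the function defined above
)

# Similarity search calls the SQL function behind the scenes
docs = vector_store.similarity_search("What is the capital of France?", k=3)
```

If you use metadata filters, newer LangChain versions may also pass a `filter` argument to the RPC call, in which case the function signature would need an extra `filter jsonb` parameter.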

### 3. Update Database Table Structure

Ensure the `documents` table has the right structure:

```sql
-- Check/create the documents table structure
CREATE TABLE IF NOT EXISTS documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  page_content TEXT NOT NULL,
  embedding VECTOR(1536), -- Or 384 for HuggingFace
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP WITH TIME ZONE DEFAULT TIMEZONE('utc'::text, NOW())
);

-- Create index for fast similarity search
CREATE INDEX IF NOT EXISTS documents_embedding_idx 
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
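
A sketch of inserting a row into this table from Python with `supabase-py` (the content and metadata are placeholders; the embedding must match the column dimension):

```python
# Sketch: insert one document into the table defined above.
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

supabase.table("documents").insert({
    "page_content": "Paris is the capital of France.",
    "embedding": [0.0] * 1536,  # replace with a real embedding of matching dimension
    "metadata": {"source": "example"},
}).execute()
```

Note that IVFFlat builds its lists from existing rows, so it is generally best to create (or rebuild) the index after the table has been populated.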

### 4. Environment Variables

Update the `.env` file:

```bash
# Required for both approaches
SUPABASE_URL=supabase_project_url
SUPABASE_SERVICE_KEY=supabase_service_key
# Alternative key name (some setups use this)
SUPABASE_KEY=supabase_service_key

# Optional: For OpenAI fallback
OPENAI_API_KEY=openai_api_key
```
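
A minimal sketch of loading these variables in Python (assuming `python-dotenv`; the fallback to `SUPABASE_KEY` mirrors the alternative key name above):

```python
# Sketch: load the .env file and create a Supabase client from it.
import os

from dotenv import load_dotenv
from supabase import create_client

load_dotenv()  # reads .env from the working directory

url = os.environ["SUPABASE_URL"]
# Accept either key name, as noted above
key = os.getenv("SUPABASE_SERVICE_KEY") or os.getenv("SUPABASE_KEY")

supabase = create_client(url, key)
```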

## Performance Comparison

### HuggingFace Approach (Recommended)

- Free embedding model
- Often better semantic understanding
- 384-dimensional vectors (smaller storage)
- No API rate limits

### OpenAI Approach (Fallback)

- Very reliable and consistent
- 1536-dimensional vectors (more detailed)
- Costs money per embedding
- API rate limits

## Testing the Setup

1. Test that the function exists:

```sql
SELECT * FROM match_documents_langchain(
  '[0.1, 0.2, ...]'::vector,  -- Sample embedding
  0.7,  -- Threshold
  5     -- Count
);
```
2. Test with Python:

```python
from tools.database_tools import retriever

# Test efficient search
results = retriever.search_similar_questions_efficient(
    "What is the capital of France?",
    top_k=3
)
print(results)
```

## Migration from Manual to Efficient Search

If you're currently using manual similarity search, the new hybrid approach will:

  1. Try efficient LangChain search first
  2. Fall back to manual search if needed
  3. Automatically detect which approach works

This ensures compatibility while optimizing for performance!
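
A rough sketch of this try-then-fall-back pattern (the function below is illustrative, not the actual API in `tools.database_tools`; the search callables are supplied by the caller):

```python
# Illustrative sketch of the hybrid search flow described above.
from typing import Callable, List


def hybrid_search(
    query: str,
    efficient_search: Callable[[str, int], List[dict]],
    manual_search: Callable[[str, int], List[dict]],
    top_k: int = 3,
) -> List[dict]:
    """Try the efficient LangChain-backed search first, then fall back to manual search."""
    try:
        return efficient_search(query, top_k)
    except Exception as err:
        # e.g. the match_documents_langchain function is missing or misconfigured
        print(f"Efficient search unavailable ({err}); falling back to manual search")
        return manual_search(query, top_k)
```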