# Supabase Setup for Optimal GAIA Agent Performance

## Required Supabase Configuration

### 1. Create the `match_documents_langchain` Function

This SQL function enables efficient vector similarity search:

```sql
-- Create the similarity search function for LangChain integration
create or replace function match_documents_langchain (
  query_embedding vector(1536),  -- Adjust dimension based on embedding model
  match_threshold float default 0.75,
  match_count int default 3
)
returns table (
  id uuid,
  page_content text,
  embedding vector,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.page_content,
    documents.embedding,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
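
For reference, here is a minimal sketch of calling this function directly from Python through the Supabase RPC interface. This is not the project's actual code; it assumes the `supabase` and `openai` packages and a 1536-dimensional embedding model to match `vector(1536)`:

```python
# Sketch only: call match_documents_langchain via Supabase RPC.
# Assumes SUPABASE_URL / SUPABASE_SERVICE_KEY / OPENAI_API_KEY are set.
import os

from openai import OpenAI
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
openai_client = OpenAI()

# Embed the query text (text-embedding-ada-002 returns 1536-dim vectors)
query = "What is the capital of France?"
embedding = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=query,
).data[0].embedding

# Call the SQL function defined above
response = supabase.rpc(
    "match_documents_langchain",
    {
        "query_embedding": embedding,
        "match_threshold": 0.75,
        "match_count": 3,
    },
).execute()

for row in response.data:
    print(row["similarity"], row["page_content"][:80])
```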

### 2. Alternative for HuggingFace Embeddings (384 dimensions)

If using a 384-dimensional HuggingFace model such as `sentence-transformers/all-MiniLM-L6-v2` (note that `sentence-transformers/all-mpnet-base-v2` produces 768-dimensional embeddings, so `vector(384)` below would need to become `vector(768)` for that model):

```sql
-- For HuggingFace embeddings (384 dimensions)
create or replace function match_documents_langchain_hf (
  query_embedding vector(384),
  match_threshold float default 0.75,
  match_count int default 3
)
returns table (
  id uuid,
  page_content text,
  embedding vector,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.page_content,
    documents.embedding,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```
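
As a sketch, this is roughly how the function name could be plugged into LangChain's `SupabaseVectorStore`. It is illustrative only: the 384-dimensional model shown is an assumption, and depending on your LangChain version the embeddings class may live in `langchain_huggingface` or `langchain_community.embeddings`:

```python
# Sketch: point LangChain's Supabase vector store at the 384-dim function above.
import os

from langchain_community.vectorstores import SupabaseVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # or langchain_community.embeddings
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")  # 384-dim

vector_store = SupabaseVectorStore(
    client=supabase,
    embedding=embeddings,
    table_name="documents",
    query_name="match_documents_langchain_hf",  # the function defined above
)

# Similarity search calls the SQL function behind the scenes
docs = vector_store.similarity_search("What is the capital of France?", k=3)
```

If you use metadata filters, newer LangChain versions may also pass a `filter` argument to the RPC call, in which case the function signature would need an extra `filter jsonb` parameter.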

### 3. Update Database Table Structure

Ensure the `documents` table has the right structure:

```sql
-- Check/create the documents table structure
CREATE TABLE IF NOT EXISTS documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  page_content TEXT NOT NULL,
  embedding VECTOR(1536), -- Or 384 for HuggingFace
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP WITH TIME ZONE DEFAULT TIMEZONE('utc'::text, NOW())
);

-- Create index for fast similarity search
CREATE INDEX IF NOT EXISTS documents_embedding_idx 
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
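
A sketch of inserting a row into this table from Python with `supabase-py` (the content and metadata are placeholders; the embedding must match the column dimension):

```python
# Sketch: insert one document into the table defined above.
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

supabase.table("documents").insert({
    "page_content": "Paris is the capital of France.",
    "embedding": [0.0] * 1536,  # replace with a real embedding of matching dimension
    "metadata": {"source": "example"},
}).execute()
```

Note that IVFFlat builds its lists from existing rows, so it is generally best to create (or rebuild) the index after the table has been populated.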

### 4. Environment Variables

Update the `.env` file:

```bash
# Required for both approaches
SUPABASE_URL=supabase_project_url
SUPABASE_SERVICE_KEY=supabase_service_key
# Alternative key name (some setups use this)
SUPABASE_KEY=supabase_service_key

# Optional: For OpenAI fallback
OPENAI_API_KEY=openai_api_key
```
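
A minimal sketch of loading these variables in Python (assuming `python-dotenv`; the fallback to `SUPABASE_KEY` mirrors the alternative key name above):

```python
# Sketch: load the .env file and create a Supabase client from it.
import os

from dotenv import load_dotenv
from supabase import create_client

load_dotenv()  # reads .env from the working directory

url = os.environ["SUPABASE_URL"]
# Accept either key name, as noted above
key = os.getenv("SUPABASE_SERVICE_KEY") or os.getenv("SUPABASE_KEY")

supabase = create_client(url, key)
```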

## Performance Comparison

### HuggingFace Approach (Recommended)

- Free embedding model
- Often better semantic understanding
- 384-dimensional vectors (smaller storage)
- No API rate limits

### OpenAI Approach (Fallback)

- Very reliable and consistent
- 1536-dimensional vectors (more detailed)
- Costs money per embedding
- API rate limits

## Testing the Setup

1. Test that the function exists:

```sql
SELECT * FROM match_documents_langchain(
  '[0.1, 0.2, ...]'::vector,  -- Sample embedding
  0.7,  -- Threshold
  5     -- Count
);
```
2. Test with Python:

```python
from tools.database_tools import retriever

# Test efficient search
results = retriever.search_similar_questions_efficient(
    "What is the capital of France?",
    top_k=3
)
print(results)
```

## Migration from Manual to Efficient Search

If you're currently using manual similarity search, the new hybrid approach will:

  1. Try efficient LangChain search first
  2. Fall back to manual search if needed
  3. Automatically detect which approach works

This ensures compatibility while optimizing for performance!
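
A rough sketch of this try-then-fall-back pattern (the function below is illustrative, not the actual API in `tools.database_tools`; the search callables are supplied by the caller):

```python
# Illustrative sketch of the hybrid search flow described above.
from typing import Callable, List


def hybrid_search(
    query: str,
    efficient_search: Callable[[str, int], List[dict]],
    manual_search: Callable[[str, int], List[dict]],
    top_k: int = 3,
) -> List[dict]:
    """Try the efficient LangChain-backed search first, then fall back to manual search."""
    try:
        return efficient_search(query, top_k)
    except Exception as err:
        # e.g. the match_documents_langchain function is missing or misconfigured
        print(f"Efficient search unavailable ({err}); falling back to manual search")
        return manual_search(query, top_k)
```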