matinsn2000 committed on
Commit 8ad42f5 · 1 Parent(s): 8779583

Updated search engine for generating album

AI_USAGE_REPORT.txt ADDED
@@ -0,0 +1,359 @@
================================================================================
AI USAGE REPORT
Cloudzy AI Challenge - Photo Album Management System
================================================================================

PROJECT OVERVIEW
================
This project implements an AI-enhanced photo management system that uses machine
learning models to generate embeddings and summaries for photo clusters. Users can
upload photos, search by similarity, and organize photos into meaningful albums
with AI-generated summaries.

================================================================================
1. WHERE AND HOW AI WAS USED
================================================================================

A. IMAGE EMBEDDING GENERATION
Location: cloudzy/ai_utils.py - ImageEmbeddingGenerator class
Purpose: Convert photo metadata (tags, description, caption) into 1024-dimensional
vector embeddings for similarity search

Model Used:
- Provider: Hugging Face Hub (InferenceClient)
- Model Name: intfloat/multilingual-e5-large
- Endpoint: feature_extraction

How It's Used:
1. User uploads a photo with metadata (tags, caption, description)
2. Metadata is combined into a single text string
3. Text is sent to the HF model via InferenceClient.feature_extraction()
4. Model returns a 1024-d embedding vector
5. Embedding is stored in the FAISS index for similarity search (a sketch follows below)

Integration Points:
- cloudzy/routes/upload.py: Called during photo upload
- cloudzy/search_engine.py: Used for vector similarity search
- Database: Embeddings stored as numpy arrays

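The flow above, condensed into a minimal sketch (helper and variable names here are
illustrative, not the repository's exact API; assumes HF_TOKEN is set):

    import os
    import numpy as np
    import faiss
    from huggingface_hub import InferenceClient

    client = InferenceClient(provider="hf-inference", api_key=os.environ["HF_TOKEN"])

    def embed_metadata(tags, description, caption):
        # Combine metadata into one string and request a 1024-d embedding
        text = " ".join(tags) + " " + description + " " + caption
        result = client.feature_extraction(text, model="intfloat/multilingual-e5-large")
        return np.array(result, dtype=np.float32).reshape(-1)

    # Index the vector under the photo's database id
    index = faiss.IndexIDMap(faiss.IndexFlatL2(1024))
    vec = embed_metadata(["nature", "sunset", "beach"],
                         "A beautiful sunset at the beach with waves",
                         "Sunset beach scene")
    index.add_with_ids(vec.reshape(1, -1), np.array([1], dtype=np.int64))
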
B. AI SUMMARY GENERATION
Location: cloudzy/ai_utils.py - TextSummarizer class
Purpose: Generate meaningful summaries of photo clusters based on actual photo metadata

Model Used:
- Provider: Hugging Face Hub (InferenceClient)
- Model Name: facebook/bart-large-cnn
- Endpoint: summarization

How It's Used:
1. User requests the /albums endpoint
2. System retrieves all photo clusters
3. For each cluster, collects all captions and tags from photos
4. Combined metadata is sent to the BART summarization model
5. Model generates a concise summary (e.g., "A collection of indoor photos featuring...")
6. Summary replaces the placeholder "Cluster of similar photos" in the response

Integration Points:
- cloudzy/routes/photo.py: get_albums() endpoint
- Response Schema: Pydantic AlbumItem model
- Fallback: If summarization fails, returns truncated text

================================================================================
2. PROMPTS AND MODEL INPUTS
================================================================================

A. IMAGE EMBEDDING INPUTS
Raw Input Format:
    tags: List[str] = ["nature", "sunset", "beach"]
    description: str = "A beautiful sunset at the beach with waves"
    caption: str = "Sunset beach scene"

Processing:
    combined_text = " ".join(tags) + " " + description + " " + caption
    Example: "nature sunset beach A beautiful sunset at the beach with waves Sunset beach scene"

Model Request (Hugging Face InferenceClient):
    client.feature_extraction(
        text=combined_text,
        model="intfloat/multilingual-e5-large"
    )

Expected Output:
- Type: List of floats (1024 dimensions)
- Converted to: numpy.ndarray of shape (1024,)
- Data type: float32
- Usage: Stored in FAISS index for vector similarity search

B. SUMMARIZATION INPUTS
Raw Input Format:
For each album cluster, combine all photo metadata:
    texts = []
    for photo in cluster_photos:
        texts.append(photo.caption)
        texts.extend(photo.tags)
    combined_input = " ".join(texts)

Example Input:
    "Beach sunset waves ocean Sunset at the ocean view Nature landscape
    Seascape beautiful A sunset scene with ocean waves A scenic beach view"

Model Request (Hugging Face InferenceClient):
    client.summarization(
        text=combined_input,
        model="facebook/bart-large-cnn"
    )

Expected Output:
- Type: List containing a dictionary with a 'summary_text' key
- Example: "A collection of beach and sunset photographs featuring scenic ocean views"
- Processing: Extract summary_text from the returned object
- Type Conversion: Ensure string type for Pydantic validation

================================================================================
3. HOW MODEL OUTPUTS WERE REFINED
================================================================================

A. EMBEDDING OUTPUT REFINEMENT
Issue Encountered:
- Expected shape: (512,) per documentation
- Actual shape: (1024,) from the model
- Initially, the validation checked for 1024 while the accompanying comment said 512

Resolution:
- Updated validation to expect 1024 dimensions (correct model behavior)
- Final check: if embedding.shape[0] != 1024: raise ValueError
- Added type casting: np.array(result, dtype=np.float32).reshape(-1)
- .reshape(-1) flattens the result to a 1-D array

Code Refinement (ai_utils.py, lines 50-62):
    def _embed_text(self, text: str) -> np.ndarray:
        result = self.client.feature_extraction(text, model=self.model_name)
        embedding = np.array(result, dtype=np.float32).reshape(-1)
        if embedding.shape[0] != 1024:
            raise ValueError(f"Expected embedding of size 1024, got {embedding.shape[0]}")
        return embedding

B. SUMMARIZATION OUTPUT REFINEMENT
Issue Encountered:
- Pydantic validation error: "Input should be a valid string"
- Received: SummarizationOutput object instead of string
- Root Cause: client.summarization() returns a structured object, not a string

Resolution:
- Added type-safe extraction logic
- Implemented multiple fallback formats:
  1. If list: Extract the first element's 'summary_text' field
  2. If dict: Get the 'summary_text' field directly
  3. Fallback: Convert to string

Code Refinement (ai_utils.py, lines 90-100):
    result = self.client.summarization(text, model=self.model_name)

    # Extract the summary text from the result object
    if isinstance(result, list) and len(result) > 0:
        return result[0].get("summary_text", str(result[0]))
    elif isinstance(result, dict):
        return result.get("summary_text", str(result))
    else:
        return str(result)

C. ERROR HANDLING AND DEFAULTS
Embedding Generation:
- Validation ensures an exact dimension match
- Raises a clear error on dimension mismatch
- Prevents downstream vector search issues

Summarization:
- Try-except block with graceful fallback
- Fallback: Returns truncated input (first 80 chars)
- Empty text handling: Returns default "Album of photos"
- Ensures robustness when the HF API is unavailable

================================================================================
4. MANUAL VS AI-GENERATED PARTS
================================================================================

MANUAL PARTS (100% Developer-Written)
=====================================
✓ Database schema and models
  - cloudzy/models.py: SQLAlchemy Photo model
  - cloudzy/database.py: Database connection and session management

✓ API Route Handlers
  - cloudzy/routes/photo.py: All endpoint logic
  - cloudzy/routes/upload.py: File upload handling
  - cloudzy/routes/search.py: Search endpoint implementation

✓ File Management
  - cloudzy/utils/file_upload_service.py: Upload service
  - cloudzy/utils/file_utils.py: File utilities

✓ Data Serialization
  - cloudzy/schemas.py: Pydantic models and validation

✓ Search Engine Implementation
  - cloudzy/search_engine.py: FAISS vector search logic
  - Distance calculation and result ranking

✓ Application Configuration
  - app.py: FastAPI app setup
  - Dockerfile: Containerization
  - requirements.txt: Dependencies

HYBRID PARTS (Manual Integration + AI Models)
=============================================
✓ ImageEmbeddingGenerator Class
  - Manual: Class structure, API client initialization
  - Manual: Error handling and validation logic
  - Manual: Type conversion and reshaping
  - AI: Feature extraction from HF model
  - Result: Text → 1024-d vector embeddings

✓ TextSummarizer Class
  - Manual: Class structure, API client initialization
  - Manual: Output parsing and extraction logic
  - Manual: Error handling and fallbacks
  - Manual: Empty text handling
  - AI: Summary generation from combined text
  - Result: Multi-sentence text → concise summary

✓ Album Summary Integration (photo.py)
  - Manual: Cluster iteration and photo data collection
  - Manual: Text concatenation logic
  - Manual: Response structure and schema mapping
  - AI: Summary generation
  - Result: Photo cluster → meaningful album summary

AI-GENERATED PARTS
==================
✓ Embedding vectors
  - Generated by: intfloat/multilingual-e5-large
  - Content: Semantic representation of photo metadata
  - Used for: Similarity search and clustering

✓ Album summaries
  - Generated by: facebook/bart-large-cnn
  - Content: Concise description of photo cluster themes
  - Used for: Album display and description

✓ Model-specific responses
  - Output format: Determined by HF models
  - Processing: Handled by manual extraction code

================================================================================
5. DEVELOPMENT PROCESS AND DECISIONS
================================================================================

DECISION 1: Model Selection
Manual Decision: Why facebook/bart-large-cnn?
- Reasons:
  * Pre-trained on the CNN/DailyMail summarization corpus
  * Optimized for multi-sentence summarization
  * Fast inference through the Hugging Face API
  * Produces concise, extractive summaries

Alternative considered: facebook/bart-base (smaller and faster, but lower quality)

DECISION 2: Embedding Dimension Resolution
Manual Decision: Accept 1024-d embeddings (not 512-d)
- Reason:
  * intfloat/multilingual-e5-large actually produces 1024 dimensions
  * Better semantic representation than 512-d
  * FAISS index configured for 1024-d vectors
  * Updated validation to reflect actual model output

DECISION 3: Error Handling Strategy
Manual Decision: Graceful degradation with fallbacks
- Implementation:
  * Try summarization first
  * If it fails, return truncated text
  * If the text is empty, return a default message
  * Ensures the endpoint never fails due to AI API issues

DECISION 4: Output Extraction
Manual Decision: Flexible type handling for model output
- Implementation:
  * Handle both list and dict return formats
  * Extract the 'summary_text' field when available
  * Fall back to string conversion
  * Ensures compatibility with different API versions

================================================================================
6. TESTING AND VALIDATION
================================================================================

Validation Points:
✓ Embedding shape validation (must be 1024-d)
✓ Type conversion to float32
✓ Summary extraction and string conversion
✓ Pydantic schema validation (AlbumItem requires a string album_summary)
✓ Error handling and fallbacks

Testing Done:
✓ Manual endpoint testing with sample photos
✓ Verified embedding shape and type
✓ Tested summarization with various input lengths
✓ Validated API error handling
✓ Checked Pydantic schema compliance (a test sketch follows below)

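A hypothetical pytest sketch for two of the checks above (not in the repository; it
assumes the required HF token environment variables are set so the constructors
succeed, and mocks the client so no network call is made):

    import pytest

    from cloudzy.ai_utils import ImageEmbeddingGenerator, TextSummarizer

    def test_embed_text_rejects_wrong_dimension(monkeypatch):
        gen = ImageEmbeddingGenerator()
        # Pretend the API returned a 512-d vector instead of the expected 1024-d
        monkeypatch.setattr(gen.client, "feature_extraction",
                            lambda text, model: [0.0] * 512)
        with pytest.raises(ValueError):
            gen._embed_text("beach sunset")

    def test_summarize_empty_text_returns_default():
        assert TextSummarizer().summarize("") == "Album of photos"
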
================================================================================
7. ENVIRONMENT CONFIGURATION
================================================================================

Required Environment Variables:
- HF_TOKEN: Hugging Face API token (for authentication)
  Location: Set in the .env file
  Usage: InferenceClient initialization (see the sketch below)
  Scope: Both ImageEmbeddingGenerator and TextSummarizer

API Access:
- Provider: Hugging Face Inference API
- Authentication: Token-based via HF_TOKEN
- Rate Limiting: Subject to HF plan limits
- Fallback: When unavailable, gracefully returns truncated text

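A minimal sketch of this configuration (assumes the python-dotenv package and a .env
file at the project root; the variable name follows this report, while the committed
TextSummarizer reads HF_TOKEN_1):

    import os
    from dotenv import load_dotenv
    from huggingface_hub import InferenceClient

    load_dotenv()  # populate os.environ from the .env file
    client = InferenceClient(provider="hf-inference", api_key=os.environ["HF_TOKEN"])
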
================================================================================
8. PERFORMANCE CONSIDERATIONS
================================================================================

Current Implementation:
- Summarization called per album cluster (on-demand)
- Embedding generation per photo upload
- FAISS vector search (fast, local)

Potential Optimizations:
✓ Cache summaries in the database (reduce API calls)
✓ Batch embedding generation for multiple uploads
✓ Implement summary caching with TTL (see the sketch after this list)
✓ Consider async processing for large clusters

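One possible shape for the summary-caching idea above (illustrative only; nothing
like this exists in the repository yet):

    import time

    _SUMMARY_CACHE = {}          # {tuple(photo_ids): (summary, created_at)}
    SUMMARY_TTL_SECONDS = 3600   # assumed TTL

    def cached_summary(photo_ids, make_summary):
        key = tuple(sorted(photo_ids))
        hit = _SUMMARY_CACHE.get(key)
        if hit and time.time() - hit[1] < SUMMARY_TTL_SECONDS:
            return hit[0]
        summary = make_summary()  # e.g. lambda: summarizer.summarize(combined_description)
        _SUMMARY_CACHE[key] = (summary, time.time())
        return summary
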
Current Trade-offs:
- Speed vs Freshness: Summaries generated on-demand (fresh, slower)
- Accuracy vs Cost: Full text summarization vs cached summaries

================================================================================
SUMMARY
================================================================================

This project demonstrates responsible AI integration:

1. Clear Separation: Manual development (infrastructure, logic) vs AI (models)
2. Error Handling: Graceful degradation when AI services unavailable
3. Transparency: Documented model choices and output processing
4. Flexibility: Handle various model output formats
5. Validation: Schema validation ensures data integrity
6. Integration: AI models complement, not replace, core functionality

AI Value Added:
- Semantic search capabilities (embeddings)
- Automated summary generation (reduces manual effort)
- Better user experience (meaningful album descriptions)

Human Involvement:
- System design and architecture
- Error handling and edge cases
- API integration and data processing
- Schema definition and validation
- Deployment and configuration

================================================================================

cloudzy/agents/image_analyzer.py CHANGED
@@ -42,7 +42,7 @@ Describe this image in the following exact format:
 
     result: {
         "tags": [list of tags related to the image],
-        "description": "a 10-line descriptive description for the image",
+        "description": "a 5-line descriptive description for the image",
         "caption": "a short description for the image"
    }
    """
cloudzy/ai_utils.py CHANGED
@@ -61,6 +61,47 @@ class ImageEmbeddingGenerator:
             raise ValueError(f"Expected embedding of size 1024, got {embedding.shape[0]}")
         return embedding
 
+
+class TextSummarizer:
+    def __init__(self, model_name: str = "facebook/bart-large-cnn"):
+        """
+        Initialize the text summarizer with a Hugging Face model.
+        """
+        self.client = InferenceClient(
+            provider="hf-inference",
+            api_key=os.environ["HF_TOKEN_1"],
+        )
+        self.model_name = model_name
+
+    def summarize(self, text: str) -> str:
+        """
+        Generate a summary of the given text.
+
+        Args:
+            text: Text to summarize
+
+        Returns:
+            summary: Generated summary string
+        """
+        if not text or text.strip() == "":
+            return "Album of photos"
+
+        try:
+            result = self.client.summarization(
+                text,
+                model=self.model_name,
+            )
+            # Extract the summary text from the result object
+            if isinstance(result, list) and len(result) > 0:
+                return result[0].get("summary_text", str(result[0]))
+            elif isinstance(result, dict):
+                return result.get("summary_text", str(result))
+            else:
+                return str(result)
+        except Exception as e:
+            # Fallback if summarization fails
+            return f"Collection: {text[:80]}..."
+
 # Example usage:
 if __name__ == "__main__":
     generator = ImageEmbeddingGenerator()
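
A quick way to exercise the new class locally (assumes HF_TOKEN_1 is set; on an API
failure it falls back to the truncated "Collection: ..." string):

    from cloudzy.ai_utils import TextSummarizer

    summarizer = TextSummarizer()
    print(summarizer.summarize(
        "Beach sunset waves ocean. Sunset at the ocean view. A scenic beach view."
    ))
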
cloudzy/database.py CHANGED
@@ -14,6 +14,10 @@ engine = create_engine(
     connect_args=connect_args,
 )
 
+# Session factory for manual session creation
+def SessionLocal():
+    return Session(engine)
+
 
 def create_db_and_tables():
     """Create all database tables"""
cloudzy/routes/photo.py CHANGED
@@ -1,10 +1,14 @@
 """Photo retrieval endpoints"""
-from fastapi import APIRouter, Depends, HTTPException
+from fastapi import APIRouter, Depends, HTTPException, Query
 from sqlmodel import Session, select
+import numpy as np
 
 from cloudzy.database import get_session
 from cloudzy.models import Photo
-from cloudzy.schemas import PhotoDetailResponse
+from cloudzy.schemas import PhotoDetailResponse, AlbumsResponse, PhotoItem, AlbumItem
+from cloudzy.search_engine import SearchEngine
+from cloudzy.ai_utils import TextSummarizer
+import os
 
 router = APIRouter(tags=["photos"])
 
@@ -25,9 +29,12 @@ async def get_photo(
     if not photo:
         raise HTTPException(status_code=404, detail=f"Photo {photo_id} not found")
 
+    APP_DOMAIN = os.getenv("APP_DOMAIN")
+
     return PhotoDetailResponse(
         id=photo.id,
         filename=photo.filename,
+        image_url=f"{APP_DOMAIN}uploads/{photo.filename}",
         tags=photo.get_tags(),
         caption=photo.caption,
         embedding=photo.get_embedding(),
@@ -55,15 +62,88 @@ async def list_photos(
 
     statement = select(Photo).offset(skip).limit(limit)
     photos = session.exec(statement).all()
+
+    APP_DOMAIN = os.getenv("APP_DOMAIN")
 
     return [
         PhotoDetailResponse(
             id=photo.id,
             filename=photo.filename,
+            image_url=f"{APP_DOMAIN}uploads/{photo.filename}",
             tags=photo.get_tags(),
             caption=photo.caption,
             embedding=photo.get_embedding(),
             created_at=photo.created_at,
         )
         for photo in photos
     ]
+
+
+@router.get("/albums", response_model=AlbumsResponse)
+async def get_albums(
+    top_k: int = Query(5, ge=2, le=50),
+    session: Session = Depends(get_session),
+):
+    """
+    Create albums of semantically similar photos.
+    """
+
+    search_engine = SearchEngine()
+    albums_ids = search_engine.create_albums(top_k=top_k)
+    APP_DOMAIN = os.getenv("APP_DOMAIN") or "http://127.0.0.1:8000/"
+    summarizer = TextSummarizer()
+
+    albums_response = []
+
+    for album_ids in albums_ids:
+        # Query all photos in this album in one go
+        statement = select(Photo).where(Photo.id.in_(album_ids))
+        photos = session.exec(statement).all()
+
+        # Build a dict for fast lookup
+        photo_lookup = {photo.id: photo for photo in photos}
+
+        album_photos = []
+        album_descriptions = []  # Collect captions and tags for summary
+
+        for pid in album_ids:
+            photo = photo_lookup.get(pid)
+            if not photo:
+                continue
+
+            # Find distance from FAISS search
+            embedding = photo.get_embedding()
+            if not embedding:
+                continue
+
+            query_embedding = np.array(embedding).astype(np.float32).reshape(1, -1)
+            distances, ids = search_engine.index.search(query_embedding, top_k)
+            distance_val = next((d for i, d in zip(ids[0], distances[0]) if i == pid), 0.0)
+
+            album_photos.append(
+                PhotoItem(
+                    photo_id=photo.id,
+                    filename=photo.filename,
+                    image_url=f"{APP_DOMAIN}uploads/{photo.filename}",
+                    tags=photo.get_tags(),
+                    caption=photo.caption,
+                    distance=float(distance_val),
+                )
+            )
+
+            # Collect descriptions for album summary
+            if photo.caption:
+                album_descriptions.append(photo.caption)
+            tags = photo.get_tags()
+            if tags:
+                album_descriptions.append(" ".join(tags))
+
+        # Generate album summary from compiled descriptions
+        combined_description = " ".join(album_descriptions)
+        album_summary = summarizer.summarize(combined_description)
+
+        albums_response.append(
+            AlbumItem(album_summary=album_summary, album=album_photos)
+        )
+
+    return albums_response
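
A hypothetical client call against the new endpoint (the base URL is an assumption,
and the router is assumed to be mounted without a prefix):

    import requests

    resp = requests.get("http://127.0.0.1:8000/albums", params={"top_k": 5})
    resp.raise_for_status()
    for album in resp.json():
        print(album["album_summary"], "-", len(album["album"]), "photos")
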
cloudzy/routes/upload.py CHANGED
@@ -62,6 +62,55 @@ def validate_image_file(filename: str) -> bool:
     """Check if file has valid image extension"""
     return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
 
+def process_image_in_background(photo_id: int, filepath: str, image_url: str):
+    """
+    Background task to:
+    - Describe the image
+    - Generate embedding
+    - Update database record
+    - Index embedding in FAISS
+    """
+    from cloudzy.database import SessionLocal
+    from sqlmodel import select
+
+    try:
+        describer = ImageDescriber()
+        print(f"[Background] Processing image {photo_id}...")
+        result = describer.describe_image(image_url)
+
+        tags = result.get("tags", [])
+        caption = result.get("caption", "")
+        description = result.get("description", "")
+
+        generator = ImageEmbeddingGenerator()
+        embedding = generator.generate_embedding(tags, description, caption)
+
+        # Use a fresh session for background task
+        session = SessionLocal()
+        try:
+            photo = session.exec(select(Photo).where(Photo.id == photo_id)).first()
+            if photo:
+                photo.caption = caption
+                photo.set_tags(tags)
+                photo.set_embedding(embedding.tolist())
+                session.add(photo)
+                session.commit()
+                print(f"[Background] Photo {photo_id} updated with embedding")
+            else:
+                print(f"[Background] Photo {photo_id} not found in database")
+        finally:
+            session.close()
+
+        # Index in FAISS
+        search_engine = SearchEngine()
+        search_engine.add_embedding(photo_id, embedding)
+        print(f"[Background] Photo {photo_id} indexed in FAISS")
+
+    except Exception as e:
+        print(f"[Background Task] Error processing image {photo_id}: {e}")
+        import traceback
+        traceback.print_exc()
+
 
 @router.post("/upload", response_model=UploadResponse)
 async def upload_photo(
@@ -69,18 +118,7 @@ async def upload_photo(
     session: Session = Depends(get_session),
     background_tasks: BackgroundTasks = None,
 ):
-    """
-    Upload a photo and analyze it with AI.
-
-    - Validates file type
-    - Saves file to disk
-    - Generates tags, caption, and embedding
-    - Stores metadata in database
-    - Indexes embedding in FAISS
-
-    Returns: Photo metadata with ID
-    """
-    # Validate file
+    # --- Validate and save file ---
     if not file.filename:
         raise HTTPException(status_code=400, detail="No filename provided")
 
@@ -90,79 +128,44 @@ async def upload_photo(
             detail=f"Invalid file type. Allowed: {', '.join(ALLOWED_EXTENSIONS)}"
         )
 
-    # Read file content
     content = await file.read()
     if not content:
         raise HTTPException(status_code=400, detail="Empty file")
 
-    # Save file to disk
     saved_filename = save_uploaded_file(content, file.filename)
     filepath = f"uploads/{saved_filename}"
 
     try:
         uploader = ImgBBUploader(expiration=600)
         image_url = uploader.upload(filepath)
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Image upload failed: {str(e)}")
-
-    try:
-
-        describer = ImageDescriber()
-        # result = describer.describe_image("https://userx2000-cloudzy-ai-challenge.hf.space/uploads/img_1_20251024_064435_667.jpg")
-        # result = describer.describe_image("https://userx2000-cloudzy-ai-challenge.hf.space/uploads/img_2_20251024_082115_102.jpeg")
-        result = describer.describe_image(image_url)
-
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Error processing image: {str(e)}")
 
     APP_DOMAIN = os.getenv("APP_DOMAIN")
+    image_local_url = f"{APP_DOMAIN}uploads/{saved_filename}"
 
-    image_url = f"{APP_DOMAIN}uploads/{saved_filename}"
-
-    # Generate AI analysis
-    tags = result.get("tags", [])
-    caption = result.get("caption", "")
-    description = result.get("description", "")
-
-    generator = ImageEmbeddingGenerator()
-    embedding = generator.generate_embedding(tags, description, caption)
-
-    # np.save("embedding_2.npy", embedding)
-    # embedding = np.load("embedding_2.npy")
-
-    # Create photo record
+    # --- Save photo immediately with empty caption/tags ---
     photo = Photo(
         filename=saved_filename,
         filepath=filepath,
-        caption=caption,
+        caption="",  # empty for now
     )
-    photo.set_tags(tags)
-    # photo.set_embedding(embedding.tolist())
-
-    # Save to database
     session.add(photo)
     session.commit()
     session.refresh(photo)
-
-    # Index in FAISS (in background if needed)
-    search_engine = SearchEngine()
-    search_engine.add_embedding(photo.id, embedding)
-
+
+    # --- Schedule background task ---
+    if background_tasks:
+        background_tasks.add_task(
+            process_image_in_background,
+            photo_id=photo.id,
+            filepath=filepath,
+            image_url=image_url
+        )
+
     return UploadResponse(
         id=photo.id,
         filename=saved_filename,
-        image_url= image_url,
-        tags=tags,
-        caption=caption,
-        message=f"Photo uploaded successfully with ID {photo.id}"
+        image_url=image_local_url,
+        message=f"Photo uploaded successfully with ID {photo.id}. AI processing is running in the background."
     )
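
A hypothetical upload call (base URL, file path, and the multipart field name "file"
are assumptions). The endpoint now returns immediately; tags, caption, and the
embedding are filled in later by the background task:

    import requests

    with open("sunset.jpg", "rb") as f:
        resp = requests.post(
            "http://127.0.0.1:8000/upload",
            files={"file": ("sunset.jpg", f, "image/jpeg")},
        )
    resp.raise_for_status()
    print(resp.json())  # {"id": ..., "filename": ..., "image_url": ..., "message": ...}
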
cloudzy/schemas.py CHANGED
@@ -8,6 +8,7 @@ class PhotoResponse(BaseModel):
     """Response model for photo metadata"""
     id: int
     filename: str
+    image_url: str
     tags: List[str]
     caption: str
     created_at: datetime
@@ -21,6 +22,7 @@ class PhotoDetailResponse(PhotoResponse):
     embedding: Optional[List[float]] = None
 
 
+
 class SearchResult(BaseModel):
     """Search result with similarity score"""
     photo_id: int
@@ -46,6 +48,21 @@ class UploadResponse(BaseModel):
     id: int
     filename: str
     image_url: str
-    tags: List[str]
-    caption: str
-    message: str
+    # tags: List[str]
+    # caption: str
+    message: str
+
+
+class PhotoItem(BaseModel):
+    photo_id: int
+    filename: str
+    image_url: str
+    tags: List[str]
+    caption: str
+    distance: float
+
+class AlbumItem(BaseModel):
+    album_summary: str
+    album: List[PhotoItem]
+
+AlbumsResponse = List[AlbumItem]
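
For reference, a minimal sketch of how the new models nest (values are made up;
.model_dump() assumes Pydantic v2, use .dict() on v1):

    from cloudzy.schemas import AlbumItem, PhotoItem

    album = AlbumItem(
        album_summary="A collection of beach and sunset photographs",
        album=[
            PhotoItem(
                photo_id=1,
                filename="img_1.jpg",
                image_url="http://127.0.0.1:8000/uploads/img_1.jpg",
                tags=["beach", "sunset"],
                caption="Sunset at the beach",
                distance=0.12,
            )
        ],
    )
    print(album.model_dump())
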
cloudzy/search_engine.py CHANGED
@@ -19,6 +19,72 @@ class SearchEngine:
         base_index = faiss.IndexFlatL2(dim)
         self.index = faiss.IndexIDMap(base_index)
 
+    def create_albums(self, top_k: int = 5, distance_threshold: float = 0.3) -> List[List[int]]:
+        """
+        Group similar images into albums (clusters).
+
+        For each unvisited photo, finds its top_k most similar photos and creates an album.
+        Photos are marked as visited to avoid duplicate albums.
+        Only includes photos within the distance threshold.
+
+        Args:
+            top_k: Number of similar images to find for each album
+            distance_threshold: Maximum distance to consider photos as similar (default 0.3)
+
+        Returns:
+            List of albums, each album is a list of photo_ids
+        """
+        from cloudzy.database import SessionLocal
+        from cloudzy.models import Photo
+        from sqlmodel import select
+
+        self.load()
+        if self.index.ntotal == 0:
+            return []
+
+        # Get all photo IDs from FAISS index
+        id_map = self.index.id_map
+        all_ids = [id_map.at(i) for i in range(id_map.size())]
+
+        visited = set()
+        albums = []
+
+        for photo_id in all_ids:
+            # Skip if already in an album
+            if photo_id in visited:
+                continue
+
+            # Get embedding from database
+            session = SessionLocal()
+            try:
+                photo = session.exec(select(Photo).where(Photo.id == photo_id)).first()
+                if not photo:
+                    continue
+
+                embedding = photo.get_embedding()
+                if not embedding:
+                    continue
+
+                # Search for similar images
+                query_embedding = np.array(embedding).reshape(1, -1).astype(np.float32)
+                distances, ids = self.index.search(query_embedding, top_k)
+
+                # Build album: collect similar photos that haven't been visited and are within threshold
+                album = []
+                for pid, distance in zip(ids[0], distances[0]):
+                    if pid != -1 and pid not in visited and distance <= distance_threshold:
+                        album.append(int(pid))
+                        visited.add(pid)
+
+                # Add album if it has at least 1 photo
+                if album:
+                    albums.append(album)
+
+            finally:
+                session.close()
+
+        return albums
+
     def add_embedding(self, photo_id: int, embedding: np.ndarray) -> None:
         """
         Add an embedding to the index.
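
A small usage sketch of the new method (the return value shown in the comment is
illustrative, not real data):

    from cloudzy.search_engine import SearchEngine

    engine = SearchEngine()
    albums = engine.create_albums(top_k=5, distance_threshold=0.3)
    # e.g. [[3, 7, 12], [1, 4], [9]] -- each inner list is one album of photo ids
    for i, album in enumerate(albums, start=1):
        print(f"Album {i}: photos {album}")
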
cloudzy/utils/file_upload_service.py CHANGED
@@ -51,7 +51,6 @@ class ImgBBUploader:
         )
         resp.raise_for_status()
         data = resp.json()
-        print(data)
         if data.get("success"):
             return data["data"]["url"]
         raise RuntimeError(f"Upload failed: {data}")