---
tags:
- vector-database
- benchmarks
- faiss
- weaviate
- chroma
- multimodal
- clip
- retrieval
license: apache-2.0
---
# Vector Database Benchmarks: FAISS vs Chroma vs Weaviate
This repository contains experiments benchmarking popular vector databases on **multimodal embeddings** generated from the [Flickr8k dataset](https://huggingface.co/datasets/jxie/flickr8k).
We focused on four key evaluation dimensions:
1. **Latency per query**
2. **Recall@5 relative to an exact FAISS Flat index (accuracy tradeoffs)**
3. **Queries per second (QPS throughput)**
4. **Ingestion scaling performance**
All experiments were run on **Google Colab** (T4 GPU for embedding generation, CPU backend for databases).
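
To make the first and third dimensions concrete, here is a minimal sketch of how average per-query latency and QPS could be measured. It is not the notebook's code: `measure_search` and `brute_force_search` are hypothetical names, the corpus is random data, and a NumPy exact search stands in for a real vector database. The dimension of 512 matches the output size of CLIP ViT-B/32.

```python
import time
import numpy as np

def measure_search(search_fn, queries, k=5):
    """Run queries one at a time; return (average latency in ms, queries per second)."""
    per_query = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q, k)
        per_query.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    avg_latency_ms = 1000.0 * sum(per_query) / len(per_query)
    qps = len(queries) / total
    return avg_latency_ms, qps

# Assumption: random unit vectors stand in for CLIP image embeddings.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((2000, 512), dtype=np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
queries = rng.standard_normal((100, 512), dtype=np.float32)

def brute_force_search(q, k):
    """Exact inner-product search; a stand-in for a database's query call."""
    scores = corpus @ q
    return np.argsort(-scores)[:k]

latency_ms, qps = measure_search(brute_force_search, queries)
print(f"avg latency: {latency_ms:.3f} ms, throughput: {qps:.1f} QPS")
```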
---
## Methodology
- Dataset: 6k images and 30k captions from Flickr8k.
- Embeddings: CLIP (OpenAI ViT-B/32).
- Workload: Caption-to-image retrieval (cross-modal).
- Baseline: a FAISS Flat (exhaustive, exact) index, used as the ground truth for recall calculations.
Each vector database was tested under the same conditions for ingestion, search, and recall.
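
The recall calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation: `flat_top_k` and `recall_at_k` are hypothetical helpers, the embeddings are random, and the exact search is done in NumPy instead of FAISS. Recall@k here is the mean fraction of the exact top-k neighbours that a candidate result list recovers.

```python
import numpy as np

def flat_top_k(corpus, queries, k=5):
    """Exact top-k by inner product, i.e. what an exhaustive Flat search returns."""
    scores = queries @ corpus.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_k(candidate_ids, truth_ids):
    """Mean fraction of ground-truth neighbours present in each candidate list."""
    hits = [len(set(c) & set(t)) / len(t) for c, t in zip(candidate_ids, truth_ids)]
    return float(np.mean(hits))

# Assumption: random unit vectors stand in for CLIP image and caption embeddings.
rng = np.random.default_rng(0)
image_vecs = rng.standard_normal((1000, 512), dtype=np.float32)
image_vecs /= np.linalg.norm(image_vecs, axis=1, keepdims=True)
caption_vecs = rng.standard_normal((50, 512), dtype=np.float32)
caption_vecs /= np.linalg.norm(caption_vecs, axis=1, keepdims=True)

truth = flat_top_k(image_vecs, caption_vecs, k=5)

# Simulate an approximate index that misses one true neighbour per query.
approx = truth.copy()
approx[:, -1] = -1  # -1 is never a valid corpus id

print("baseline vs itself:", recall_at_k(truth, truth))
print("simulated approximate index:", recall_at_k(approx, truth))
```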
---
## Results Summary
| Metric | FAISS | Chroma | Weaviate |
|--------------------------|------------------|------------------|------------------|
| **Avg latency per query (ms)** | 0.19 | 0.76 | 1.82 |
| **Recall@5 (vs. FAISS Flat baseline)** | 1.00 | 0.002 | 0.918 |
| **Throughput (QPS)** | 1929.94 | 719.01 | 598.40 |
| **Ingestion time, 20k vectors (s)** | 0.024 | 2.806 | 4.000 |
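
For the ingestion row in the table above, a scaling measurement could look roughly like the sketch below. Everything here is an assumption for illustration: `InMemoryStore` is a hypothetical toy store, `time_ingestion` is a hypothetical harness, and the batch size of 1000 and the size steps are arbitrary choices, not values taken from the notebook.

```python
import time
import numpy as np

class InMemoryStore:
    """Hypothetical minimal store that keeps each ingested batch in a list."""
    def __init__(self):
        self._batches = []

    def add(self, vectors):
        self._batches.append(np.ascontiguousarray(vectors))

    def __len__(self):
        return sum(len(b) for b in self._batches)

def time_ingestion(make_store, vectors, sizes, batch_size=1000):
    """Ingest the first n vectors into a fresh store for each n; return {n: seconds}."""
    results = {}
    for n in sizes:
        store = make_store()
        t0 = time.perf_counter()
        for start in range(0, n, batch_size):
            store.add(vectors[start:min(start + batch_size, n)])
        results[n] = time.perf_counter() - t0
    return results

# Assumption: random vectors stand in for the CLIP embeddings.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((20000, 512), dtype=np.float32)

timings = time_ingestion(InMemoryStore, vectors, sizes=[1000, 5000, 10000, 20000])
for n, seconds in timings.items():
    print(f"{n:>6} vectors: {seconds:.4f} s")
```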

---
## Key Takeaways
- **FAISS** is the fastest on every metric measured here, leveraging in-memory array ingestion and customizable indexing strategies.
- **Chroma** offers simplicity and ease of integration but struggles at scale due to batching and internal constraints: ingesting 20k vectors took 2.806 s versus 0.024 s for FAISS.
- **Weaviate** provides a more feature-rich ecosystem (schema, hybrid search, persistence) but at higher ingestion and query overhead.
At the million-vector scale, speed alone will not decide your choice; **engineering tradeoffs, developer productivity, and system features** will.
Benchmarks tell one part of the story; your use case tells the rest.
---
## Usage
You can reproduce these experiments using the provided notebook and Hugging Face dataset.
See full code here: [rag-experiments/VectorDB-Benchmarks](https://huggingface.co/rag-experiments/VectorDB-Benchmarks).
Dataset used: Flickr8k (train split: 6k images and 30k captions; multimodal, images and text), embedded with CLIP. Dataset author: Johnathan Xie.
---
## Citation
If you find this useful, please cite this repository: