---
tags:
- vector-database
- benchmarks
- faiss
- weaviate
- chroma
- multimodal
- clip
- retrieval
license: apache-2.0
---

# Vector Database Benchmarks: FAISS vs Chroma vs Weaviate

This repository contains experiments benchmarking popular vector databases on **multimodal embeddings** generated from the [Flickr8k dataset](https://huggingface.co/datasets/jxie/flickr8k).
We focused on four key evaluation dimensions:

1. **Latency per query**
2. **Recall@5 vs a Flat baseline (accuracy tradeoff)**
3. **Throughput (queries per second, QPS)**
4. **Ingestion scaling**

All experiments were run on **Google Colab** (T4 GPU for embedding generation, CPU backend for the databases).

---

## Methodology

- Dataset: 6k images and 30k captions from Flickr8k.
- Embeddings: CLIP (OpenAI ViT-B/32).
- Workload: caption-to-image retrieval (cross-modal).
- Baseline: a FAISS Flat (exact search) index provides the ground truth for recall calculations.

Each vector database was tested under the same conditions for ingestion, search, and recall.
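
For orientation, here is a minimal sketch of this setup, assuming the `openai/clip-vit-base-patch32` checkpoint; the dataset column names are assumptions here, and the repository's notebook remains the canonical version:

```python
# Sketch: CLIP embeddings plus an exact (Flat) FAISS baseline.
import faiss
import torch
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

ds = load_dataset("jxie/flickr8k", split="train")

@torch.no_grad()
def embed_images(images):
    """L2-normalized CLIP image embeddings, float32 for FAISS."""
    inputs = processor(images=images, return_tensors="pt").to(device)
    feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).cpu().numpy().astype("float32")

@torch.no_grad()
def embed_texts(captions):
    """L2-normalized CLIP text embeddings for caption queries."""
    inputs = processor(text=captions, padding=True, truncation=True,
                       return_tensors="pt").to(device)
    feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).cpu().numpy().astype("float32")

# With normalized vectors, inner product equals cosine similarity. This exact
# (brute-force) Flat index is the ground truth for all recall numbers.
image_vecs = embed_images([ds[i]["image"] for i in range(64)])  # demo slice; "image" column assumed
flat = faiss.IndexFlatIP(image_vecs.shape[1])
flat.add(image_vecs)

scores, ids = flat.search(embed_texts(["a dog running through a field"]), 5)
```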

---

## Results Summary

| Metric | FAISS | Chroma | Weaviate |
|----------------------------------|---------|---------|----------|
| **Avg latency per query** | 0.19 ms | 0.76 ms | 1.82 ms |
| **Recall@5 (vs Flat baseline)** | 1.00 | 0.002 | 0.918 |
| **Throughput (QPS)** | 1929.94 | 719.01 | 598.40 |
| **Ingestion time (20k vectors)** | 0.024 s | 2.806 s | 4.000 s |

![benchmark results](results.png)
![Ingestion Scaling](ingestion_scaling.png)

---

## Key Takeaways

- **FAISS** is the fastest across the board, leveraging in-memory array ingestion and customizable indexing strategies (see the sketch after this list).
- **Chroma** offers simplicity and easy integration, but struggles at scale due to batching and internal constraints.
- **Weaviate** provides a more feature-rich ecosystem (schemas, hybrid search, persistence), but at higher ingestion and query overhead.
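
On the "customizable indexing strategies" point: FAISS makes it easy to trade exactness for speed by swapping the Flat index for an approximate one such as IVF. A minimal sketch; the `nlist`/`nprobe` values are illustrative, not the settings used in these benchmarks:

```python
import faiss
import numpy as np

d = 512                                            # CLIP ViT-B/32 dimension
xb = np.random.rand(20_000, d).astype("float32")   # stand-in for real embeddings
faiss.normalize_L2(xb)

# Exact baseline: brute-force inner product over every vector.
flat = faiss.IndexFlatIP(d)
flat.add(xb)

# Approximate alternative: IVF clusters the corpus into nlist cells and
# searches only the nprobe nearest cells per query (faster, lower recall).
nlist = 128
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(xb)    # IVF needs a k-means training pass over the corpus
ivf.add(xb)
ivf.nprobe = 8   # raise for better recall, lower for more speed
```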

At the million-vector scale, speed alone will not decide your choice; **engineering tradeoffs, developer productivity, and system features** will.
Benchmarks tell one part of the story; your use case tells the rest.

---

## Usage

You can reproduce these experiments using the provided notebook and the Hugging Face dataset.
See the full code here: [rag-experiments/VectorDB-Benchmarks](https://huggingface.co/rag-experiments/VectorDB-Benchmarks).
Dataset used: Flickr8k, train split (6k images, 30k captions; multimodal images and text), embedded with CLIP. Dataset author: Johnathan Xie.
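
If you want to spot-check the latency and throughput numbers outside the notebook, a small timing harness is enough. A sketch under stated assumptions: `search_fn` is a hypothetical wrapper around whichever client you are measuring, and the warmup count is illustrative:

```python
import time

def benchmark(search_fn, query_vecs, k=5, warmup=10):
    """Return (avg latency in ms, QPS) for single-vector queries."""
    for q in query_vecs[:warmup]:        # warm caches before timing
        search_fn(q[None, :], k)
    start = time.perf_counter()
    for q in query_vecs:
        search_fn(q[None, :], k)
    elapsed = time.perf_counter() - start
    return elapsed / len(query_vecs) * 1e3, len(query_vecs) / elapsed

# Example against the FAISS Flat baseline from the Methodology sketch:
# latency_ms, qps = benchmark(lambda x, k: flat.search(x, k), query_vecs)
```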

---

## Citation

If you find this useful, please cite this repository: