---
title: SBERT + FAISS Semantic Search
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
---

# SBERT + FAISS Semantic Search + Evaluation Metrics

This Hugging Face Space hosts a **semantic search system** built with:

- [Sentence-BERT (SBERT)](https://www.sbert.net/) for embeddings  
- [FAISS](https://faiss.ai/) for fast vector search  
- [MS MARCO v1.1 dataset](https://microsoft.github.io/msmarco/) (10,000-passage subset)  
- [Gradio](https://gradio.app/) for the interactive interface  

---

## 🔹 Features
- Enter a **query** to retrieve the **Top-10 most similar passages**.  
- Computes **true IR metrics** when the query matches one in the MS MARCO validation set:
  - Precision@10  
  - Recall@10  
  - F1-score  
  - Mean Reciprocal Rank (MRR)  
  - Normalized Discounted Cumulative Gain (nDCG@10)  
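
The metrics above can be sketched for a single query as follows. This is a minimal illustration, not the code in `app.py`: the function name `ir_metrics`, the passage IDs, and the use of binary relevance labels are assumptions made for the example. For one query the function returns the reciprocal rank; MRR is the mean of that value over many queries.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k=10):
    """Binary-relevance metrics for one query over its top-k results.

    ranked_ids:   retrieved passage IDs, best match first (hypothetical IDs)
    relevant_ids: IDs labelled relevant for this query
    """
    top_k = ranked_ids[:k]
    relevant = set(relevant_ids)
    hits = [1 if pid in relevant else 0 for pid in top_k]
    num_hits = sum(hits)

    # Precision@k divides by k; Recall@k divides by the number of relevant passages.
    precision = num_hits / k
    recall = num_hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Reciprocal rank of the first relevant passage (0.0 if none is retrieved).
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    # nDCG@k: discounted gain divided by the gain of an ideal ordering.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": reciprocal_rank,
        "ndcg": ndcg,
    }


# Example: two relevant passages, retrieved at ranks 2 and 5.
ranked = ["p3", "p7", "p1", "p9", "p4", "p2", "p8", "p5", "p6", "p0"]
metrics = ir_metrics(ranked, relevant_ids={"p7", "p4"})
```

Dividing Precision@10 by `k` rather than by the number of results returned is one common convention; the Space may use a different one.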

---

## 🔹 How to Use
1. Type a query into the input box.  
2. Press **Submit**.  
3. View:  
   - **Top-10 retrieved passages** with similarity scores  
   - **Evaluation metrics** if the query exists in the validation set  

---

## 🔹 Tech Stack
- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`  
- **Indexing:** FAISS (L2 distance)  
- **Dataset:** MS MARCO v1.1 (first 10,000 passages)  
- **Interface:** Gradio  
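
To make the L2 ranking concrete, here is a brute-force sketch of what an exact L2 search computes. It deliberately avoids the FAISS API: `search_l2` and the toy 2-dimensional vectors are hypothetical stand-ins for the index and the SBERT embeddings. The point is that, under L2, a smaller score means a closer match. FAISS's `IndexFlatL2` reports squared L2 distances, though whether this Space uses that particular index type is an assumption.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_l2(query, vectors, k=10):
    """Return (distance, index) pairs for the k nearest vectors, closest first."""
    scored = ((squared_l2(query, vec), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy "embeddings": passage 1 is identical to the query, passage 2 is close to it.
passages = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
results = search_l2([1.0, 0.0], passages, k=2)
top_indices = [idx for _, idx in results]
```

A brute-force scan like this is linear in the number of passages, which is workable for a 10,000-passage subset; FAISS performs the same kind of search with optimized kernels.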

---

## 🔹 Citation
If you use this system in research, please cite:

- [Sentence-BERT](https://arxiv.org/abs/1908.10084)  
- [MS MARCO](https://microsoft.github.io/msmarco/)  

---

## 🔹 Author
Built for a research project on **user-centered evaluation of semantic search systems**.