A newer version of the Streamlit SDK is available:
1.51.0
title: Zero-Shot Text Classifier
emoji: ๐ฏ
colorFrom: purple
colorTo: indigo
sdk: streamlit
app_file: app.py
pinned: false
๐ฏ Zero-Shot Text Classifier
An interactive web application that classifies input text into user-defined categories using Hugging Face's zero-shot classification models via the Inference API.
Live Demo: HF Zero-Shot Text Classifier
Repo: GitHub - hf-zero-shot-classifier
๐ Overview
This application provides a simple interface for users to:
- Input any piece of text.
- Provide a comma-separated list of candidate labels (categories).
- Receive a ranked list of these labels based on how well they apply to the input text, as determined by a zero-shot classification model.
The primary goal is to demonstrate the ability to leverage powerful, general-purpose NLP models for flexible text categorization without needing to train a model on specific categories beforehand.
๐ฏ Problem Solved
Traditional text classification often requires training a model on a predefined set of categories with labeled data. This can be time-consuming and inflexible if categories change or new ones emerge. Zero-shot classification addresses this by allowing classification against arbitrary labels at inference time. This tool showcases a practical application of this capability, offering on-the-fly text categorization.
โจ Skills Showcased
- AI/ML Implementation: Utilizing pre-trained NLP models for zero-shot classification.
- Python: Core programming language for backend logic and API interaction.
- ML Libraries (Conceptual): Understanding the role and use of Hugging Face Transformers (even if used via API).
- API Integration: Connecting to and consuming the Hugging Face Inference API.
- Data Handling: Sending text and label data to the API and parsing JSON responses.
- NLP (using APIs): Practical application of Natural Language Processing for dynamic text classification.
- Web Development (UI): Building an interactive user interface with Streamlit.
- Environment Management: Use of
.envfor API keys. - Version Control: Git and GitHub for project management.
- Deployment: Deploying the application to Hugging Face Spaces.
- Documentation: Creating clear and concise project documentation (this README).
๐ ๏ธ How It Works
- User Input:
- The user types or pastes the text they want to classify into a text area.
- The user provides a comma-separated list of candidate labels (e.g., "technology, sports, finance").
- API Call Preparation: When the "Classify Text" button is clicked:
- The Python backend (using the
requestslibrary) prepares a payload. This payload includes the input text and the list of candidate labels.
- The Python backend (using the
- Hugging Face API Interaction:
- A POST request is made to the Hugging Face Inference API endpoint for a selected zero-shot classification model (e.g.,
facebook/bart-large-mnliorvalhalla/distilbart-mnli-12-3). - The Hugging Face API token (loaded securely from environment variables) is included in the request headers for authentication.
- A POST request is made to the Hugging Face Inference API endpoint for a selected zero-shot classification model (e.g.,
- Zero-Shot Classification: The Hugging Face model processes the input text against the provided candidate labels, calculating a relevance score for each label. It does this without having been explicitly trained on these specific labels beforehand.
- Response Handling: The application receives the API's JSON response, which typically contains the input sequence, the list of labels, and a corresponding list of scores, often sorted by relevance.
- Display Output: The labels and their scores are extracted from the response and displayed to the user in the Streamlit interface, ranked by score. Error handling is implemented for API issues or unexpected responses.
๐ป Technologies Used
- Programming Language: Python 3.x
- AI Models/API:
- Hugging Face Hub
- Hugging Face Inference API (Free Tier)
- Zero-Shot Classification Models (e.g.,
facebook/bart-large-mnli,valhalla/distilbart-mnli-12-3)
- Python Libraries:
streamlit: For building the web application UI.requests: For making HTTP requests to the Hugging Face API.python-dotenv: For managing environment variables (like the API token) locally.
- Version Control: Git & GitHub
- Deployment: Hugging Face Spaces
- Development Environment: Visual Studio Code (or your preferred IDE), Python Virtual Environment (
venvorconda)
๐ Setup and Local Development
To run this project locally, follow these steps:
Clone the repository:
git clone https://github.com/[Your GitHub Username]/hf-zero-shot-classifier.git cd hf-zero-shot-classifierSet up a Python virtual environment: (Assuming you have a shared
venvorcondaenvironment in a parentai-portfoliodirectory as per the overall plan)# From within hf-zero-shot-classifier directory: # Example for venv: # For macOS/Linux: source ../venv/bin/activate # For Windows (Git Bash or PowerShell): # source ../venv/Scripts/activate # For Windows (Command Prompt): # ..\venv\Scripts\activate # Example for conda (if your shared env is named 'ai_env'): # conda activate ai_envIf you don't have the shared environment or prefer a dedicated one:
python -m venv venv # Or: conda create -n hf_zero_shot_env python=3.9 # Activate it: # macOS/Linux: source venv/bin/activate # Windows: venv\Scripts\activate # Conda: conda activate hf_zero_shot_envInstall dependencies:
pip install -r requirements.txtSet up your Hugging Face API Token:
- Ensure you have a
.envfile in the root of your mainai-portfolioproject directory (i.e., one level above thishf-zero-shot-classifierproject). - Add your Hugging Face API token to the
.envfile:HUGGING_FACE_API_TOKEN="your_hf_api_token_here" - Note: The
app.pyis configured to look for.envin the parent directory. If your.envfile is elsewhere, you might need to adjust theload_dotenv()path inapp.py.
- Ensure you have a
Run the Streamlit application:
streamlit run app.pyThe application should open in your web browser.
๐ผ๏ธ Screenshot
๐ฎ Future Enhancements (Optional)
- Multi-label classification option: Allow the model to predict multiple relevant labels if appropriate (some models support this via parameters).
- Adjustable threshold: Allow users to set a confidence threshold for displayed labels.
- Model selection: Allow users to choose from a few different zero-shot classification models.
- Example use cases: Provide example texts and label sets to demonstrate capabilities.
๐ Acknowledgements
- The Hugging Face team for their incredible models, Inference API, and Spaces platform.
- The developers of Streamlit for making web app creation in Python so accessible.