---
title: Tox21 SNN Classifier
emoji: 🌖
colorFrom: green
colorTo: pink
sdk: docker
pinned: false
license: cc-by-nc-4.0
short_description: Self-Normalizing Neural Network Baseline for Tox21
---
# Tox21 SNN Classifier
This repository hosts a Hugging Face Space that provides an example API for submitting models to the [Tox21 Leaderboard](https://huggingface.co/spaces/tschouis/tox21_leaderboard).

In this example, we train an XGBoost classifier on the Tox21 targets and save the trained model in the `assets/` folder.

**Important:** For leaderboard submission, your Space does not need to include training code. It only needs to implement inference in the `predict()` function inside `predict.py`. The `predict()` function must keep the provided skeleton: it takes a list of SMILES strings as input and returns a prediction dictionary with SMILES strings and targets as keys. Any preprocessing of SMILES strings must therefore be executed on-the-fly during inference.
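For orientation, here is a minimal sketch of that skeleton. `load_model` and `featurize` are hypothetical helpers (replace them with whatever your Space actually uses), and the target names are only placeholders:

```python
from typing import Dict, List

# Illustrative placeholder names for the 12 Tox21 targets.
TARGETS = [f"target{i}" for i in range(1, 13)]

def predict(smiles_list: List[str]) -> Dict[str, Dict[str, int]]:
    model = load_model("assets/model.pkl")  # hypothetical helper: load your trained model
    features = featurize(smiles_list)       # hypothetical helper: on-the-fly SMILES preprocessing
    preds = model.predict(features)         # expected shape: (n_molecules, 12)
    return {
        smiles: {target: int(p) for target, p in zip(TARGETS, row)}
        for smiles, row in zip(smiles_list, preds)
    }
```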
# Repository Structure
- `predict.py` - Defines the `predict()` function required by the leaderboard (entry point for inference).
- `app.py` - FastAPI application wrapper (can be used as-is; a sketch of such a wrapper follows this list).
- `src/` - Core model & preprocessing logic:
  - `data.py` - SMILES preprocessing pipeline
  - `model.py` - XGBoost classifier wrapper
  - `train.py` - Script to train the classifier
  - `utils.py` - Constants and helper functions
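
For illustration, a minimal sketch of such a FastAPI wrapper is shown below. The route name `/predict` and the request schema are assumptions for this sketch, not necessarily what the actual `app.py` exposes:

```python
from typing import Dict, List

from fastapi import FastAPI
from pydantic import BaseModel

from predict import predict

app = FastAPI()

class PredictRequest(BaseModel):
    smiles: List[str]

@app.post("/predict")
def run_prediction(request: PredictRequest) -> Dict[str, Dict[str, int]]:
    # Delegate to the leaderboard-compatible predict() function.
    return predict(request.smiles)
```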
# Quickstart with Spaces
You can easily adapt this project within your own Hugging Face account:
- Open this Space on Hugging Face.
- Click "Duplicate this Space" (top-right corner).
- Modify `src/` for your preprocessing pipeline and model class.
- Modify `predict()` inside `predict.py` to perform model inference while keeping the function skeleton unchanged to remain compatible with the leaderboard.

That's it: your model will be available as an API endpoint for the Tox21 Leaderboard.
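
Once the duplicated Space is running, it can be queried over HTTP. The snippet below is only a hedged example: the URL is a placeholder for your own Space, and the `/predict` route is the one assumed in the FastAPI sketch above; use whatever route your `app.py` actually exposes.

```python
import requests

# Placeholder URL: replace <user> and <space-name> with your own Space.
url = "https://<user>-<space-name>.hf.space/predict"

payload = {"smiles": ["CCO", "c1ccccc1"]}
response = requests.post(url, json=payload, timeout=60)
print(response.json())
```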
# Installation
To run (and train) the XGBoost model locally, clone the repository and install the dependencies:
```bash
git clone https://huggingface.co/spaces/tschouis/tox21_snn_classifier
cd tox21_snn_classifier
conda create -n tox21_snn_cls python=3.11
conda activate tox21_snn_cls
pip install -r requirements.txt
```
# Training
To train the XGBoost model from scratch:
```bash
python -m src.train
```
This will:
1. Load and preprocess the Tox21 training dataset.
2. Train an XGBoost classifier.
3. Save the trained model to the `assets/` folder.
4. Evaluate the trained XGBoost classifier on the validation split.
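
The sketch below illustrates this flow. `load_tox21_split` and `featurize` are hypothetical stand-ins for the data and preprocessing code in `src/`, and the hyperparameters are arbitrary; the actual `src/train.py` may be organized differently.

```python
import joblib
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical loaders/featurizers standing in for the code in src/.
train_smiles, y_train = load_tox21_split("train")        # y_*: (n_molecules, 12) label arrays
val_smiles, y_val = load_tox21_split("validation")
X_train, X_val = featurize(train_smiles), featurize(val_smiles)

# One binary classifier per target is a common choice for the 12 Tox21 tasks.
# (Real Tox21 labels contain missing values that would need masking here.)
models = []
for task in range(y_train.shape[1]):
    clf = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
    clf.fit(X_train, y_train[:, task])
    models.append(clf)

# Save the trained models to assets/ and report validation ROC-AUC per task.
joblib.dump(models, "assets/model.joblib")
for task, clf in enumerate(models):
    auc = roc_auc_score(y_val[:, task], clf.predict_proba(X_val)[:, 1])
    print(f"task {task}: ROC-AUC = {auc:.3f}")
```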
# Inference
For inference, you only need `predict.py`.

Example usage inside Python:
```python
from predict import predict

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
results = predict(smiles_list)
print(results)
```
The output will be a nested dictionary in the format:
```python
{
    "CCO": {"target1": 0, "target2": 1, ..., "target12": 0},
    "c1ccccc1": {"target1": 1, "target2": 0, ..., "target12": 1},
    "CC(=O)O": {"target1": 0, "target2": 0, ..., "target12": 0}
}
```
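
Because the result is a plain dictionary of dictionaries, it is straightforward to turn into a table, e.g. with pandas (reusing `results` from the example above):

```python
import pandas as pd

# Rows are SMILES strings, columns are the 12 targets.
df = pd.DataFrame.from_dict(results, orient="index")
print(df.head())
```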
# Notes
- For leaderboard submission, only `predict.py` needs to be adapted to your model's inference.
- Training (`src/train.py`) is provided for reproducibility.
- Preprocessing (here inside `src/data.py`) must be applied at inference time, not only during training.
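
As one illustration of on-the-fly preprocessing (the actual pipeline lives in `src/data.py` and may differ), Morgan fingerprints can be computed directly from the SMILES strings at inference time:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def featurize(smiles_list):
    """Compute 2048-bit Morgan fingerprints on the fly (illustrative, not the actual src/data.py)."""
    fps = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Invalid SMILES: fall back to an all-zero fingerprint.
            fps.append(np.zeros(2048, dtype=np.int8))
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        fps.append(np.array(fp, dtype=np.int8))
    return np.stack(fps)
```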