---
title: Tox21 SNN Classifier
emoji: 🌖
colorFrom: green
colorTo: pink
sdk: docker
pinned: false
license: apache-2.0
short_description: Self-Normalizing Neural Network Baseline for Tox21
---

Tox21 XGBoost Classifier

This repository hosts a Hugging Face Space that provides an example API for submitting models to the Tox21 Leaderboard.

In this example, we train an XGBoost classifier on the Tox21 targets and save the trained model in the assets/ folder.

Important: For leaderboard submission, your Space does not need to include training code. It only needs to implement inference in the predict() function inside predict.py. The predict() function must keep the provided skeleton: it takes a list of SMILES strings as input and returns a nested prediction dictionary, with SMILES strings as outer keys and target names as inner keys. Therefore, any preprocessing of SMILES strings must be executed on-the-fly during inference.
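The required skeleton can be sketched as follows. Note that the preprocessing and model calls are placeholders, not the actual implementation in this repository, and the target names are assumed from the example output in the Inference section:

```python
from typing import Dict, List

# Hypothetical target names; the real Space predicts the 12 Tox21 targets.
TARGETS = [f"target{i}" for i in range(1, 13)]

def predict(smiles_list: List[str]) -> Dict[str, Dict[str, int]]:
    """Skeleton of the leaderboard entry point.

    Preprocessing happens on-the-fly per SMILES string; the constant 0
    below stands in for a real model call.
    """
    results = {}
    for smiles in smiles_list:
        # features = preprocess(smiles)      # e.g. compute molecular features
        # scores = model.predict(features)   # run the trained classifier
        results[smiles] = {target: 0 for target in TARGETS}
    return results
```

As long as this input/output contract is preserved, the internals of predict() are entirely up to you.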

Repository Structure

  • predict.py - Defines the predict() function required by the leaderboard (entry point for inference).

  • app.py - FastAPI application wrapper (can be used as-is).

  • src/ - Core model & preprocessing logic:

    • data.py - SMILES preprocessing pipeline
    • model.py - XGBoost classifier wrapper
    • train.py - Script to train the classifier
    • utils.py - Constants and helper functions

Quickstart with Spaces

You can easily adapt this project under your own Hugging Face account:

  • Open this Space on Hugging Face.

  • Click "Duplicate this Space" (top-right corner).

  • Modify src/ to implement your preprocessing pipeline and model class.

  • Modify predict() inside predict.py to perform model inference while keeping the function skeleton unchanged to remain compatible with the leaderboard.

That’s it: your model is now available as an API endpoint for the Tox21 Leaderboard.

Installation

To run (and train) the XGBoost classifier, clone the repository and install the dependencies:

git clone https://huggingface.co/spaces/tschouis/tox21_xgboost_classifier
cd tox21_xgboost_classifier

conda create -n tox21_xgb_cls python=3.11
conda activate tox21_xgb_cls
pip install -r requirements.txt

Training

To train the XGBoost model from scratch:

python -m src.train

This will:

  1. Load and preprocess the Tox21 training dataset.
  2. Train an XGBoost classifier.
  3. Save the trained model to the assets/ folder.
  4. Evaluate the trained XGBoost classifier on the validation split.

Inference

For inference, you only need predict.py.

Example usage inside Python:

from predict import predict

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
results = predict(smiles_list)

print(results)

The output will be a nested dictionary in the format:

{
    "CCO": {"target1": 0, "target2": 1, ..., "target12": 0},
    "c1ccccc1": {"target1": 1, "target2": 0, ..., "target12": 1},
    "CC(=O)O": {"target1": 0, "target2": 0, ..., "target12": 0}
}

Notes

  • For leaderboard submission, only predict.py needs to be adapted to your model's inference.

  • Training (src/train.py) is provided for reproducibility.

  • Preprocessing (here inside src/data.py) must be applied at inference time, not only during training.