tox21_snn_classifier / README.md

antoniaebner

add code, specific requirements

c70bdf2 9 days ago

	---
	title: Tox21 SNN Classifier
	emoji: 🌖
	colorFrom: green
	colorTo: pink
	sdk: docker
	pinned: false
	license: cc-by-nc-4.0
	short_description: Self-Normalizing Neural Network Baseline for Tox21
	---

	# Tox21 SNN Classifier

	This repository hosts a Hugging Face Space that provides an example API for submitting models to the [Tox21 Leaderboard](https://huggingface.co/spaces/tschouis/tox21_leaderboard).

	In this example, we train a self-normalizing neural network (SNN) classifier on the Tox21 targets and save the trained model in the `assets/` folder.

	Important: For leaderboard submission, your Space does not need to include training code. It only needs to implement inference in the `predict()` function inside `predict.py`. The `predict()` function must keep the provided skeleton: it takes a list of SMILES strings as input and returns a nested prediction dictionary, keyed first by SMILES string and then by target. Because the function receives raw SMILES strings, any preprocessing must be executed on the fly during inference.
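
The contract described above can be sketched as follows. This is a minimal illustration, not the code shipped in this Space: `TARGET_NAMES`, `preprocess()` and `score()` are hypothetical stand-ins for the real target names, the pipeline in `src/data.py`, and the trained model, and the placeholder model simply predicts 0 everywhere.

```python
from typing import Dict, List

# Hypothetical placeholder names; replace them with the targets your model predicts.
TARGET_NAMES = [f"target{i}" for i in range(1, 13)]


def preprocess(smiles: str) -> List[float]:
    """Stand-in for the real preprocessing pipeline (assumption: toy features only)."""
    return [float(len(smiles)), float(smiles.count("c"))]


def score(features: List[float]) -> Dict[str, int]:
    """Stand-in for a trained model: always predicts 0 for every target."""
    return {target: 0 for target in TARGET_NAMES}


def predict(smiles_list: List[str]) -> Dict[str, Dict[str, int]]:
    """Take a list of SMILES strings and return {smiles: {target: prediction}}."""
    predictions = {}
    for smiles in smiles_list:
        # Preprocessing runs here, at inference time, on the raw SMILES string.
        features = preprocess(smiles)
        predictions[smiles] = score(features)
    return predictions


if __name__ == "__main__":
    print(predict(["CCO", "c1ccccc1"]))
```

Keeping preprocessing and scoring in separate functions means only their bodies need to change when you plug in your own model; the signature of `predict()` stays the same.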

	# Repository Structure
	- `predict.py` - Defines the `predict()` function required by the leaderboard (entry point for inference).
	- `app.py` - FastAPI application wrapper (can be used as-is).
	- `src/` - Core model & preprocessing logic:
	  - `data.py` - SMILES preprocessing pipeline
	  - `model.py` - SNN classifier wrapper
	  - `train.py` - Script to train the classifier
	  - `utils.py` - Constants and helper functions

	# Quickstart with Spaces

	You can easily adapt this project in your own Hugging Face account:

	- Open this Space on Hugging Face.

	- Click "Duplicate this Space" (top-right corner).

	- Modify `src/` for your preprocessing pipeline and model class.

	- Modify `predict()` inside `predict.py` to perform model inference while keeping the function skeleton unchanged to remain compatible with the leaderboard.

	That’s it: your model will be available as an API endpoint for the Tox21 Leaderboard.

	# Installation
	To run (and train) the SNN classifier, clone the repository and install the dependencies:

	```bash
	git clone https://huggingface.co/spaces/tschouis/tox21_snn_classifier
	cd tox21_snn_classifier

	conda create -n tox21_snn_cls python=3.11
	conda activate tox21_snn_cls
	pip install -r requirements.txt
	```

	# Training

	To train the SNN model from scratch:

	```bash
	python -m src.train
	```

	This will:

	1. Load and preprocess the Tox21 training dataset.
	2. Train an SNN classifier.
	3. Save the trained model to the `assets/` folder.
	4. Evaluate the trained SNN classifier on the validation split.


	# Inference

	For inference, you only need `predict.py`.

	Example usage inside Python:

	```python
	from predict import predict

	smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
	results = predict(smiles_list)

	print(results)
	```

	The output will be a nested dictionary in the format:

	```python
	{
	    "CCO": {"target1": 0, "target2": 1, ..., "target12": 0},
	    "c1ccccc1": {"target1": 1, "target2": 0, ..., "target12": 1},
	    "CC(=O)O": {"target1": 0, "target2": 0, ..., "target12": 0}
	}
	```
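
Before submitting, it can help to check that a result actually has this nested shape. The helper below is a hypothetical sketch (`check_predictions` is not part of this repository, and the leaderboard may apply different or stricter checks); it only verifies that every input SMILES has an entry and that all molecules share the same set of target keys.

```python
from typing import Dict, List


def check_predictions(smiles_list: List[str], results: Dict[str, Dict[str, int]]) -> None:
    """Raise ValueError if `results` does not match the nested format shown above."""
    # Every input SMILES string must appear as a top-level key.
    missing = [s for s in smiles_list if s not in results]
    if missing:
        raise ValueError(f"no prediction for: {missing}")
    # Every molecule must report the same set of targets.
    target_sets = {frozenset(per_target) for per_target in results.values()}
    if len(target_sets) > 1:
        raise ValueError("molecules do not share the same set of targets")


# Shortened example with two illustrative targets instead of twelve.
example = {
    "CCO": {"target1": 0, "target2": 1},
    "c1ccccc1": {"target1": 1, "target2": 0},
}
check_predictions(["CCO", "c1ccccc1"], example)  # passes silently
```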

	# Notes

	- For leaderboard submission, you only need to adapt `predict.py` to run your model's inference.

	- Training (`src/train.py`) is provided for reproducibility.

	- Preprocessing (here in `src/data.py`) must be applied at inference time, not only during training.