---
title: Tox21 GIN Classifier
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: cc-by-nc-4.0
short_description: Graph Isomorphism Network Baseline Classifier for Tox21
---
# Tox21 Graph Isomorphism Network (GIN) Classifier
This repository hosts a Hugging Face Space that provides an example API for submitting models to the [Tox21 Leaderboard](https://huggingface.co/spaces/ml-jku/tox21_leaderboard).
Here, a [Graph Isomorphism Network (GIN)](https://arxiv.org/abs/1810.00826) is trained on the Tox21 dataset, and the trained models are provided for
inference. The model input is a SMILES string of a small molecule, and the output is 12 numeric values, one for
each of the toxic effects in the Tox21 dataset.
**Important:** For leaderboard submission, your Space needs to include training code. The file `train.py` should train the model using the config specified inside the `config/` folder and save the final model parameters into a file inside the `checkpoints/` folder. The model should be trained using the [Tox21_dataset](https://huggingface.co/datasets/ml-jku/tox21) provided on Hugging Face. The datasets can be loaded like this:
```python
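from datasets import load_dataset

# Set this to your Hugging Face access token if one is required;
# None lets the library fall back to a locally stored login.
token = None
ds = load_dataset("ml-jku/tox21", token=token)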
train_df = ds["train"].to_pandas()
val_df = ds["validation"].to_pandas()
```
Additionally, the Space needs to implement inference in the `predict()` function inside `predict.py`. The `predict()` function must keep the provided skeleton: it should take a list of SMILES strings as input and return a nested prediction dictionary as output, with SMILES as keys and dictionaries containing targetname-prediction pairs as values. Therefore, any preprocessing of SMILES strings must be executed on-the-fly during inference.
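The required shape of `predict()` can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation shipped in this repository: the target names `target1` to `target12` are placeholders matching the output example in this README, and `featurize` and `score` are hypothetical stand-ins for the real preprocessing pipeline and model.

```python
from typing import Dict, List

# Placeholder target names (assumption); the real names come from the Tox21 dataset.
TARGETS = [f"target{i}" for i in range(1, 13)]


def featurize(smiles: str) -> List[float]:
    """Hypothetical on-the-fly preprocessing: a trivial length feature."""
    return [float(len(smiles))]


def score(features: List[float]) -> List[float]:
    """Hypothetical model: returns one dummy score per target."""
    return [0.0 for _ in TARGETS]


def predict(smiles_list: List[str]) -> Dict[str, Dict[str, float]]:
    """Map each input SMILES to a {target name: prediction} dictionary."""
    results = {}
    for smiles in smiles_list:
        features = featurize(smiles)  # preprocessing happens at inference time
        scores = score(features)
        results[smiles] = dict(zip(TARGETS, scores))
    return results
```

Only the signature and the return structure matter here; the bodies of `featurize` and `score` would be replaced by the actual pipeline and a model loaded from `checkpoints/`.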
# Repository Structure
- `predict.py` - Defines the `predict()` function required by the leaderboard (entry point for inference).
- `app.py` - FastAPI application wrapper (can be used as-is).
- `train.py` - Trains and saves a model using the config in the `config/` folder.
- `config/` - Contains the config file used by `train.py`.
- `checkpoints/` - Contains the saved model used by `predict.py`.
- `src/` - Core model & preprocessing logic:
  - `preprocess.py` - SMILES preprocessing pipeline and dataset creation
  - `train_evaluate.py` - Model training and evaluation, metric computation
  - `seed.py` - Sets all random seeds
  - `model.py` - Contains the model class
# Quickstart with Spaces
You can easily adapt this project in your own Hugging Face account:
- Open this Space on Hugging Face.
- Click "Duplicate this Space" (top-right corner).
- Create a `.env` according to `.example.env`.
- Modify `src/` to contain your preprocessing pipeline and model class.
- Modify `predict()` inside `predict.py` to perform model inference while keeping the function skeleton unchanged to remain compatible with the leaderboard.
- Modify `train.py` according to your model and preprocessing pipeline.
- Modify the file inside `config/` to contain all hyperparameters that are set in `train.py`.
That’s it. Your model will then be available as an API endpoint for the Tox21 Leaderboard.
# Installation
To run the GIN classifier, clone the repository and install dependencies:
```bash
git clone https://huggingface.co/spaces/ml-jku/tox21_gin_classifier
cd tox21_gin_classifier
pip install -r requirements.txt
```
# Training
To train the GIN model from scratch, run:
```bash
python train.py
```
This command will:
1. Load and preprocess the Tox21 training dataset
2. Train a GIN classifier
3. Store the resulting model in the `checkpoints/` directory.
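The flow of the steps above, from config to checkpoint, could look roughly like the sketch below. It assumes a JSON config and a JSON checkpoint purely for illustration; the file name `model.json`, the `hidden_dim` key, and the `train_model` helper are hypothetical and do not describe the actual `train.py` in this repository.

```python
import json
from pathlib import Path


def train_model(config: dict) -> dict:
    """Hypothetical stand-in for loading the data and training the GIN."""
    # A real implementation would load the dataset and fit the model here.
    return {"hidden_dim": config["hidden_dim"],
            "weights": [0.0] * config["hidden_dim"]}


def run_training(config_path: str, checkpoint_dir: str) -> Path:
    """Read hyperparameters, train, and save the parameters to a checkpoint file."""
    config = json.loads(Path(config_path).read_text())
    params = train_model(config)
    out_dir = Path(checkpoint_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "model.json"  # hypothetical file name
    out_path.write_text(json.dumps(params))
    return out_path
```

Keeping every hyperparameter in the config file, rather than hard-coded in the training script, is what makes a training run reproducible from the repository contents alone.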
# Inference
For inference, you only need `predict.py`.
Example usage inside Python:
```python
from predict import predict
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
results = predict(smiles_list)
print(results)
```
The output will be a nested dictionary in the format:
```python
{
    "CCO": {"target1": 0, "target2": 1, ..., "target12": 0},
    "c1ccccc1": {"target1": 1, "target2": 0, ..., "target12": 1},
    "CC(=O)O": {"target1": 0, "target2": 0, ..., "target12": 0}
}
```
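Before submitting, it can help to check that the returned dictionary really has this shape. The helper below is a small sketch of such a check; `check_predictions` is a hypothetical name and not part of this repository or the leaderboard, and the default of 12 targets follows the description in this README.

```python
from typing import Dict, List


def check_predictions(smiles_list: List[str],
                      results: Dict[str, Dict[str, float]],
                      n_targets: int = 12) -> None:
    """Raise ValueError if `results` does not match the expected nested format."""
    # Every input SMILES must appear as a key.
    missing = [s for s in smiles_list if s not in results]
    if missing:
        raise ValueError(f"No prediction for: {missing}")
    for smiles, per_target in results.items():
        # Each SMILES must map to one value per target.
        if len(per_target) != n_targets:
            raise ValueError(
                f"{smiles}: expected {n_targets} targets, got {len(per_target)}")
        for name, value in per_target.items():
            if not isinstance(value, (int, float)):
                raise ValueError(f"{smiles}/{name}: prediction is not numeric")
```

Calling `check_predictions(smiles_list, predict(smiles_list))` returns silently when the format is correct and raises a descriptive error otherwise.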
# Notes
- Adapting `predict.py`, `train.py`, `config/`, and `checkpoints/` is required for leaderboard submission.
- Preprocessing (here in `src/preprocess.py`) must also run inside `predict.py`, not just in `train.py`.