Improve model card: Add pipeline tag, paper, GitHub link, and description

#1
by nielsr - opened
Files changed (1)
  1. README.md +17 -2
README.md CHANGED
@@ -1,12 +1,27 @@
  ---
- license: mit
  datasets:
  - liuganghuggingface/demodiff_downstream
+ license: mit
  tags:
  - chemistry
  - biology
+ pipeline_tag: graph-ml
  ---

+ # DemoDiff: Graph Diffusion Transformers are In-Context Molecular Designers
+
+ This repository contains the DemoDiff model, a diffusion-based molecular foundation model for **in-context inverse molecular design**, as presented in the paper [Graph Diffusion Transformers are In-Context Molecular Designers](https://huggingface.co/papers/2510.08744).
+
+ DemoDiff leverages graph diffusion transformers to generate molecules based on contextual examples, enabling few-shot molecular design across diverse chemical tasks without task-specific fine-tuning. It introduces demonstration-conditioned diffusion models, which define task contexts using a small set of molecule-score examples instead of text descriptions to guide a denoising Transformer for molecule generation. A novel molecular tokenizer with Node Pair Encoding is developed for scalable pretraining, representing molecules at the motif level.
+
+ Code: https://github.com/liugangcode/DemoDiff
+
+ ## 🌟 Key Features
+
+ - **In-Context Learning**: Generate molecules using only contextual examples (no fine-tuning required)
+ - **Graph-Based Tokenization**: Novel molecular graph tokenization with BPE-style vocabulary
+ - **Comprehensive Benchmarks**: 30+ downstream tasks covering drug discovery, docking, and polymer design
+
  ### Model Configuration

  | Parameter | Value | Description |
@@ -20,4 +35,4 @@ tags:
  | **task_name** | `pretrain` | Task type for model training. |
  | **tokenizer_name** | `pretrain` | Tokenizer used for model input. |
  | **vocab_ring_len** | 300 | Length of the circular vocabulary window. |
- | **vocab_size** | 3000 | Total vocabulary size. |
+ | **vocab_size** | 3000 | Total vocabulary size. |
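
For readers of the updated card, here is a minimal sketch of the workflow the description implies: pull the checkpoint from the Hub and frame a task as a handful of molecule-score demonstrations instead of a text prompt. Only `snapshot_download` below is real `huggingface_hub` API; the repo id, the `DemoDiff` class, and its `generate` signature are hypothetical placeholders, so consult the linked GitHub repository for the actual entry points.

```python
# Sketch only. snapshot_download is real huggingface_hub API; the repo id
# and the commented-out calls are HYPOTHETICAL stand-ins for the repo's
# actual interface.
from huggingface_hub import snapshot_download

# Repo id is an assumption based on the dataset namespace in the card.
ckpt_dir = snapshot_download(repo_id="liuganghuggingface/DemoDiff")

# Per the card, a task context is a small set of (molecule, score)
# examples rather than a text description.
demonstrations = [
    ("CCO", 0.12),                    # low-scoring molecule (SMILES)
    ("c1ccccc1O", 0.87),              # high-scoring molecule
    ("CC(=O)Oc1ccccc1C(=O)O", 0.91),  # high-scoring molecule
]

# Hypothetical API, for illustration only:
# model = DemoDiff.from_pretrained(ckpt_dir)
# candidates = model.generate(context=demonstrations, num_samples=32)
```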
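The tokenization bullet in the added card text mentions a BPE-style vocabulary built via Node Pair Encoding. The toy sketch below shows only the familiar sequence form of that idea, repeatedly fusing the most frequent adjacent pair into a motif token; the paper's Node Pair Encoding generalizes this merging to molecular graphs, which this sketch does not attempt.

```python
# Toy BPE-style vocabulary construction over a sequence of atom symbols.
# NOT the paper's algorithm: Node Pair Encoding applies the same
# merge-the-most-frequent-pair idea to node pairs in molecular graphs.
from collections import Counter

def bpe_merges(seq, num_merges):
    """Repeatedly fuse the most frequent adjacent token pair into a motif."""
    vocab = set(seq)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        motif = a + b
        vocab.add(motif)
        out, i = [], 0
        while i < len(seq):
            # Greedy left-to-right replacement of the chosen pair.
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(motif)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, vocab

# Atom symbols of a toy chain; frequent pairs become motif tokens.
tokens, vocab = bpe_merges(list("CCOCCOCCN"), num_merges=2)
print(tokens)         # ['CCO', 'CCO', 'CC', 'N']
print(sorted(vocab))  # ['C', 'CC', 'CCO', 'N', 'O']
```

The motif vocabulary grows by one token per merge, which is how a fixed budget such as the card's `vocab_size` of 3000 would bound the number of merges.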