Improve model card: add metadata, project page, paper abstract, and update paper link (#1)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

```diff
@@ -1,24 +1,45 @@
 ---
 base_model:
 - stabilityai/stable-diffusion-3.5-medium
-library_name:
+library_name: diffusers
+pipeline_tag: text-to-image
 ---

 # Model Card

-
 ## Model Details

 ### Model Description
-
-
+This is a reproduced LoRA of SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models, as presented in the paper [Diffusion Negative-aware FineTuning (DiffusionNFT)](https://huggingface.co/papers/2509.16117).
+
+### Paper Abstract
+Online reinforcement learning (RL) has been central to post-training language
+models, but its extension to diffusion models remains challenging due to
+intractable likelihoods. Recent works discretize the reverse sampling process
+to enable GRPO-style training, yet they inherit fundamental drawbacks,
+including solver restrictions, forward-reverse inconsistency, and complicated
+integration with classifier-free guidance (CFG). We introduce Diffusion
+Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that
+optimizes diffusion models directly on the forward process via flow matching.
+DiffusionNFT contrasts positive and negative generations to define an implicit
+policy improvement direction, naturally incorporating reinforcement signals
+into the supervised learning objective. This formulation enables training with
+arbitrary black-box solvers, eliminates the need for likelihood estimation, and
+requires only clean images rather than sampling trajectories for policy
+optimization. DiffusionNFT is up to 25× more efficient than FlowGRPO in
+head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT
+improves the GenEval score from 0.24 to 0.98 within 1k steps, while FlowGRPO
+achieves 0.95 with over 5k steps and additional CFG employment. By leveraging
+multiple reward models, DiffusionNFT significantly boosts the performance of
+SD3.5-Medium in every benchmark tested.

 ### Model Sources

 <!-- Provide the basic links for the model. -->

 - **Repository:** https://github.com/NVlabs/DiffusionNFT
-- **Paper:**
+- **Paper:** https://huggingface.co/papers/2509.16117
+- **Project Page:** https://research.nvidia.com/labs/dir/DiffusionNFT

 ## Uses

```
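The updated card declares `library_name: diffusers` and a text-to-image pipeline on top of SD3.5-Medium. For context, a minimal loading sketch under stated assumptions: the LoRA repo id/path below is a placeholder (this commit does not name the weight file), while the base model id and the diffusers calls are standard.

```python
# Minimal sketch: load the SD3.5-Medium base pipeline and apply the
# DiffusionNFT LoRA. The LoRA path below is a placeholder; check the
# repository for the actual artifact name.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # base model from the card metadata
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("path/to/diffusionnft-lora")  # placeholder path
pipe.to("cuda")

# The paper reports CFG-free training, so guidance_scale=1.0 (CFG disabled
# in diffusers) is a plausible starting point; tune as needed.
image = pipe(
    "a photo of an astronaut riding a horse",  # example prompt
    num_inference_steps=28,
    guidance_scale=1.0,
).images[0]
image.save("sample.png")
```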
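The abstract's core mechanism can also be illustrated schematically. The sketch below is an assumption-laden paraphrase, not the paper's exact objective: it contrasts flow-matching losses on reward-judged positive and negative clean images, so the reinforcement signal enters a supervised objective on the forward process. The `beta` contrast weight, the model signature, and the rectified-flow parameterization are all illustrative assumptions.

```python
# Schematic sketch of the idea in the abstract, NOT the paper's exact
# objective: reinforcement signals enter a supervised flow-matching loss
# on the forward process by contrasting positive and negative generations.
import torch
import torch.nn.functional as F

def diffusionnft_style_loss(model, x_pos, x_neg, cond):
    """x_pos / x_neg: clean images judged good / bad by a reward model.
    Only clean images are needed -- no sampling trajectories."""
    b = x_pos.shape[0]
    t = torch.rand(b, device=x_pos.device)   # forward-process time
    noise = torch.randn_like(x_pos)

    def fm_loss(x0):
        # Rectified-flow interpolation x_t = (1 - t) * x0 + t * noise,
        # with velocity target v = noise - x0 (a common flow-matching form).
        t_ = t.view(-1, 1, 1, 1)
        x_t = (1 - t_) * x0 + t_ * noise
        v_target = noise - x0
        v_pred = model(x_t, t, cond)  # assumed signature
        return F.mse_loss(v_pred, v_target, reduction="none").mean(dim=(1, 2, 3))

    # Move toward positives and away from negatives; beta is a
    # hypothetical contrast weight, not a value from the paper.
    beta = 1.0
    return (fm_loss(x_pos) - beta * fm_loss(x_neg)).mean()
```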