worstcoder nielsr HF Staff committed on
Commit 9cd7494 · verified · 1 Parent(s): 62095ad

Improve model card: add metadata, project page, paper abstract, and update paper link (#1)


- Improve model card: add metadata, project page, paper abstract, and update paper link (ce7daed84befa9661c409e0c9c94d2288a3ca4a8)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +26 -5
README.md CHANGED
@@ -1,24 +1,45 @@
  ---
  base_model:
  - stabilityai/stable-diffusion-3.5-medium
- library_name: peft
+ library_name: diffusers
+ pipeline_tag: text-to-image
  ---

  # Model Card

-
  ## Model Details

  ### Model Description
-
- This is a reproduced LoRA of SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models.
+ This is a reproduced LoRA of SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models, as presented in the paper [Diffusion Negative-aware FineTuning (DiffusionNFT)](https://huggingface.co/papers/2509.16117).
+
+ ### Paper Abstract
+ Online reinforcement learning (RL) has been central to post-training language
+ models, but its extension to diffusion models remains challenging due to
+ intractable likelihoods. Recent works discretize the reverse sampling process
+ to enable GRPO-style training, yet they inherit fundamental drawbacks,
+ including solver restrictions, forward-reverse inconsistency, and complicated
+ integration with classifier-free guidance (CFG). We introduce Diffusion
+ Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that
+ optimizes diffusion models directly on the forward process via flow matching.
+ DiffusionNFT contrasts positive and negative generations to define an implicit
+ policy improvement direction, naturally incorporating reinforcement signals
+ into the supervised learning objective. This formulation enables training with
+ arbitrary black-box solvers, eliminates the need for likelihood estimation, and
+ requires only clean images rather than sampling trajectories for policy
+ optimization. DiffusionNFT is up to 25× more efficient than FlowGRPO in
+ head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT
+ improves the GenEval score from 0.24 to 0.98 within 1k steps, while FlowGRPO
+ achieves 0.95 with over 5k steps and additional CFG employment. By leveraging
+ multiple reward models, DiffusionNFT significantly boosts the performance of
+ SD3.5-Medium in every benchmark tested.

  ### Model Sources

  <!-- Provide the basic links for the model. -->

  - **Repository:** https://github.com/NVlabs/DiffusionNFT
- - **Paper:** http://arxiv.org/abs/2509.16117
+ - **Paper:** https://huggingface.co/papers/2509.16117
+ - **Project Page:** https://research.nvidia.com/labs/dir/DiffusionNFT

  ## Uses
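For context, the updated front matter tags this checkpoint for the diffusers library with a text-to-image pipeline on top of stabilityai/stable-diffusion-3.5-medium. Below is a minimal sketch of how such a LoRA could be loaded and used; the LoRA repository id, weight location, prompt, and sampler settings are assumptions for illustration, not part of this commit.

```python
# Minimal usage sketch (assumptions): the LoRA repository id below is a
# placeholder for this checkpoint, and the prompt / sampler settings are
# illustrative, not taken from the model card.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # base model from the card metadata
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("your-username/diffusionnft-sd3.5m-lora")  # hypothetical repo id
pipe.to("cuda")

# The abstract describes DiffusionNFT as CFG-free; guidance_scale=1.0 disables
# classifier-free guidance in diffusers (whether to do so here is an assumption).
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
    guidance_scale=1.0,
).images[0]
image.save("sample.png")
```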