worstcoder nielsr HF Staff committed on
Commit 9cd7494 · verified · 1 Parent(s): 62095ad

Improve model card: add metadata, project page, paper abstract, and update paper link (#1)


- Improve model card: add metadata, project page, paper abstract, and update paper link (ce7daed84befa9661c409e0c9c94d2288a3ca4a8)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +26 -5
README.md CHANGED
@@ -1,24 +1,45 @@
  ---
  base_model:
  - stabilityai/stable-diffusion-3.5-medium
- library_name: peft
+ library_name: diffusers
+ pipeline_tag: text-to-image
  ---

  # Model Card

-
  ## Model Details

  ### Model Description
-
- This is a reproduced LoRA of SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models.
+ This is a reproduced LoRA of SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models, as presented in the paper [Diffusion Negative-aware FineTuning (DiffusionNFT)](https://huggingface.co/papers/2509.16117).
+
+ ### Paper Abstract
+ Online reinforcement learning (RL) has been central to post-training language
+ models, but its extension to diffusion models remains challenging due to
+ intractable likelihoods. Recent works discretize the reverse sampling process
+ to enable GRPO-style training, yet they inherit fundamental drawbacks,
+ including solver restrictions, forward-reverse inconsistency, and complicated
+ integration with classifier-free guidance (CFG). We introduce Diffusion
+ Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that
+ optimizes diffusion models directly on the forward process via flow matching.
+ DiffusionNFT contrasts positive and negative generations to define an implicit
+ policy improvement direction, naturally incorporating reinforcement signals
+ into the supervised learning objective. This formulation enables training with
+ arbitrary black-box solvers, eliminates the need for likelihood estimation, and
+ requires only clean images rather than sampling trajectories for policy
+ optimization. DiffusionNFT is up to 25× more efficient than FlowGRPO in
+ head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT
+ improves the GenEval score from 0.24 to 0.98 within 1k steps, while FlowGRPO
+ achieves 0.95 with over 5k steps and additional CFG employment. By leveraging
+ multiple reward models, DiffusionNFT significantly boosts the performance of
+ SD3.5-Medium in every benchmark tested.

  ### Model Sources

  <!-- Provide the basic links for the model. -->

  - **Repository:** https://github.com/NVlabs/DiffusionNFT
- - **Paper:** http://arxiv.org/abs/2509.16117
+ - **Paper:** https://huggingface.co/papers/2509.16117
+ - **Project Page:** https://research.nvidia.com/labs/dir/DiffusionNFT

  ## Uses
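For context, the updated front matter tags this checkpoint for the diffusers library with a text-to-image pipeline on top of stabilityai/stable-diffusion-3.5-medium. Below is a minimal sketch of how such a LoRA could be loaded and used; the LoRA repository id, weight location, prompt, and sampler settings are assumptions for illustration, not part of this commit.

```python
# Minimal usage sketch (assumptions): the LoRA repository id below is a
# placeholder for this checkpoint, and the prompt / sampler settings are
# illustrative, not taken from the model card.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # base model from the card metadata
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("your-username/diffusionnft-sd3.5m-lora")  # hypothetical repo id
pipe.to("cuda")

# The abstract describes DiffusionNFT as CFG-free; guidance_scale=1.0 disables
# classifier-free guidance in diffusers (whether to do so here is an assumption).
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=28,
    guidance_scale=1.0,
).images[0]
image.save("sample.png")
```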