Bidhan Roy
committed on
Commit · e3029ac
1 Parent(s): 85d1eb3
readme styling
README.md
CHANGED
@@ -11,21 +11,23 @@ tags:
 - flow-matching
 ---

-<img src="bagel_labs_logo.png" alt="Bagel Labs"

-

-<a href="https://huggingface.co/bageldotcom/paris">
-<img src="https://img.shields.io/badge
 </a>
-<a href="https://github.com/bageldotcom/paris">
-<img src="https://img.shields.io/
 </a>
-<a href="https://github.com/bageldotcom/
-<img src="https://img.shields.io/badge
 </a>

-

 # Key Characteristics

@@ -42,7 +44,7 @@ The world's first diffusion model trained entirely through decentralized computa

 # Examples

-

 *Text-conditioned image generation samples using Paris across diverse prompts and visual styles*

@@ -77,7 +79,7 @@ Paris implements fully decentralized training where:
 - Router trained post-hoc on full dataset for expert selection during inference
 - Complete computational independence eliminates requirements for specialized interconnects (InfiniBand, NVLink)

-

 *Paris training phase showing complete asynchronous isolation across heterogeneous compute clusters. Unlike traditional parallelization strategies (Data/Pipeline/Model Parallelism), Paris requires zero communication during training.*

@@ -101,6 +103,10 @@ This zero-communication approach enables training on fragmented compute resource
 - **`top-2`**: Weighted ensemble of top-2 experts. Often best quality, 2× inference cost.
 - **`full-ensemble`**: All 8 experts weighted by router. Highest compute (8× cost).

 ---

 # Performance Metrics

@@ -164,8 +170,4 @@

 MIT License – Open for research and commercial use.

-<
-
-Made with ❤️ by [Bagel Labs](https://bagel.com)
-
-</div>
@@ -11,21 +11,23 @@
 - flow-matching
 ---

+<img src="images/bagel_labs_logo.png" alt="Bagel Labs" height="28" style="margin-bottom: 20px;"/>

+<h1 style="font-size: 28px; margin-bottom: 20px;">Paris: A Decentralized Trained Open-Weight Diffusion Model</h1>

+<a href="https://huggingface.co/bageldotcom/paris" target="_blank">
+<img src="https://img.shields.io/badge/🤗_DOWNLOAD_MODEL_WEIGHTS-FFD21E?style=for-the-badge&logoColor=000000" alt="Download Model Weights" height="40">
 </a>
+<a href="https://github.com/bageldotcom/paris" target="_blank">
+<img src="https://img.shields.io/badge/⭐_STAR_ON_GITHUB-100000?style=for-the-badge&logo=github&logoColor=white" alt="Star on GitHub" height="40">
 </a>
+<a href="https://github.com/bageldotcom/paris/blob/main/paper.pdf" target="_blank">
+<img src="https://img.shields.io/badge/📄_READ_PAPER-FF6B6B?style=for-the-badge&logoColor=white" alt="Read Technical Report" height="40">
 </a>

+<div style="margin-top: 20px;"></div>
+
+The world's first open-weight diffusion model trained entirely through decentralized computation. The model consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation with no gradient, parameter, or intermediate activation synchronization, achieving superior parallelism efficiency over traditional methods while using 14× less data and 16× less compute than baselines. [Read our technical report](https://github.com/bageldotcom/paris/blob/main/paper.pdf) to learn more.

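As a quick sense of scale for the figures quoted above, the snippet below is a hypothetical back-of-the-envelope calculation (plain Python, not part of the Paris release); it assumes all eight experts sit at the same end of the stated 129M-605M range, which is a simplification.

```python
# Back-of-the-envelope scale check using only the numbers quoted above:
# 8 expert denoisers, each somewhere between 129M and 605M parameters.
# Assuming all experts share one size is an illustrative simplification,
# not the released configuration.
NUM_EXPERTS = 8

for label, params_per_expert in [("lower bound", 129e6), ("upper bound", 605e6)]:
    ensemble_total = NUM_EXPERTS * params_per_expert
    active_fraction = params_per_expert / ensemble_total  # top-1 routing runs one expert per step
    print(
        f"{label}: {params_per_expert / 1e6:.0f}M per expert, "
        f"{ensemble_total / 1e9:.2f}B across the ensemble, "
        f"top-1 activates {active_fraction:.1%} of total parameters"
    )
```

The per-step cost multipliers quoted later (2× for top-2, 8× for the full ensemble) follow directly from how many of these experts a routing mode activates.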
 # Key Characteristics

@@ -42,7 +44,7 @@

 # Examples

+

 *Text-conditioned image generation samples using Paris across diverse prompts and visual styles*

@@ -77,7 +79,7 @@
 - Router trained post-hoc on full dataset for expert selection during inference
 - Complete computational independence eliminates requirements for specialized interconnects (InfiniBand, NVLink)

+

 *Paris training phase showing complete asynchronous isolation across heterogeneous compute clusters. Unlike traditional parallelization strategies (Data/Pipeline/Model Parallelism), Paris requires zero communication during training.*

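The bullets and the diagram above describe where each component trains; the sketch below is a hypothetical toy version of that zero-communication setup (small MLP experts, random tensors, a flow-matching-style interpolation loss), not the actual Paris training code. The point it illustrates is structural: each expert loops over only its own shard with its own optimizer, and no collective operation ever runs.

```python
import torch
from torch import nn, optim

# Hypothetical zero-communication training sketch: 8 experts, each with a
# private data shard and a private optimizer. Nothing below performs an
# all-reduce, parameter broadcast, or activation exchange, which is the
# property the bullets above describe. Toy sizes throughout.
NUM_EXPERTS, DIM = 8, 32

def make_expert() -> nn.Module:
    # Stand-in for a 129M-605M parameter diffusion expert.
    return nn.Sequential(nn.Linear(DIM, 128), nn.SiLU(), nn.Linear(128, DIM))

# One private shard per expert (random tensors standing in for image latents).
shards = [torch.randn(64, DIM) for _ in range(NUM_EXPERTS)]
experts = [make_expert() for _ in range(NUM_EXPERTS)]
optimizers = [optim.AdamW(e.parameters(), lr=1e-3) for e in experts]

# Each of these loops could run on a different cluster at a different time;
# the body never references any other expert's parameters or gradients.
for expert, opt, shard in zip(experts, optimizers, shards):
    for _ in range(10):
        x0 = shard                                   # clean samples from this shard only
        noise = torch.randn_like(x0)
        t = torch.rand(x0.size(0), 1)
        xt = (1 - t) * x0 + t * noise                # linear interpolation path
        target = noise - x0                          # flow-matching-style velocity target
        loss = ((expert(xt) - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()                                   # purely local update
```

The router from the first bullet is trained afterwards on the full dataset, so it can score experts at inference time without ever having influenced the isolated updates above.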
@@ -101,6 +103,10 @@
 - **`top-2`**: Weighted ensemble of top-2 experts. Often best quality, 2× inference cost.
 - **`full-ensemble`**: All 8 experts weighted by router. Highest compute (8× cost).

+
+
+*Multi-expert inference pipeline showing router-based expert selection and three different routing strategies: Top-1 (fastest), Top-2 (best quality), and Full Ensemble (highest compute).*
+
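For a concrete reading of the three modes named in the caption, here is a minimal, hypothetical sketch of a single routed denoising step. The toy linear modules, the `denoise_step` name, and the tensor shapes are assumptions for illustration, not the released inference API.

```python
import torch
from torch import nn

# Hypothetical routing sketch for one denoising step. `experts` stands in
# for the 8 expert denoisers and `router` for the post-hoc router; both are
# toy linear layers here, not the released Paris modules.
NUM_EXPERTS, DIM = 8, 32
experts = nn.ModuleList(nn.Linear(DIM, DIM) for _ in range(NUM_EXPERTS))
router = nn.Linear(DIM, NUM_EXPERTS)

@torch.no_grad()
def denoise_step(xt: torch.Tensor, mode: str = "top-2") -> torch.Tensor:
    weights = router(xt).softmax(dim=-1)              # (batch, 8) router scores
    k = {"top-1": 1, "top-2": 2, "full-ensemble": NUM_EXPERTS}[mode]
    topw, topi = weights.topk(k, dim=-1)              # keep the k best experts per sample
    topw = topw / topw.sum(dim=-1, keepdim=True)      # renormalize their weights

    # Run every expert once, then blend only the selected ones.
    # (A real top-1/top-2 deployment would skip the unselected experts,
    # which is where the 1x/2x versus 8x cost difference comes from.)
    all_out = torch.stack([e(xt) for e in experts], dim=1)   # (batch, 8, dim)
    idx = topi.unsqueeze(-1).expand(-1, -1, DIM)             # (batch, k, dim)
    selected = all_out.gather(1, idx)                        # outputs of chosen experts
    return (topw.unsqueeze(-1) * selected).sum(dim=1)        # router-weighted blend

x = torch.randn(4, DIM)
for mode in ("top-1", "top-2", "full-ensemble"):
    print(mode, denoise_step(x, mode).shape)
```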
 ---

 # Performance Metrics

@@ -164,8 +170,4 @@

 MIT License – Open for research and commercial use.

+Made with ❤️ by <a href="https://twitter.com/bageldotcom" target="_blank"><img src="https://img.shields.io/badge/Bagel_Labs-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" alt="Follow Bagel Labs on Twitter" height="28"></a>
bagel_labs_logo.png → images/bagel_labs_logo.png
RENAMED
File without changes

generated_images.png → images/generated_images.png
RENAMED
File without changes

images/paris_inference.png
ADDED
Git LFS Details

training_architecture.png → images/training_architecture.png
RENAMED
File without changes