**Bidhan Roy** committed · commit `0b94e29` · parent `73872ef`

Add README updates and images with Git LFS

Files changed:

- .gitattributes +1 -0
- README.md +157 -67
- bagel_labs_logo.png +3 -0
- generated_images.png +3 -0
- training_architecture.png +3 -0
    	
.gitattributes CHANGED

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
```
    	
README.md CHANGED

Removed lines (blank removals omitted; truncated entries kept as in the original):

````diff
@@ -6,108 +6,198 @@ tags:
 - multi-expert
 - dit
 - laion
 ---
-- **Router**: dit-based routing network
-- **Hidden Size**: 1152
-- **Layers**: 28
-- **Attention Heads**: 16
-- **Parameters per Expert**: ~0M
-- **Total Parameters**: ~3M
-- **Text Conditioning**: ✓ (CLIP ViT-L/14)
-- **Training Dataset**: LAION-Aesthetic
 ```python
-from  
 # Load the pipeline
-pipeline =  
 # Generate images
 images = pipeline(
     prompt="A beautiful sunset over Paris, oil painting style",
     num_inference_steps=50,
     guidance_scale=7.5,
-for i, img in enumerate(images):
-    img.save(f"output_{i}.png")
 ```
-- **Batch Size**: 16 per expert
-- **Learning Rate**: 2e-05
-- **Image Size**: 256x256 (32x32 latent space)
-- **VAE**: SD VAE (8x downsampling)
-- **Text Encoder**: CLIP ViT-L/14
-- **EMA**: True
-- **Mixed Precision**: True
-- The router network analyzes the noisy latent and timestep
-- Selects the most appropriate expert for denoising
-- Enables better quality and diversity compared to single models
-- Best results at 256x256 resolution
-- Requires GPU for inference (8GB+ VRAM recommended)
 ```bibtex
-@misc{ 
-  year 
-  publisher 
-  url 
 }
 ```
````
Added (new README.md content):

- multi-expert
- dit
- laion
- distributed
- decentralized
- flow-matching
---

<div align="center">

<img src="bagel_labs_logo.png" alt="Bagel Labs" width="120"/>

# Paris: A Decentralized Trained Open-Weight Diffusion Model

<a href="https://huggingface.co/bageldotcom/paris">
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Like%20this-model-yellow?style=for-the-badge" alt="Like on Hugging Face">
</a>
<a href="https://github.com/bageldotcom/paris">
  <img src="https://img.shields.io/github/stars/bageldotcom/paris?style=for-the-badge&logo=github&label=Star%20on%20GitHub" alt="Star on GitHub">
</a>
<a href="https://github.com/bageldotcom/Paris/blob/main/paper.pdf">
  <img src="https://img.shields.io/badge/📄%20Read-Technical%20Report-red?style=for-the-badge" alt="Read Technical Report">
</a>

</div>
<br>

The world's first diffusion model trained entirely through decentralized computation. Paris consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation, with no gradient, parameter, or intermediate-activation synchronization. It achieves superior parallelism efficiency over traditional methods while using 14× less data and 16× less compute than baselines. [Read our technical report](https://github.com/bageldotcom/Paris/blob/main/paper.pdf) to learn more.

# Key Characteristics

- 8 independently trained expert diffusion models (605M parameters each, 4.84B total)
- No gradient synchronization, parameter sharing, or activation exchange among nodes during training
- Lightweight transformer router (~158M parameters) for dynamic expert selection
- 11M LAION-Aesthetic images across 120 A40 GPU-days
- 14× less training data than prior decentralized baselines
- 16× less compute than prior decentralized baselines
- Competitive generation quality (FID 12.45)
- Open weights for research and commercial use under the MIT license

---
# Examples

*Text-conditioned image generation samples using Paris across diverse prompts and visual styles*

---
# Architecture Details

| Component | Specification |
|-----------|--------------|
| **Model Scale** | DiT-XL/2 |
| **Parameters per Expert** | 605M |
| **Total Expert Parameters** | 4.84B (8 experts) |
| **Router Parameters** | ~158M |
| **Hidden Dimensions** | 1152 |
| **Transformer Layers** | 28 |
| **Attention Heads** | 16 |
| **Patch Size** | 2×2 (latent space) |
| **Latent Resolution** | 32×32×4 |
| **Image Resolution** | 256×256 |
| **Text Conditioning** | CLIP ViT-L/14 |
| **VAE** | sd-vae-ft-mse (8× downsampling) |

---
# Training Approach

Paris implements fully decentralized training:

- Each expert trains independently on a semantically coherent data partition (DINOv2-based clustering)
- No gradient synchronization, parameter sharing, or activation exchange between experts during training
- Experts train asynchronously across AWS, GCP, local clusters, and Runpod instances at different speeds
- The router is trained post hoc on the full dataset for expert selection during inference
- Complete computational independence eliminates requirements for specialized interconnects (InfiniBand, NVLink)

*Paris training phase showing complete asynchronous isolation across heterogeneous compute clusters. Unlike traditional parallelization strategies (data/pipeline/model parallelism), Paris requires zero communication during training.*

This zero-communication approach enables training on fragmented compute resources without specialized interconnects, eliminating the dedicated GPU cluster requirement of traditional diffusion model training.

**Comparison with Traditional Parallelization**

| **Strategy** | **Synchronization** | **Straggler Impact** | **Topology Requirements** |
|--------------|---------------------|----------------------|---------------------------|
| Data Parallel | Periodic all-reduce | Slowest worker blocks iteration | Latency-sensitive cluster |
| Model Parallel | Sequential layer transfers | Slowest layer blocks pipeline | Linear pipeline |
| Pipeline Parallel | Stage-to-stage per microbatch | Bubble overhead from slowest stage | Linear pipeline |
| **Paris** | **No synchronization** | **No blocking** | **Arbitrary** |

---
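The DINOv2-based partitioning idea can be sketched with a toy k-means over placeholder embeddings. This is a minimal illustration, not the released pipeline: random vectors stand in for DINOv2 features, and `kmeans_partition` is a hypothetical helper name.

```python
# Sketch: cluster image embeddings so each of the 8 experts trains on
# one semantically coherent partition. Random vectors stand in for
# DINOv2 features; kmeans_partition is a hypothetical helper.
import numpy as np

def kmeans_partition(embeddings: np.ndarray, n_experts: int = 8,
                     n_iters: int = 10, seed: int = 0) -> np.ndarray:
    """Assign each embedding to one expert via plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen samples.
    centroids = embeddings[rng.choice(len(embeddings), n_experts, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; skip clusters that emptied out.
        for k in range(n_experts):
            if (labels == k).any():
                centroids[k] = embeddings[labels == k].mean(axis=0)
    return labels

# Toy stand-in for DINOv2 features of 800 images.
feats = np.random.default_rng(1).normal(size=(800, 64))
labels = kmeans_partition(feats)
```

Each expert would then train only on the images whose label matches its index, with no communication between experts.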
# Usage

```python
from diffusers import DiffusionPipeline
import torch

# Load the pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "bageldotcom/paris",
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipeline.to("cuda")

# Generate images
images = pipeline(
    prompt="A beautiful sunset over Paris, oil painting style",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=256,
    width=256
).images

images[0].save("output.png")
```

### Routing Strategies

- **`top-1`** (default): Single best expert per step. Fastest inference, competitive quality.
- **`top-2`**: Weighted ensemble of the top-2 experts. Often best quality, 2× inference cost.
- **`full-ensemble`**: All 8 experts weighted by the router. Highest compute (8× cost).

---
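As a rough illustration of `top-2` routing, one denoising step could blend the two highest-scoring experts by their renormalized router weights. The `experts` and `router` callables below are hypothetical stand-ins, not the released API:

```python
# Sketch of top-2 routing at a single denoising step: score all 8
# experts, then blend the two best noise predictions by renormalized
# softmax weights. Toy experts/router, not the released API.
import torch

def top2_denoise_step(latent, t, experts, router):
    logits = router(latent, t)                 # (8,) expert scores
    weights = torch.softmax(logits, dim=-1)
    top_w, top_idx = weights.topk(2)
    top_w = top_w / top_w.sum()                # renormalize over the top-2
    preds = [experts[i](latent, t) for i in top_idx.tolist()]
    return top_w[0] * preds[0] + top_w[1] * preds[1]

# Toy constant-output experts and a fixed router over a 4×32×32 latent.
experts = [lambda z, t, s=s: z * 0 + s for s in range(8)]
router = lambda z, t: torch.arange(8.0)
z = torch.zeros(1, 4, 32, 32)
out = top2_denoise_step(z, 0, experts, router)
```

Running this step at every timestep is what gives the 2× inference cost quoted above, since two experts run per step instead of one.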
# Performance Metrics

**Multi-Expert vs. Monolithic on LAION-Art (DiT-B/2)**

| **Inference Strategy** | **FID-50K ↓** |
|------------------------|---------------|
| Monolithic (single model) | 29.64 |
| Paris Top-1 | 30.60 |
| **Paris Top-2** | **22.60** |
| Paris Full Ensemble | 47.89 |

*Top-2 routing achieves a 7.04-point FID improvement over the monolithic baseline, validating that targeted expert collaboration outperforms both single models and naive ensemble averaging.*

---
# Training Details

**Hyperparameters (DiT-XL/2)**

| **Parameter** | **Value** |
|---------------|-----------|
| Dataset | LAION-Aesthetic (11M images) |
| Clustering | DINOv2 semantic features |
| Batch Size | 16 per expert (effective 32 with 2-step accumulation) |
| Learning Rate | 2e-5 (AdamW, no scheduling) |
| Training Steps | ~120k total across experts (asynchronous) |
| EMA Decay | 0.9999 |
| Mixed Precision | FP16 with automatic loss scaling |
| Initialization | ImageNet-pretrained DiT-XL/2 |
| Conditioning | AdaLN-Single (23% parameter reduction) |

**Router Training**

| **Parameter** | **Value** |
|---------------|-----------|
| Architecture | DiT-B (smaller than experts) |
| Batch Size | 64 with 4-step accumulation (effective 256) |
| Learning Rate | 5e-5 with cosine annealing (25 epochs) |
| Loss | Cross-entropy on cluster assignments |
| Training | Post-hoc on full dataset |

---
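The router objective in the table can be sketched as a classifier trained with cross-entropy to recover each sample's cluster assignment from a noisy latent and timestep. `ToyRouter` below is a hypothetical stand-in for the DiT-B router, not the released implementation:

```python
# Sketch of the post-hoc router objective: predict which expert's
# cluster a (noisy latent, timestep) pair came from, trained with
# cross-entropy. ToyRouter is a hypothetical stand-in for DiT-B.
import torch
import torch.nn as nn

class ToyRouter(nn.Module):
    """Flatten the latent, append the timestep, classify into 8 experts."""
    def __init__(self, latent_dim=4 * 32 * 32, n_experts=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 128), nn.GELU(),
            nn.Linear(128, n_experts),
        )

    def forward(self, latent, t):
        x = torch.cat([latent.flatten(1), t[:, None].float()], dim=1)
        return self.net(x)

router = ToyRouter()
opt = torch.optim.AdamW(router.parameters(), lr=5e-5)  # LR from the table
loss_fn = nn.CrossEntropyLoss()

latents = torch.randn(16, 4, 32, 32)       # noisy latents
timesteps = torch.randint(0, 1000, (16,))
cluster_ids = torch.randint(0, 8, (16,))   # DINOv2 cluster labels

logits = router(latents, timesteps)
loss = loss_fn(logits, cluster_ids)        # cross-entropy on assignments
loss.backward()
opt.step()
```

Because the router only needs cluster labels, it can be trained after all experts finish, which is why no coordination is required during expert training.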
# Citation

```bibtex
@misc{paris2025,
  title={Paris: A Decentralized Trained Open-Weight Diffusion Model},
  author={Jiang, Zhiying and Seraj, Raihan and Villagra, Marcos and Roy, Bidhan},
  year={2025},
  publisher={Bagel Labs},
  url={https://huggingface.co/bageldotcom/paris}
}
```
---

# License

MIT License – Open for research and commercial use.

<div align="center">

Made with ❤️ by [Bagel Labs](https://bagel.com)

</div>
    	
bagel_labs_logo.png ADDED (Git LFS)

generated_images.png ADDED (Git LFS)

training_architecture.png ADDED (Git LFS)