| <!--Copyright 2024 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
| --> | |
| # DreamBooth | |
| [DreamBooth](https://arxiv.org/abs/2208.12242) is a method for personalizing text-to-image models such as Stable Diffusion from just a few (3-5) images of a subject. It lets the model generate contextualized images of the subject in different scenes, poses, and views. | |
| ![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg) | |
| <small>Dreambooth examples from the <a href="https://dreambooth.github.io">project's blog.</a></small> | |
| This guide shows you how to fine-tune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model on various GPU sizes and with Flax. If you want to dig deeper and see how things work, all the DreamBooth training scripts used in this guide are available [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth). | |
| Before running the scripts, install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch: | |
| ```bash | |
| pip install git+https://github.com/huggingface/diffusers | |
| pip install -U -r diffusers/examples/dreambooth/requirements.txt | |
| ``` | |
| xFormers is not a training requirement, but we recommend [installing](../optimization/xformers) it if you can, because it can speed up training and reduce memory usage. | |
| After all the dependencies are set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with: | |
| ```bash | |
| accelerate config | |
| ``` | |
| To set up a default 🤗 Accelerate environment without choosing any configuration options, run: | |
| ```bash | |
| accelerate config default | |
| ``` | |
| Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use: | |
| ```py | |
| from accelerate.utils import write_basic_config | |
| write_basic_config() | |
| ``` | |
| ## Fine-tuning | |
| <Tip warning={true}> | |
| DreamBooth fine-tuning is very sensitive to hyperparameters and overfits easily. To help you choose appropriate hyperparameters, we recommend reading our [in-depth analysis](https://huggingface.co/blog/dreambooth), which includes recommended settings for different subjects. | |
| </Tip> | |
| <frameworkcontent> | |
| <pt> | |
| Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). | |
| Download them, save them to a directory, and set the `INSTANCE_DIR` environment variable to that path: | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path_to_training_images" | |
| export OUTPUT_DIR="path_to_saved_model" | |
| ``` | |
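Before launching a run, it can help to confirm that the directory behind `INSTANCE_DIR` really contains the handful of subject images you expect. The snippet below is a minimal sketch of such a check. The helper name `list_instance_images` and the set of accepted file extensions are assumptions made for illustration; they are not part of the training script.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption, not taken from the training script).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_instance_images(instance_dir):
    """Return the image files found directly inside `instance_dir`, sorted by path."""
    folder = Path(instance_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"{instance_dir} is not a directory")
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `len(list_instance_images("path_to_training_images"))` should be in the 3-5 range mentioned at the top of this guide if you are following the dog example.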
| Then launch the training script with the following command (the full training script is available [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)): | |
| ```bash | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=1 \ | |
| --learning_rate=5e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --max_train_steps=400 | |
| ``` | |
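If you work in a notebook or want to keep hyperparameters in one place, you could assemble the same command from Python. This is only a sketch: `build_launch_command` is a hypothetical helper written for this example, and the option names are copied from the command above. It prints the shell-quoted command and deliberately does not start training.

```python
import shlex


def build_launch_command(script, options, flags=()):
    """Build an `accelerate launch` argument list from `--name=value` options and bare flags."""
    command = ["accelerate", "launch", script]
    for name, value in options.items():
        command.append(f"--{name}={value}")
    for name in flags:
        command.append(f"--{name}")
    return command


command = build_launch_command(
    "train_dreambooth.py",
    {
        "pretrained_model_name_or_path": "CompVis/stable-diffusion-v1-4",
        "instance_data_dir": "path_to_training_images",
        "output_dir": "path_to_saved_model",
        "instance_prompt": "a photo of sks dog",
        "resolution": 512,
        "train_batch_size": 1,
        "gradient_accumulation_steps": 1,
        "learning_rate": "5e-6",
        "lr_scheduler": "constant",
        "lr_warmup_steps": 0,
        "max_train_steps": 400,
    },
)

# shlex.join quotes arguments that contain spaces, so the output can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would launch the run. Keeping the arguments as a list avoids quoting problems with prompts that contain spaces.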
| </pt> | |
| <jax> | |
| If you have access to TPUs or want to train even faster, you can try the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you need a GPU with at least 30GB of memory. | |
| Before running the script, make sure the requirements are installed: | |
| ```bash | |
| pip install -U -r requirements.txt | |
| ``` | |
| Now you can launch the training script with the following command: | |
| ```bash | |
| export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" | |
| export INSTANCE_DIR="path-to-instance-images" | |
| export OUTPUT_DIR="path-to-save-model" | |
| python train_dreambooth_flax.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --learning_rate=5e-6 \ | |
| --max_train_steps=400 | |
| ``` | |
| </jax> | |
| </frameworkcontent> | |
| ### Fine-tuning with prior-preserving loss | |
| Prior preservation is used to avoid overfitting and language drift (see the [paper](https://arxiv.org/abs/2208.12242) if you're interested in the details). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images with the Stable Diffusion model itself! The training script saves the generated images to a local path that you specify. | |
| The authors recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well. | |
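To make the role of `--prior_loss_weight` concrete, here is a minimal sketch of how a prior-preserving objective combines two terms: the loss on your subject (instance) images plus a weighted loss on the generated class images. Plain Python lists stand in for the noise-prediction tensors, and the function names are hypothetical. This illustrates the idea; it is not the training script's actual code.

```python
def mse(prediction, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def prior_preserving_loss(instance_pred, instance_target, class_pred, class_target,
                          prior_loss_weight=1.0):
    """Instance loss plus the class (prior) loss scaled by `prior_loss_weight`."""
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss


# Instance MSE is 0.5, class MSE is 2.0, so the total is 0.5 + 0.5 * 2.0.
print(prior_preserving_loss([1.0, 2.0], [0.0, 2.0], [1.0, 1.0], [1.0, 3.0], prior_loss_weight=0.5))  # 1.5
```

With `prior_loss_weight=1.0`, as in the commands in this section, both terms count equally. A smaller weight lets the model fit the subject more closely, at a higher risk of drifting away from the class prior.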
| <frameworkcontent> | |
| <pt> | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path_to_training_images" | |
| export CLASS_DIR="path_to_class_images" | |
| export OUTPUT_DIR="path_to_saved_model" | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=1 \ | |
| --learning_rate=5e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
| </pt> | |
| <jax> | |
| ```bash | |
| export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" | |
| export INSTANCE_DIR="path-to-instance-images" | |
| export CLASS_DIR="path-to-class-images" | |
| export OUTPUT_DIR="path-to-save-model" | |
| python train_dreambooth_flax.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --learning_rate=5e-6 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
| </jax> | |
| </frameworkcontent> | |
| ## Fine-tuning the text encoder and UNet | |
| The script also lets you fine-tune the `text_encoder` along with the `unet`. In our experiments (see the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for details), this gives much better results, especially when generating images of faces. | |
| <Tip warning={true}> | |
| Training the text encoder requires additional memory and won't fit on a 16GB GPU. You need at least 24GB of VRAM to use this option. | |
| </Tip> | |
| Pass the `--train_text_encoder` argument to the training script to fine-tune both the `text_encoder` and the `unet`: | |
| <frameworkcontent> | |
| <pt> | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path_to_training_images" | |
| export CLASS_DIR="path_to_class_images" | |
| export OUTPUT_DIR="path_to_saved_model" | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --train_text_encoder \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --use_8bit_adam \ | |
| --gradient_checkpointing \ | |
| --learning_rate=2e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
| </pt> | |
| <jax> | |
| ```bash | |
| export MODEL_NAME="duongna/stable-diffusion-v1-4-flax" | |
| export INSTANCE_DIR="path-to-instance-images" | |
| export CLASS_DIR="path-to-class-images" | |
| export OUTPUT_DIR="path-to-save-model" | |
| python train_dreambooth_flax.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --train_text_encoder \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --learning_rate=2e-6 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
| </jax> | |
| </frameworkcontent> | |
| ## Fine-tuning with LoRA | |
| You can also use LoRA (Low-Rank Adaptation of Large Language Models), a fine-tuning technique for accelerating the training of large models, with DreamBooth. For more details, see the [LoRA training](training/lora#dreambooth) guide. | |
| ### Saving checkpoints while training | |
| It's easy to overfit while training with Dreambooth, so it is sometimes useful to save regular checkpoints during training. One of the intermediate checkpoints might actually work better than the final model! To enable checkpoint saving, pass the following argument to the training script: | |
| ```bash | |
| --checkpointing_steps=500 | |
| ``` | |
| This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far. For example, `checkpoint-1500` is a checkpoint saved after 1500 training steps. | |
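As a quick illustration of the naming scheme, the sketch below lists the subfolders you would expect for a given run. It assumes a checkpoint is written every time the step count is a multiple of `--checkpointing_steps`; `expected_checkpoints` is a hypothetical helper, not something the training script provides.

```python
def expected_checkpoints(max_train_steps, checkpointing_steps):
    """Names of the checkpoint subfolders expected after `max_train_steps` steps."""
    return [
        f"checkpoint-{step}"
        for step in range(checkpointing_steps, max_train_steps + 1, checkpointing_steps)
    ]


print(expected_checkpoints(1500, 500))
# ['checkpoint-500', 'checkpoint-1000', 'checkpoint-1500']
```

Under that assumption, a run shorter than `--checkpointing_steps` (for example 400 steps with `--checkpointing_steps=500`) produces no intermediate checkpoint at all, so choose the interval with your `--max_train_steps` in mind.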
| #### Resuming training from a saved checkpoint | |
| To resume training from a saved checkpoint, pass the `--resume_from_checkpoint` argument and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (that is, the one with the largest number of steps). For example, the following resumes training from the checkpoint saved after 1500 steps: | |
| ```bash | |
| --resume_from_checkpoint="checkpoint-1500" | |
| ``` | |
| This is a good opportunity to adjust some hyperparameters if you wish. | |
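To see which folder `"latest"` would correspond to, you can pick the `checkpoint-*` subfolder with the largest step number yourself. The helper below is a sketch under that assumption and only mirrors the behaviour described above; `latest_checkpoint` is a hypothetical name. Note that the comparison has to be numeric, because `checkpoint-500` sorts after `checkpoint-1500` alphabetically.

```python
import os

PREFIX = "checkpoint-"


def latest_checkpoint(output_dir):
    """Return the name of the `checkpoint-<step>` subfolder with the largest step, or None."""
    candidates = []
    for name in os.listdir(output_dir):
        step = name[len(PREFIX):]
        is_checkpoint = name.startswith(PREFIX) and step.isdigit()
        if is_checkpoint and os.path.isdir(os.path.join(output_dir, name)):
            candidates.append((int(step), name))
    if not candidates:
        return None
    # Compare by the integer step, not by the folder name.
    return max(candidates)[1]
```

Calling `latest_checkpoint("path_to_saved_model")` returns `None` when no checkpoint has been written yet, which is a convenient signal to start training from scratch.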
| #### Inference from a saved checkpoint | |
| Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights, but also the state of the optimizer, the data loaders, and the learning rate. | |
| If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint: | |
| ```python | |
| from diffusers import DiffusionPipeline, UNet2DConditionModel | |
| from transformers import CLIPTextModel | |
| import torch | |
| # Load the pipeline with the same arguments (model, revision) that were used for training | |
| model_id = "CompVis/stable-diffusion-v1-4" | |
| # Load every component in half precision so that the dtypes match inside the pipeline | |
| unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet", torch_dtype=torch.float16) | |
| # If you trained with `args.train_text_encoder`, make sure to also load the text encoder | |
| text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder", torch_dtype=torch.float16) | |
| pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16) | |
| pipeline.to("cuda") | |
| # Perform inference, or save, or push to the hub | |
| pipeline.save_pretrained("dreambooth-pipeline") | |
| ``` | |
| If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first: | |
| ```python | |
| from accelerate import Accelerator | |
| from diffusers import DiffusionPipeline | |
| # Load the pipeline with the same arguments (model, revision) that were used for training | |
| model_id = "CompVis/stable-diffusion-v1-4" | |
| pipeline = DiffusionPipeline.from_pretrained(model_id) | |
| accelerator = Accelerator() | |
| # Use text_encoder if `--train_text_encoder` was used for the initial training | |
| unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder) | |
| # Restore the state from a checkpoint path. You have to use the absolute path here. | |
| accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100") | |
| # Rebuild the pipeline with the unwrapped models (assigning to .unet and .text_encoder should work too) | |
| pipeline = DiffusionPipeline.from_pretrained( | |
| model_id, | |
| unet=accelerator.unwrap_model(unet), | |
| text_encoder=accelerator.unwrap_model(text_encoder), | |
| ) | |
| # Perform inference, or save, or push to the hub | |
| pipeline.save_pretrained("dreambooth-pipeline") | |
| ``` | |
| ## Optimizations for different GPU sizes | |
| Depending on your hardware, there are a few different ways to optimize DreamBooth for GPUs from 16GB down to 8GB! | |
| ### xFormers | |
| [xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes the [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism used in 🧨 Diffusers. [Install xFormers](./optimization/xformers), then add the following argument to your training script: | |
| ```bash | |
| --enable_xformers_memory_efficient_attention | |
| ``` | |
| xFormers is not available in Flax. | |
| ### Setting gradients to None | |
| Another way to lower memory usage is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into problems, try removing this argument. Add the following argument to your training script to set the gradients to `None`: | |
| ```bash | |
| --set_grads_to_none | |
| ``` | |
| ### 16GB GPU | |
| With the help of gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), you can train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed: | |
| ```bash | |
| pip install bitsandbytes | |
| ``` | |
| Then pass the `--use_8bit_adam` option to the training script: | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path_to_training_images" | |
| export CLASS_DIR="path_to_class_images" | |
| export OUTPUT_DIR="path_to_saved_model" | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=2 --gradient_checkpointing \ | |
| --use_8bit_adam \ | |
| --learning_rate=5e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
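The command above sets `--gradient_accumulation_steps=2` while keeping `--train_batch_size=1`. As a rough sketch of what that means, the helper below computes the effective batch size per optimizer update, assuming gradients are accumulated over the given number of micro-batches on each process. `effective_batch_size` is a hypothetical helper, not part of the training script.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of samples that contribute to a single optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Settings from the 16GB command above, on a single GPU.
print(effective_batch_size(train_batch_size=1, gradient_accumulation_steps=2))  # 2
```

Accumulation trades time for memory: only one sample's activations are held at a time, but each update still averages over two samples.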
| ### 12GB GPU | |
| To run DreamBooth on a 12GB GPU, you need to enable gradient checkpointing, the 8-bit optimizer, and xFormers, and set the gradients to `None`: | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path-to-instance-images" | |
| export CLASS_DIR="path-to-class-images" | |
| export OUTPUT_DIR="path-to-save-model" | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=1 --gradient_checkpointing \ | |
| --use_8bit_adam \ | |
| --enable_xformers_memory_efficient_attention \ | |
| --set_grads_to_none \ | |
| --learning_rate=2e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 | |
| ``` | |
| ### 8GB GPU | |
| For 8GB GPUs, you can use [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from VRAM to either the CPU or NVMe, which allows training with less GPU memory. | |
| Run the following command to configure your 🤗 Accelerate environment: | |
| ```bash | |
| accelerate config | |
| ``` | |
| During configuration, confirm that you want to use DeepSpeed. | |
| By combining DeepSpeed stage 2 with fp16 mixed precision and offloading both the model parameters and the optimizer state to the CPU, you can then train with less than 8GB of VRAM. | |
| The drawback is that this requires more system RAM (about 25GB). See the [DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options. | |
| You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam, | |
| [`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu), for a substantial speedup. | |
| Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to be the same as the one installed with PyTorch. | |
| 8-bit optimizers don't seem to be compatible with DeepSpeed at the moment. | |
| Launch training with the following command: | |
| ```bash | |
| export MODEL_NAME="CompVis/stable-diffusion-v1-4" | |
| export INSTANCE_DIR="path_to_training_images" | |
| export CLASS_DIR="path_to_class_images" | |
| export OUTPUT_DIR="path_to_saved_model" | |
| accelerate launch train_dreambooth.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --instance_data_dir=$INSTANCE_DIR \ | |
| --class_data_dir=$CLASS_DIR \ | |
| --output_dir=$OUTPUT_DIR \ | |
| --with_prior_preservation --prior_loss_weight=1.0 \ | |
| --instance_prompt="a photo of sks dog" \ | |
| --class_prompt="a photo of dog" \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --sample_batch_size=1 \ | |
| --gradient_accumulation_steps=1 --gradient_checkpointing \ | |
| --learning_rate=5e-6 \ | |
| --lr_scheduler="constant" \ | |
| --lr_warmup_steps=0 \ | |
| --num_class_images=200 \ | |
| --max_train_steps=800 \ | |
| --mixed_precision=fp16 | |
| ``` | |
| ## Inference | |
| Once you have trained a model, you can run inference with the [`StableDiffusionPipeline`] by specifying the path where the model was saved. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples). | |
| If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run inference from an intermediate checkpoint: | |
| ```python | |
| from diffusers import StableDiffusionPipeline | |
| import torch | |
| model_id = "path_to_saved_model" | |
| pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda") | |
| prompt = "A photo of sks dog in a bucket" | |
| image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0] | |
| image.save("dog-bucket.png") | |
| ``` | |
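Forgetting the identifier in the prompt is an easy mistake, and it makes the fine-tuned model behave like the base model for your subject. A small guard such as the one below can catch it before you spend time generating images. It is only a sketch: `contains_identifier` is a hypothetical helper, and matching the identifier as a whole word, case-insensitively, is an assumption made for this example.

```python
import re


def contains_identifier(prompt, identifier="sks"):
    """Return True if `identifier` appears in `prompt` as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(identifier)}\b"
    return re.search(pattern, prompt, flags=re.IGNORECASE) is not None


print(contains_identifier("A photo of sks dog in a bucket"))  # True
print(contains_identifier("A photo of a dog in a bucket"))    # False
```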
| You can also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint). | |