<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Prompt tuning for causal language modeling

[[open-in-colab]]

Prompting helps guide language model behavior by adding some input text specific to a task. Prompt tuning is an additive method that only trains and updates the newly added prompt tokens of a pretrained model. This way, you can use one pretrained model whose weights are frozen, and train and update a smaller set of prompt parameters for each downstream task instead of fully finetuning a separate model. As models grow larger and larger, prompt tuning becomes more efficient, and results get even better as model parameters scale.

<Tip>

💡 Read [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691) to learn more about prompt tuning.

</Tip>
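To make the idea concrete, here is a minimal sketch (not the PEFT implementation) of what prompt tuning does: a small matrix of learnable "virtual token" embeddings is prepended to the frozen model's input embeddings, and only that matrix is updated during training. The hidden size of 1024 below is an assumption chosen to match `bloomz-560m`, the model used in this guide.

```py
import torch

# a minimal sketch of prompt tuning, not the PEFT implementation
hidden_size = 1024      # assumed hidden size of bloomz-560m
num_virtual_tokens = 8  # number of trainable prompt tokens

# only this small matrix is trained; all base model weights stay frozen
prompt_embeddings = torch.nn.Parameter(torch.randn(num_virtual_tokens, hidden_size))

# dummy frozen embeddings for a 12-token input
input_embeddings = torch.randn(1, 12, hidden_size)

# the learned prompt is prepended to the input before it is fed through the transformer
inputs_embeds = torch.cat([prompt_embeddings.unsqueeze(0), input_embeddings], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 20, 1024])
```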
This guide will show you how to apply prompt tuning to train a [`bloomz-560m`](https://huggingface.co/bigscience/bloomz-560m) model on the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install -q peft transformers datasets
```
## Setup

Start by defining the model and tokenizer, the dataset and the dataset columns to train on, some training hyperparameters, and the [`PromptTuningConfig`]. The [`PromptTuningConfig`] contains information about the task type, the text to initialize the prompt embedding, the number of virtual tokens, and the tokenizer to use:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
import torch
from datasets import load_dataset
import os
from torch.utils.data import DataLoader
from tqdm import tqdm

device = "cuda"
model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path=model_name_or_path,
)

dataset_name = "twitter_complaints"
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
    "/", "_"
)
text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8
```
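Because `prompt_tuning_init=PromptTuningInit.TEXT`, the virtual tokens are initialized from the embeddings of the tokenized `prompt_tuning_init_text` rather than randomly, which typically gives the prompt a better starting point for this classification task.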
## Load dataset

For this guide, you'll load the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset. This subset contains tweets that are labeled either `complaint` or `no complaint`:

```py
dataset = load_dataset("ought/raft", dataset_name)
dataset["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2}
```
To make the `Label` column more readable, replace each `Label` value with its corresponding label text and store it in a `text_label` column. You can use the [`~datasets.Dataset.map`] function to apply this change over the entire dataset in one step:

```py
classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["Label"]]},
    batched=True,
    num_proc=1,
)
dataset["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2, "text_label": "no complaint"}
```
## Preprocess dataset

Next, set up a tokenizer: configure the appropriate padding token to use for padding sequences, and determine the maximum length of the tokenized labels:

```py
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)
3
```
Create a `preprocess_function` to:

1. Tokenize the input text and labels.
2. For each example in a batch, pad the labels with the tokenizer's `pad_token_id`.
3. Concatenate the input text and labels into the `model_inputs`.
4. Create a separate attention mask for `labels` and `model_inputs`.
5. Loop through each example in the batch again to pad the input ids, labels, and attention mask to the `max_length` and convert them to PyTorch tensors.
```py
def preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
        # append the label tokens to the input so the model learns to predict them
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    # left-pad everything to max_length and convert to tensors
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
Use the [`~datasets.Dataset.map`] function to apply the `preprocess_function` to the entire dataset. You can remove the unprocessed columns since the model won't need them:

```py
processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)
```
Create a [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) from the `train` and `eval` datasets. Set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.

```py
train_dataset = processed_datasets["train"]
# the RAFT test split is unlabeled, so the train split is reused for evaluation here
eval_dataset = processed_datasets["train"]

train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
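To sanity-check the preprocessing, you can peek at a single batch from the dataloader. The shapes below assume the `max_length=64` and `batch_size=8` values defined earlier:

```py
batch = next(iter(train_dataloader))
print({k: v.shape for k, v in batch.items()})
# expected: {'input_ids': torch.Size([8, 64]), 'attention_mask': torch.Size([8, 64]), 'labels': torch.Size([8, 64])}
```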
## Train

You're almost ready to set up your model and start training!

Initialize a base model from [`~transformers.AutoModelForCausalLM`], and pass it and `peft_config` to the [`get_peft_model`] function to create a [`PeftModel`]. You can print the new [`PeftModel`]'s trainable parameters to see how much more efficient it is than training the full parameters of the original model!

```py
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 8192 || all params: 559222784 || trainable%: 0.0014648902430985358"
```
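Those 8,192 trainable parameters are exactly the prompt embedding: 8 virtual tokens × 1,024 hidden dimensions (the hidden size of `bloomz-560m`), while the roughly 559M base model parameters stay frozen.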
Set up an optimizer and learning rate scheduler:

```py
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU, then write a training loop to start training!

```py
model = model.to(device)

for epoch in range(num_epochs):
    # training
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    # evaluation
    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
## Share model

You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:

```py
from huggingface_hub import notebook_login

notebook_login()
```

Use the [`~transformers.PreTrainedModel.push_to_hub`] function to upload your model to a model repository on the Hub:

```py
peft_model_id = "your-name/bloomz-560m_PROMPT_TUNING_CAUSAL_LM"
model.push_to_hub(peft_model_id, use_auth_token=True)
```

Once the model is uploaded, you'll see the model file size is only 33.5kB! 🤏
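That tiny size is expected: the checkpoint only stores the prompt embedding, roughly 8,192 float32 parameters × 4 bytes ≈ 32kB of weights plus a small amount of metadata, instead of the full 559M-parameter model.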
## Inference

Let's try the model on a sample input for inference. If you look at the repository you uploaded the model to, you'll see an `adapter_config.json` file. Load this file into [`PeftConfig`] to specify the `peft_type` and `task_type`. Then you can load the prompt tuned model weights and the configuration into [`~PeftModel.from_pretrained`] to create the [`PeftModel`]:

```py
from peft import PeftModel, PeftConfig

peft_model_id = "stevhliu/bloomz-560m_PROMPT_TUNING_CAUSAL_LM"

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
```
Grab a tweet and tokenize it:

```py
inputs = tokenizer(
    f'{text_column} : {"@nationalgridus I have no water and the bill is current and paid. Can you do something about this?"} Label : ',
    return_tensors="pt",
)
```
Put the model on a GPU and *generate* the predicted label:

```py
model.to(device)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=10, eos_token_id=3
    )
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
[
    "Tweet text : @nationalgridus I have no water and the bill is current and paid. Can you do something about this? Label : complaint"
]
```