mrm8488/phi-2-coder

Text Generation · Generated from Trainer


mrm8488 committed on Dec 24, 2023

Commit 228fc22 · 1 parent: 6ac4e60

Update README.md

Files changed (1)

README.md +37 -44


@@ -40,50 +40,43 @@ Phi-2 is a Transformer with **2.7 billion** parameters. It was trained using the
-### LoRa config
-```py
-config = LoraConfig(
-    r=32,
-    lora_alpha=64,
-    target_modules=[
-        "Wqkv",
-        "fc1",
-        "fc2",
-        "out_proj"
-    ],
-    bias="none",
-    lora_dropout=0.05,
-    task_type="CAUSAL_LM",
-)
-```
-### Training hyperparameters ⚙
-```py
-per_device_train_batch_size=4,
-gradient_accumulation_steps=32,
-num_train_epochs=2,
-learning_rate=2.5e-5,
-optim="paged_adamw_8bit",
-seed=66,
-load_best_model_at_end=True,
-save_strategy="steps",
-save_steps=50,
-evaluation_strategy="steps",
-eval_steps=50,
-```
-### Training results 🗒️
-| Step | Training Loss | Validation Loss |
-|------|---------------|-----------------|
-| 50   | 0.763100      | 0.717398        |
-| 100  | 0.673500      | 0.694871        |
-| 150  | 0.696000      | 0.689336        |
-| 200  | 0.786100      | 0.687515        |
-| 250  | 0.734600      | 0.686658        |

+### Training procedure
+The following `bitsandbytes` quantization config was used during training:
+- quant_method: bitsandbytes
+- load_in_8bit: True
+- load_in_4bit: False
+- llm_int8_threshold: 6.0
+- llm_int8_skip_modules: None
+- llm_int8_enable_fp32_cpu_offload: False
+- llm_int8_has_fp16_weight: False
+- bnb_4bit_quant_type: fp4
+- bnb_4bit_use_double_quant: False
+- bnb_4bit_compute_dtype: float32
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2.5e-05
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 66
+- gradient_accumulation_steps: 32
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 2
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.7631        | 0.36  | 50   | 0.7174          |
+| 0.6735        | 0.71  | 100  | 0.6949          |
+| 0.696         | 1.07  | 150  | 0.6893          |
+| 0.7861        | 1.42  | 200  | 0.6875          |
+| 0.7346        | 1.78  | 250  | 0.6867          |
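The updated card reports `total_train_batch_size: 128`. In the Hugging Face Trainer this figure is the per-device batch size × gradient accumulation steps × number of devices, so 4 × 32 = 128 implies a single device. The `Epoch` column of the results table also lets you back out roughly how many optimizer steps make up one epoch. The plain-Python sketch below does that arithmetic; the device count is inferred rather than stated in the card, and the per-epoch figure is only approximate because the epochs are rounded to two decimals.

```python
# Values copied from the updated "Training hyperparameters" section.
TRAIN_BATCH_SIZE = 4   # train_batch_size (per device)
GRAD_ACCUM_STEPS = 32  # gradient_accumulation_steps
NUM_DEVICES = 1        # assumption: implied by total_train_batch_size == 4 * 32
NUM_EPOCHS = 2         # num_epochs

# (step, epoch) pairs from the "Training results" table; epochs are rounded.
STEP_EPOCH = [(50, 0.36), (100, 0.71), (150, 1.07), (200, 1.42), (250, 1.78)]


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Training samples consumed per optimizer step."""
    return per_device * accum * devices


def steps_per_epoch_estimate(pairs: list[tuple[int, float]]) -> float:
    """Average of step / epoch over the logged checkpoints (approximate)."""
    return sum(step / epoch for step, epoch in pairs) / len(pairs)


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
steps_per_epoch = steps_per_epoch_estimate(STEP_EPOCH)

print("effective batch size:", total_batch)  # 128, matching total_train_batch_size
print(f"~{steps_per_epoch:.0f} optimizer steps per epoch")
print(f"~{steps_per_epoch * NUM_EPOCHS:.0f} optimizer steps for the whole run")
```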
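`lr_scheduler_type: linear` means the learning rate decays linearly from `learning_rate` (2.5e-05) toward 0 over the run. The sketch below reproduces the shape of the linear-with-warmup schedule that `transformers` uses for this setting. It is not taken from the training code: the card lists no warmup, so the sketch assumes `warmup_steps=0`, and `TOTAL_STEPS = 280` is a hypothetical figure (two epochs of roughly 140 optimizer steps, estimated from the `Epoch` column).

```python
BASE_LR = 2.5e-5   # learning_rate from the card
TOTAL_STEPS = 280  # hypothetical: ~2 epochs x ~140 steps, not stated in the card


def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under a linear schedule.

    Ramps up linearly during warmup (assumed 0 here), then decays
    linearly to 0 at total_steps and stays at 0 afterwards.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Learning rate at each evaluation step listed in the results table.
for step in (50, 100, 150, 200, 250):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.3e}")
```

With no warmup, the rate at any step is simply `2.5e-05 × (1 - step / total_steps)`, so by the last logged evaluation it has already dropped to a small fraction of its starting value.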
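The removed version of the card set `load_best_model_at_end=True` with `evaluation_strategy="steps"` and `eval_steps=50`. When `metric_for_best_model` is not given, the Trainer falls back to the evaluation loss. Assuming that default applied here and that every evaluated step was also saved (`save_steps=50`), the retained checkpoint would be the one with the lowest validation loss. The sketch below applies that rule to the updated results table; the selection rule is an assumption, since the new card no longer mentions it.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    training_loss: float
    validation_loss: float


# Rows copied from the updated "Training results" table.
RESULTS = [
    EvalRow(50, 0.36, 0.7631, 0.7174),
    EvalRow(100, 0.71, 0.6735, 0.6949),
    EvalRow(150, 1.07, 0.696, 0.6893),
    EvalRow(200, 1.42, 0.7861, 0.6875),
    EvalRow(250, 1.78, 0.7346, 0.6867),
]


def best_by_validation_loss(rows: list[EvalRow]) -> EvalRow:
    """Assumed selection rule: the lowest validation loss wins."""
    return min(rows, key=lambda row: row.validation_loss)


def is_strictly_decreasing(values: list[float]) -> bool:
    return all(a > b for a, b in zip(values, values[1:]))


best = best_by_validation_loss(RESULTS)
print(f"best checkpoint: step {best.step}, validation loss {best.validation_loss}")
# -> best checkpoint: step 250, validation loss 0.6867
print(is_strictly_decreasing([r.validation_loss for r in RESULTS]))  # True
print(is_strictly_decreasing([r.training_loss for r in RESULTS]))    # False
```

Validation loss falls at every evaluation but flattens after step 150, while the logged training loss moves up and down between checkpoints.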