Model save

Files changed:

- README.md +47 -28
- adapter_model.safetensors +1 -1
- all_results.json +18 -18
- eval_results.json +13 -13
- runs/Jan09_05-07-46_ip-26-0-175-170/events.out.tfevents.1704777003.ip-26-0-175-170.1799139.0 +2 -2
- runs/Jan09_05-07-46_ip-26-0-175-170/events.out.tfevents.1704784713.ip-26-0-175-170.1799139.1 +3 -0
- train_results.json +6 -6
- trainer_state.json +0 -0
    	
README.md
CHANGED

```diff
@@ -1,30 +1,32 @@
 ---
 license: apache-2.0
-…
+library_name: peft
 tags:
+- trl
+- dpo
 - generated_from_trainer
-…
+base_model: mistralai/Mistral-7B-v0.1
 model-index:
-- name: zephyr-7b-dpo-…
+- name: zephyr-7b-dpo-qlora
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# zephyr-7b-dpo-…
+# zephyr-7b-dpo-qlora
 
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the …
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.…
-- Rewards/chosen: -…
-- Rewards/rejected: -…
-- Rewards/accuracies: 0.…
-- Rewards/margins: 0.…
-- Logps/rejected: -…
-- Logps/chosen: -…
-- Logits/rejected: …
-- Logits/chosen: …
+- Loss: 0.5325
+- Rewards/chosen: -1.2325
+- Rewards/rejected: -2.0565
+- Rewards/accuracies: 0.7656
+- Rewards/margins: 0.8240
+- Logps/rejected: -457.4398
+- Logps/chosen: -373.4022
+- Logits/rejected: 0.7596
+- Logits/chosen: 0.5001
 
 ## Model description
 
@@ -43,31 +45,48 @@
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 5e-…
-- train_batch_size: …
-- eval_batch_size: …
+- learning_rate: 5e-06
+- train_batch_size: 4
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: …
-- total_train_batch_size: …
-- total_eval_batch_size: …
+- num_devices: 8
+- total_train_batch_size: 32
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: …
+- lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: …
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.…
-| 0.…
-| 0.…
+| 0.6916        | 0.05  | 100  | 0.6912          | 0.0059         | 0.0019           | 0.6484             | 0.0041          | -251.6075      | -249.5596    | -2.2040         | -2.2621       |
+| 0.655         | 0.1   | 200  | 0.6498          | -0.0559        | -0.1762          | 0.7070             | 0.1203          | -269.4106      | -255.7421    | -2.1011         | -2.1614       |
+| 0.6342        | 0.16  | 300  | 0.6146          | -0.3407        | -0.6269          | 0.7031             | 0.2862          | -314.4839      | -284.2224    | -1.9037         | -1.9793       |
+| 0.6121        | 0.21  | 400  | 0.5946          | -0.4657        | -0.8916          | 0.7031             | 0.4259          | -340.9551      | -296.7203    | -1.8717         | -1.9543       |
+| 0.5973        | 0.26  | 500  | 0.5938          | -0.3681        | -0.7766          | 0.7305             | 0.4085          | -329.4522      | -286.9666    | -1.8440         | -1.9282       |
+| 0.5473        | 0.31  | 600  | 0.5774          | -0.6893        | -1.2264          | 0.7344             | 0.5371          | -374.4341      | -319.0812    | -1.6815         | -1.7726       |
+| 0.5792        | 0.37  | 700  | 0.5709          | -0.6635        | -1.2100          | 0.7578             | 0.5465          | -372.7989      | -316.5072    | -1.4783         | -1.5775       |
+| 0.5194        | 0.42  | 800  | 0.5590          | -1.0208        | -1.6453          | 0.7461             | 0.6245          | -416.3269      | -352.2357    | -0.3791         | -0.5486       |
+| 0.5367        | 0.47  | 900  | 0.5492          | -1.1477        | -1.8521          | 0.7266             | 0.7044          | -437.0040      | -364.9276    | -0.0908         | -0.2899       |
+| 0.5575        | 0.52  | 1000 | 0.5450          | -1.1704        | -1.9048          | 0.7344             | 0.7344          | -442.2755      | -367.1964    | 0.2761          | 0.0498        |
+| 0.5507        | 0.58  | 1100 | 0.5429          | -1.1040        | -1.8671          | 0.7422             | 0.7631          | -438.5026      | -360.5551    | 0.5339          | 0.2877        |
+| 0.5305        | 0.63  | 1200 | 0.5366          | -1.1557        | -1.9243          | 0.7578             | 0.7686          | -444.2217      | -365.7241    | 0.7350          | 0.4755        |
+| 0.5171        | 0.68  | 1300 | 0.5304          | -1.3741        | -2.1678          | 0.7656             | 0.7937          | -468.5735      | -387.5681    | 0.7686          | 0.5029        |
+| 0.4875        | 0.73  | 1400 | 0.5321          | -1.3228        | -2.1513          | 0.7578             | 0.8285          | -466.9267      | -382.4329    | 0.8566          | 0.5926        |
+| 0.5216        | 0.78  | 1500 | 0.5326          | -1.2006        | -2.0034          | 0.7617             | 0.8028          | -452.1298      | -370.2103    | 0.7189          | 0.4630        |
+| 0.4894        | 0.84  | 1600 | 0.5327          | -1.2300        | -2.0556          | 0.7656             | 0.8256          | -457.3565      | -373.1585    | 0.7405          | 0.4828        |
+| 0.5179        | 0.89  | 1700 | 0.5326          | -1.2313        | -2.0558          | 0.7656             | 0.8245          | -457.3720      | -373.2860    | 0.7604          | 0.5012        |
+| 0.5534        | 0.94  | 1800 | 0.5325          | -1.2309        | -2.0558          | 0.7656             | 0.8249          | -457.3779      | -373.2437    | 0.7550          | 0.4957        |
+| 0.5539        | 0.99  | 1900 | 0.5325          | -1.2325        | -2.0565          | 0.7656             | 0.8240          | -457.4398      | -373.4022    | 0.7596          | 0.5001        |
 
 
 ### Framework versions
 
-- …
-- …
+- PEFT 0.7.1
+- Transformers 4.36.2
+- Pytorch 2.1.2+cu121
 - Datasets 2.14.6
-- Tokenizers 0.…
+- Tokenizers 0.15.0
```
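The effective batch sizes in the updated hyperparameter list follow from the per-device settings. A quick sanity check, assuming gradient_accumulation_steps of 1 (the card does not list that value, so it is an assumption here):

```python
# Per-device settings from the model card's hyperparameter list.
train_batch_size = 4   # per-device train batch size
eval_batch_size = 8    # per-device eval batch size
num_devices = 8
grad_accum_steps = 1   # assumption: not stated in the card

# Effective batch sizes across all GPUs.
total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32, matching the card
print(total_eval_batch_size)   # 64, matching the card
```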
    	
adapter_model.safetensors
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
+oid sha256:881e1b5a4dd0347641273b3dcdd5ce52a7e613d1712bb56b80cc13e114765f7c
 size 83945744
```
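The adapter weights themselves live in Git LFS, so this diff only touches the small pointer file whose format is shown above (version, oid, size). A minimal sketch of reading that format; parse_lfs_pointer is a hypothetical helper, not part of this repo:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new pointer content from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:881e1b5a4dd0347641273b3dcdd5ce52a7e613d1712bb56b80cc13e114765f7c
size 83945744
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # the sha256 digest of the real adapter file
print(int(info["size"]))  # 83945744 bytes, i.e. an ~84 MB LoRA adapter
```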
    	
all_results.json
CHANGED

```diff
@@ -1,21 +1,21 @@
 {
-    "epoch": …
-    "eval_logits/chosen": …
-    "eval_logits/rejected": …
-    "eval_logps/chosen": -…
-    "eval_logps/rejected": -…
-    "eval_loss": 0.…
-    "eval_rewards/accuracies": 0.…
-    "eval_rewards/chosen": -…
-    "eval_rewards/margins": 0.…
-    "eval_rewards/rejected": -…
-    "eval_runtime": …
+    "epoch": 1.0,
+    "eval_logits/chosen": 0.5000983476638794,
+    "eval_logits/rejected": 0.7595670819282532,
+    "eval_logps/chosen": -373.40216064453125,
+    "eval_logps/rejected": -457.4398498535156,
+    "eval_loss": 0.5325239300727844,
+    "eval_rewards/accuracies": 0.765625,
+    "eval_rewards/chosen": -1.2324851751327515,
+    "eval_rewards/margins": 0.8239741921424866,
+    "eval_rewards/rejected": -2.056459426879883,
+    "eval_runtime": 99.4029,
     "eval_samples": 2000,
-    "eval_samples_per_second": …
-    "eval_steps_per_second": 0.…
-    "train_loss": 0.…
-    "train_runtime": …
-    "train_samples": …
-    "train_samples_per_second": …
-    "train_steps_per_second": 0.…
+    "eval_samples_per_second": 20.12,
+    "eval_steps_per_second": 0.322,
+    "train_loss": 0.5648497628454511,
+    "train_runtime": 7610.489,
+    "train_samples": 61135,
+    "train_samples_per_second": 8.033,
+    "train_steps_per_second": 0.251
 }
```
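The reward fields above are internally consistent: in DPO logging, the reported margin is the chosen reward minus the rejected reward, and the eval throughput follows from the sample count and runtime. A quick check with values copied from the JSON above:

```python
# Values from all_results.json above.
rewards_chosen = -1.2324851751327515
rewards_rejected = -2.056459426879883
rewards_margins = 0.8239741921424866

# margin = chosen - rejected, up to float rounding in the logged values.
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-5

eval_samples = 2000
eval_runtime = 99.4029
# Reported eval_samples_per_second is 20.12.
print(round(eval_samples / eval_runtime, 2))  # 20.12
```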
    	
eval_results.json
CHANGED

```diff
@@ -1,16 +1,16 @@
 {
-    "epoch": …
-    "eval_logits/chosen": …
-    "eval_logits/rejected": …
-    "eval_logps/chosen": -…
-    "eval_logps/rejected": -…
-    "eval_loss": 0.…
-    "eval_rewards/accuracies": 0.…
-    "eval_rewards/chosen": -…
-    "eval_rewards/margins": 0.…
-    "eval_rewards/rejected": -…
-    "eval_runtime": …
+    "epoch": 1.0,
+    "eval_logits/chosen": 0.5000983476638794,
+    "eval_logits/rejected": 0.7595670819282532,
+    "eval_logps/chosen": -373.40216064453125,
+    "eval_logps/rejected": -457.4398498535156,
+    "eval_loss": 0.5325239300727844,
+    "eval_rewards/accuracies": 0.765625,
+    "eval_rewards/chosen": -1.2324851751327515,
+    "eval_rewards/margins": 0.8239741921424866,
+    "eval_rewards/rejected": -2.056459426879883,
+    "eval_runtime": 99.4029,
     "eval_samples": 2000,
-    "eval_samples_per_second": …
-    "eval_steps_per_second": 0.…
+    "eval_samples_per_second": 20.12,
+    "eval_steps_per_second": 0.322
 }
```
    	
runs/Jan09_05-07-46_ip-26-0-175-170/events.out.tfevents.1704777003.ip-26-0-175-170.1799139.0
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:…
-size …
+oid sha256:07ae93d49c31ce3dd08cfe99bd93ac32a5d111827d3113468a1ef52d46c8e90c
+size 140544
```
    	
runs/Jan09_05-07-46_ip-26-0-175-170/events.out.tfevents.1704784713.ip-26-0-175-170.1799139.1
ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ea4e75f57b9d81409e2cae26268f15bb6b03b1bfe6164baadf83b6b0a10d5156
+size 828
```
    	
train_results.json
CHANGED

```diff
@@ -1,8 +1,8 @@
 {
-    "epoch": …
-    "train_loss": 0.…
-    "train_runtime": …
-    "train_samples": …
-    "train_samples_per_second": …
-    "train_steps_per_second": 0.…
+    "epoch": 1.0,
+    "train_loss": 0.5648497628454511,
+    "train_runtime": 7610.489,
+    "train_samples": 61135,
+    "train_samples_per_second": 8.033,
+    "train_steps_per_second": 0.251
 }
```
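These training numbers are likewise self-consistent: with 61135 samples and the card's effective batch size of 32, the single epoch is about 1911 optimizer steps, which matches both the reported throughput figures and the fact that the training-results table ends at step 1900 (epoch 0.99):

```python
import math

# Values from train_results.json and the card's hyperparameters.
train_samples = 61135
total_train_batch_size = 32
train_runtime = 7610.489  # seconds

steps_per_epoch = math.ceil(train_samples / total_train_batch_size)
print(steps_per_epoch)  # 1911 steps for the single epoch

# Reported throughput: 8.033 samples/s and 0.251 steps/s.
print(round(train_samples / train_runtime, 3))
print(round(steps_per_epoch / train_runtime, 3))
```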
    	
trainer_state.json
CHANGED

The diff for this file is too large to render. See the raw diff.

