End of training
README.md CHANGED

```diff
@@ -2,9 +2,13 @@
 license: apache-2.0
 library_name: peft
 tags:
+- alignment-handbook
+- generated_from_trainer
 - trl
 - dpo
 - generated_from_trainer
+datasets:
+- HuggingFaceH4/ultrafeedback_binarized
 base_model: mistralai/Mistral-7B-v0.1
 model-index:
 - name: zephyr-7b-dpo-qlora
@@ -16,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # zephyr-7b-dpo-qlora
 
-This model is a fine-tuned version of [
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.5473
 - Rewards/chosen: -0.8609
```
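For readability, the YAML front matter of README.md as it stands after this commit — assembled directly from the diff, with nothing added beyond it (note the `generated_from_trainer` tag now appears twice, exactly as in the changed file):

```yaml
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
```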