TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-tldr-preference-n_epochs1-bs16-rsreduced_weight Updated Sep 27
TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-tldr-preference-n_epochs1-bs16-rsstatic Updated Sep 25
TrandeLik/base_rt-qwen-qwen2.5-7b-instruct-mnoukhov-openai_summarize_comparisons_tldrprompt-n_epochs1-bs16 Updated Sep 12
TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsstatic Updated Sep 11
TrandeLik/aug_rt-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs16-rsnone Updated Aug 21
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs16 Updated Aug 10
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs32 Updated Aug 10
TrandeLik/vanilarewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs2-bs8 Updated Aug 9
TrandeLik/augmentedrewardtrainer-qwen-qwen2.5-7b-instruct-trl-lib-ultrafeedback_binarized-n_epochs1-bs8 Updated Aug 9