pythia-helpful-1epoch

lomahony's Collections

updated Mar 12, 2024

Pythia-2.8b fine-tuned with supervised fine-tuning (SFT) and with Direct Preference Optimization (DPO) on the helpful subset of the Anthropic hh-rlhf dataset for 1 epoch.
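
For readers unfamiliar with DPO, the per-example objective can be sketched as below. This is a minimal illustration of the standard DPO loss, not the training code used for these models. The function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for the example, and the inputs are summed sequence log-probabilities that would come from the policy and the frozen reference (SFT) model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta=0.1 is an illustrative default, not a value taken from this collection.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls as the policy raises the chosen response's log-probability relative to the rejected one, measured against the reference model. When the policy and the reference agree exactly, the margin is zero and the loss equals log 2.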