pythia-helpful-1epoch (collection, 12 items)

Pythia-2.8b supervised finetuned and DPO finetuned with the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
Pythia-1b supervised finetuned using the TRLx library with the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch. Checkpoints are also uploaded.

Fully reproducible finetuning code is available on GitHub.

See Pythia-1b for model details (paper).
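For reference, a minimal usage sketch with the `transformers` library is shown below; the Anthropic-style "Human:/Assistant:" prompt format and the generation settings are illustrative assumptions, not prescribed by this card.

```python
# Minimal usage sketch (prompt format and generation settings are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lomahony/pythia-1b-helpful-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# hh-rlhf-style dialogue prompt (assumed format).
prompt = "Human: How do I make a good cup of coffee?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```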
See further details of these models in the paper Attributing Mode Collapse in the Fine-Tuning of Large Language Models.
If these models are helpful in your work, you can cite them as follows:
@inproceedings{o2024attributing,
  title={Attributing Mode Collapse in the Fine-Tuning of Large Language Models},
  author={O’Mahony, Laura and Grinsztajn, Leo and Schoelkopf, Hailey and Biderman, Stella},
  booktitle={ICLR 2024, Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) workshop},
  year={2024}
}
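The benchmark tables below were produced with EleutherAI's lm-evaluation-harness. A sketch of an equivalent run using the harness's Python API follows; the task list is read off the tables, and the v0.4-style `simple_evaluate` API is an assumption about the harness version.

```python
# Sketch: reproducing the evaluation with lm-evaluation-harness (v0.4-style API assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=lomahony/pythia-1b-helpful-sft",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
        "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
    ],
    num_fewshot=0,   # set to 5 to reproduce the five-shot table
    batch_size=16,
)
print(results["results"])
```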
Zero-shot evaluation results:

`hf (pretrained=lomahony/pythia-1b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16`
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2543 | ± | 0.0127 |
| | | none | 0 | acc_norm | 0.2739 | ± | 0.0130 |
| arc_easy | 1 | none | 0 | acc | 0.5724 | ± | 0.0102 |
| | | none | 0 | acc_norm | 0.4941 | ± | 0.0103 |
| boolq | 2 | none | 0 | acc | 0.6199 | ± | 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.3819 | ± | 0.0048 |
| | | none | 0 | acc_norm | 0.4736 | ± | 0.0050 |
| lambada_openai | 1 | none | 0 | perplexity | 7.1374 | ± | 0.2014 |
| | | none | 0 | acc | 0.5626 | ± | 0.0069 |
| openbookqa | 1 | none | 0 | acc | 0.2040 | ± | 0.0180 |
| | | none | 0 | acc_norm | 0.3140 | ± | 0.0208 |
| piqa | 1 | none | 0 | acc | 0.7138 | ± | 0.0105 |
| | | none | 0 | acc_norm | 0.6997 | ± | 0.0107 |
| sciq | 1 | none | 0 | acc | 0.8400 | ± | 0.0116 |
| | | none | 0 | acc_norm | 0.7620 | ± | 0.0135 |
| wikitext | 2 | none | 0 | word_perplexity | 16.9719 | ± | N/A |
| | | none | 0 | byte_perplexity | 1.6981 | ± | N/A |
| | | none | 0 | bits_per_byte | 0.7639 | ± | N/A |
| winogrande | 1 | none | 0 | acc | 0.5343 | ± | 0.0140 |
Five-shot evaluation results:

`hf (pretrained=lomahony/pythia-1b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16`
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.2628 | ± | 0.0129 |
| | | none | 5 | acc_norm | 0.2918 | ± | 0.0133 |
| arc_easy | 1 | none | 5 | acc | 0.6040 | ± | 0.0100 |
| | | none | 5 | acc_norm | 0.5816 | ± | 0.0101 |
| boolq | 2 | none | 5 | acc | 0.5963 | ± | 0.0086 |
| hellaswag | 1 | none | 5 | acc | 0.3780 | ± | 0.0048 |
| | | none | 5 | acc_norm | 0.4719 | ± | 0.0050 |
| lambada_openai | 1 | none | 5 | perplexity | 10.2584 | ± | 0.2936 |
| | | none | 5 | acc | 0.4832 | ± | 0.0070 |
| openbookqa | 1 | none | 5 | acc | 0.1980 | ± | 0.0178 |
| | | none | 5 | acc_norm | 0.3220 | ± | 0.0209 |
| piqa | 1 | none | 5 | acc | 0.7057 | ± | 0.0106 |
| | | none | 5 | acc_norm | 0.7095 | ± | 0.0106 |
| sciq | 1 | none | 5 | acc | 0.8980 | ± | 0.0096 |
| | | none | 5 | acc_norm | 0.9000 | ± | 0.0095 |
| wikitext | 2 | none | 5 | word_perplexity | 16.9719 | ± | N/A |
| | | none | 5 | byte_perplexity | 1.6981 | ± | N/A |
| | | none | 5 | bits_per_byte | 0.7639 | ± | N/A |
| winogrande | 1 | none | 5 | acc | 0.5446 | ± | 0.0140 |