Commit a7c87ea
Parent(s): 65566d0
Update model family table

README.md CHANGED

@@ -671,95 +671,89 @@ model-index:
- **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co)
- **Languages:** Refer to [BLOOM](https://huggingface.co/bigscience/bloom) for pretraining & [xP3](https://huggingface.co/bigscience/xP3) for finetuning language proportions. It understands both pretraining & finetuning languages.
- **BLOOMZ & mT0 Model Family:**
-
<table>
<tr>
-<th colspan="12">Multitask finetuned on
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/bigscience/xP3>xP3</a>. Recommended for prompting in English.</th>
</tr>
<tr>
-<
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>560M</td>
-<td>
-<td>
-<td>
-<td>
-<td>
+<td>Parameters</td>
+<td>300M</td>
+<td>580M</td>
+<td>1.2B</td>
+<td>3.7B</td>
+<td>13B</td>
+<td>560M</td>
+<td>1.1B</td>
+<td>1.7B</td>
+<td>3B</td>
+<td>7.1B</td>
+<td>176B</td>
</tr>
<tr>
-<
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
-<td>
+<td>Finetuned Model</td>
+<td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td>
</tr>
</tr>
+<tr>
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English.</th>
</tr>
<tr>
-<
-<td
-<td
-<td
-<td
-<td>
-<td
-<td
-<td
-<td
-<td>
-<td>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td>
</tr>
-
-
+<th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/bigscience/P3>P3</a>. Released for research purposes only. Strictly inferior to above models!</th>
+</tr>
+<tr>
+<td>Finetuned Model</td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td>
+<td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td>
+</tr>
+<th colspan="12">Original pretrained checkpoints. Not recommended.</th>
+<tr>
+<td>Pretrained Model</td>
+<td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td>
+<td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td>
+<td><a href=https://huggingface.co/google/mt5-large>mt5-large</a></td>
+<td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td>
+<td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td>
+<td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td>
</tr>
</table>

-<table>
-<tr>
-<td>One</td>
-<td>Two</td>
-</tr>
-<tr>
-<td colspan="2">Three</td>
-</tr>
-</table>
-
-
-|Name|Explanation|
-|----|-----------|
-|[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)| 560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)| 1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)| 1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)| 3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[bloomz](https://huggingface.co/bigscience/bloomz)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English**|
-|[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/datasets/xP3mt). **Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English**|
-|||
-|[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)**|
-|[bloomz-p3](https://huggingface.co/bigscience/bloomz)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)**|
-|||
-|||
-|[mt0-small](https://huggingface.co/bigscience/mt0-xxl)|300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-base](https://huggingface.co/bigscience/mt0-xxl)|580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-large](https://huggingface.co/bigscience/mt0-xxl)|1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xl](https://huggingface.co/bigscience/mt0-xxl)|3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
-|||
-|[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English**|
-|||
-|[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes, performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)**|

# Use
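The new family table maps each finetuned checkpoint to a pretrained base and a parameter count, with `-mt` variants preferred for non-English prompting. As a minimal sketch of that mapping in code, the structure below restates the table; the `recommended_checkpoint` helper is a hypothetical convenience for illustration, not part of any official API.

```python
# Sketch of the BLOOMZ & mT0 family table as a lookup structure.
# Checkpoint names, base models, and sizes are taken from the table;
# `recommended_checkpoint` is a hypothetical helper, not an official API.

# xP3-finetuned checkpoint -> (pretrained base repo, parameter count)
FAMILY = {
    "mt0-small": ("google/mt5-small", "300M"),
    "mt0-base": ("google/mt5-base", "580M"),
    "mt0-large": ("google/mt5-large", "1.2B"),
    "mt0-xl": ("google/mt5-xl", "3.7B"),
    "mt0-xxl": ("google/mt5-xxl", "13B"),
    "bloomz-560m": ("bigscience/bloom-560m", "560M"),
    "bloomz-1b1": ("bigscience/bloom-1b1", "1.1B"),
    "bloomz-1b7": ("bigscience/bloom-1b7", "1.7B"),
    "bloomz-3b": ("bigscience/bloom-3b", "3B"),
    "bloomz-7b1": ("bigscience/bloom-7b1", "7.1B"),
    "bloomz": ("bigscience/bloom", "176B"),
}

# xP3mt-finetuned variants, recommended when prompting in non-English.
MT_VARIANTS = {
    "mt0-xxl": "mt0-xxl-mt",
    "bloomz-7b1": "bloomz-7b1-mt",
    "bloomz": "bloomz-mt",
}

def recommended_checkpoint(name: str, non_english: bool = False) -> str:
    """Return the repo id to prompt with, preferring the -mt variant for non-English."""
    if non_english and name in MT_VARIANTS:
        return "bigscience/" + MT_VARIANTS[name]
    return "bigscience/" + name
```

For example, `recommended_checkpoint("bloomz-7b1", non_english=True)` returns `"bigscience/bloomz-7b1-mt"`, mirroring the table's advice to use the xP3mt models for non-English prompts.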