# Hugging Face Research

The science team at Hugging Face is dedicated to advancing machine learning research in ways that maximize value for the whole community.

### 🛠️ Tooling & Infrastructure

Tooling and infrastructure are the foundation of ML research, and we are working on a range of tools such as [datatrove](https://github.com/huggingface/datatrove), [nanotron](https://github.com/huggingface/nanotron), [TRL](https://github.com/huggingface/trl), [LeRobot](https://github.com/huggingface/lerobot), and [lighteval](https://github.com/huggingface/lighteval).
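As a hypothetical illustration of how these libraries fit together, here is a minimal supervised fine-tuning sketch with TRL; exact argument names vary across TRL versions, and the model checkpoint and output directory below are placeholders:

```python
# Minimal SFT sketch with TRL (assumes a recent TRL release; the API has
# shifted between versions, so check the docs for your installed version).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A small chat dataset from the Hub, used here purely for illustration.
dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",        # placeholder checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-sft"),  # placeholder output path
)
trainer.train()
```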
### 📊 Datasets

High-quality datasets are the powerhouse of LLMs, and they require special care and skill to build. We focus on building datasets such as [no-robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), [The Stack](https://huggingface.co/datasets/bigcode/the-stack-v2), and [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo).
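All of these datasets can be pulled straight from the Hub with the `datasets` library. A minimal sketch, where `sample-10BT` is one of FineWeb's published sample subsets:

```python
from datasets import load_dataset

# Stream FineWeb rather than downloading the full dump;
# "sample-10BT" is one of its published sample configurations.
fineweb = load_dataset(
    "HuggingFaceFW/fineweb", name="sample-10BT", split="train", streaming=True
)

# Peek at a few documents without materializing the dataset.
for example in fineweb.take(3):
    print(example["text"][:200])
```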
### 🤖 Open Models

The datasets and training recipes behind most state-of-the-art models are not released. We build cutting-edge models and release the full training pipeline as well, fostering innovation and reproducibility; examples include [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b), and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct).
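Because the checkpoints live on the Hub, trying one of these models takes only a few lines with `transformers`. A minimal sketch; the prompt and generation settings are illustrative:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Chat-style input; recent transformers versions apply the model's
# chat template automatically for conversational inputs.
messages = [{"role": "user", "content": "What makes a training pipeline reproducible?"}]
print(pipe(messages, max_new_tokens=128)[0]["generated_text"])
```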
### 🌸 Collaborations

Research and collaboration go hand in hand. That's why we like to organize and participate in large open collaborations such as [BigScience](https://bigscience.huggingface.co) and [BigCode](https://www.bigcode-project.org), as well as many smaller partnerships such as [Leaderboards on the Hub](https://huggingface.co/blog?tag=leaderboard).
### ⚙️ Infrastructure

The research team is organized into small sub-teams of typically fewer than four people, and the science cluster consists of 96 nodes with 8 H100 GPUs each (768 GPUs in total), as well as an auto-scalable CPU cluster for dataset processing. In this setup, even a small research team can build and push out impactful artifacts.
### 📚 Educational material

Besides writing tech reports for our research projects, we also like to write educational content that helps newcomers get started in the field and supports practitioners. Examples include the [alignment handbook](https://github.com/huggingface/alignment-handbook), the [evaluation guidebook](https://github.com/huggingface/evaluation-guidebook), the [pretraining tutorial](https://www.youtube.com/watch?v=2-SPH9hIKT8), and the [FineWeb blog post](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).
### Release Timeline

This is the release timeline so far; click on the elements to follow the links:
<!---
TIMELINE UPDATE INSTRUCTIONS:
Go to https://huggingface.co/lvwerra/science-timeline and follow the guide to update the timeline.
-->
### 🤗 Join us!
We are actively hiring for both full-time positions and internships. Check out [hf.co/jobs](https://hf.co/jobs).