nits

content/article.md (+19 -17)
@@ -36,13 +36,15 @@

## Introduction

One million lines of `Python` code. Through them, the [`transformers`](https://github.com/huggingface/transformers) library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

How do you keep such a ship afloat, made of many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates?

We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members. We continue supporting all models that come out, and will keep doing so for the foreseeable future.

This post dissects the design philosophy that makes this possible. It's a continuation of our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page and its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) explained in particular what makes the library faster today; I recommend reading it if you haven't already. Again, all of that development was only made possible thanks to these principles.
@@ -72,7 +74,7 @@

<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
<p>We aim to be the <a href="https://huggingface.co/blog/transformers-model-definition">source of truth for all model definitions</a>. This is not a tenet, but something that guides our decisions. Model implementations should be reliable, reproducible, and faithful to the performance of the original implementations.</p>
<em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
</li>
@@ -80,18 +82,18 @@

<a id="one-model-one-file"></a>
<strong>One Model, One File</strong>
<p>All inference and training core logic has to be visible, top‑to‑bottom, to maximize each model's hackability.</p>
<em>Every model should be understandable and hackable by reading a single file from top to bottom.</em>
</li>
<li class="tenet">
<a id="code-is-product"></a>
<strong>Code is the Product</strong>
<p>Optimize for reading, diffing, and tweaking; our users are power users. Variable names can be explicit, full words, even several words; readability is paramount.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
<p>If it's model behavior, keep it in the file; abstractions are only for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
@@ -104,14 +106,14 @@

<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
<p>Config, model, pre-processing; from_pretrained, save_pretrained, push_to_hub. We want the fewest codepaths possible. Reading should be obvious, configuration should be obvious (a short usage sketch follows below).</p>
<em>Keep the public interface simple and predictable; users should know what to expect.</em>
</li>
<li class="tenet">
<a id="backwards-compatibility"></a>
<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
<p>Any artifact that was once on the hub and worked with transformers should be usable indefinitely with the same interface. Further, public methods should not change, to avoid breaking dependencies.</p>
<em>Once something is public, it stays public: evolution through addition, not breaking changes.</em>
</li>
<li class="tenet">
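To make the Minimal User API tenet concrete, here is a quick sketch (the checkpoint name and paths are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# load a model and its tokenizer from the Hub (checkpoint name is just an example)
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# save both locally...
model.save_pretrained("./my-gpt2")
tokenizer.save_pretrained("./my-gpt2")

# ...and, once authenticated, share them back to the Hub
# model.push_to_hub("my-username/my-gpt2")
```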
@@ -142,13 +144,13 @@

We want all models to have self-contained modeling code.

Each core functionality _must_ be in the modeling code; every non-core functionality _can_ be outside of it.

This comes at a great cost. Enter the `# Copied from...` mechanism: for a long time, these comments indicated that some code was copied from another model, saving time both for reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
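For readers who have not seen it, here is a minimal sketch of what the marker looks like in practice (`MyModel` is a hypothetical name, and the body is a simplified RMS normalization):

```python
import torch
from torch import nn


# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MyModel
class MyModelRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # standard RMS normalization: the kind of boilerplate that gets duplicated across models
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states
```

The comment lets the CI check that the copy stays in sync with its source, but those lines still count toward every new model's footprint.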
We need to separate two principles that had so far been intertwined: [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).

What's the solution to this?

<div class="crumbs">
Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
@@ -161,7 +163,7 @@

We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.

It works as follows: in order to contribute a model, you define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_.
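As a rough sketch of what such a file can look like (`MyModel` and the file name are hypothetical; real modular files sit next to the generated modeling code in each model's folder):

```python
# modular_mymodel.py -- hypothetical example of a modular definition
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaMLP


class MyModelAttention(LlamaAttention):
    # reuse Llama's attention unchanged
    pass


class MyModelMLP(LlamaMLP):
    # only the pieces that actually differ from the parent need to be spelled out here
    pass
```

From this small file, the full, self-contained `modeling_mymodel.py` is auto-generated, so the single-file reading experience is preserved.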
<summary id="generated-modeling">Auto-generated modeling code</summary>
@@ -273,7 +275,7 @@

### <a id="layers-attentions-caches"></a> Layers, attentions and caches

Following the same logic, the _nature_ of attention and caching per layer of a model should not be hardcoded. We should be able to specify in a configuration-based fashion how each layer is implemented. Thus, we define a mapping that can then be specified in the configuration.
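A minimal, self-contained sketch of the idea (the class and field names below are illustrative, not the library's exact schema): the kind of attention each layer uses is data in the configuration, and the modeling code simply dispatches on it.

```python
# Illustrative only: per-layer behavior is driven by configuration, not hardcoded.
class FullAttention:
    def __init__(self, layer_idx):
        self.layer_idx = layer_idx


class SlidingWindowAttention:
    def __init__(self, layer_idx):
        self.layer_idx = layer_idx


ATTENTION_CLASSES = {
    "full_attention": FullAttention,
    "sliding_attention": SlidingWindowAttention,
}

# hypothetical config field: one entry per layer
layer_types = ["sliding_attention", "sliding_attention", "full_attention"]

layers = [ATTENTION_CLASSES[kind](layer_idx=i) for i, kind in enumerate(layer_types)]
```

Swapping a layer's attention (or cache) type then means editing configuration, not rewriting the modeling file.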
@@ -355,7 +357,7 @@

### Many models, but not enough yet, are alike

Next, I looked into Jaccard similarity, which we use to measure how much two sets overlap. I know that code is more than a set of characters strung together. I also used code embedding models to check code similarities, and they yielded better results, but for the needs of this blog post I will stick to the Jaccard index.
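For reference, a toy version of the measurement (here each snippet is treated as a set of whitespace-separated tokens, which is an approximation):

```python
def jaccard_similarity(code_a: str, code_b: str) -> float:
    """Jaccard index between two code snippets, treated as sets of tokens."""
    tokens_a, tokens_b = set(code_a.split()), set(code_b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(jaccard_similarity(
    "def forward(self, x): return self.proj(x)",
    "def forward(self, x): return self.proj(x) * 2",
))
```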
It is interesting, for that, to look at _when_ we deployed this modular logic and what its ripple effect on the library was. You can check the [larger space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) to play around, but the gist is: adding modular allowed us to connect more and more models to solid reference points. We still have a lot of gaps to fill.
@@ -455,7 +457,7 @@

Having a framework means forcing users into it. It constrains flexibility and creativity, which are the fertile soil where new ideas grow.

Among the most valuable contributions to `transformers` is of course the addition of new models. Recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).

A second one is the ability to fine-tune and pipeline these models into many other software packages. Check on the hub how many finetunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
@@ -528,7 +530,7 @@

### Transformers-serve and continuous batching

Having all these models readily available makes it possible to use them all with `transformers serve`, and to interface with them through an OpenAI-like API. As a reminder, the hub also opens access to various [inference providers](https://huggingface.co/docs/inference-providers/en/index) if you're interested in model deployment in general.

```bash
transformers serve
```
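As a sketch of the serving pattern described above (the port, route, and checkpoint below are assumptions based on an OpenAI-compatible layout; adjust them to what your `transformers serve` instance actually exposes), a client could look like this:

```python
import requests

# Assumed local endpoint; check your `transformers serve` output for the real host/port.
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-0.5B-Instruct",  # any served checkpoint
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```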