reach-vb HF Staff committed on
Commit 294e2bb · verified · 1 Parent(s): 9d48327
Files changed (1)
  1. content/article.md +19 -17
content/article.md CHANGED
@@ -36,13 +36,15 @@
36
 
37
  ## Introduction
38
 
39
- One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.
40
 
41
  Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.
42
 
43
  This scale presents a monumental engineering challenge.
44
 
45
- How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members. We continue supporting all models that come out and will continue to do so in the foreseeable future.
 
 
46
 
47
  This post dissects the design philosophy that makes this possible. It's a continuation of our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) was published; I recommend reading it if you haven't yet, as it explains in particular what makes the library faster today. Again, all of that development was only made possible thanks to these principles.
48
 
@@ -72,7 +74,7 @@ Note that the library _evolved_ towards these principles, and that they _emerged
72
  <li class="tenet">
73
  <a id="source-of-truth"></a>
74
  <strong>Source of Truth</strong>
75
- <p>We aim be a [source of truth for all model definitions](#https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but something that still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performances.</p>
76
  <em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
77
  </li>
78
 
@@ -80,18 +82,18 @@ Note that the library _evolved_ towards these principles, and that they _emerged
80
  <a id="one-model-one-file"></a>
81
  <strong>One Model, One File</strong>
82
  <p>All inference and training core logic has to be visible, top‑to‑bottom, to maximize each model's hackability.</p>
83
- <em>Every model should be completely understandable and hackable by reading a single file from top to bottom.</em>
84
  </li>
85
  <li class="tenet">
86
  <a id="code-is-product"></a>
87
- <strong>Code is Product</strong>
88
- <p>Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.</p>
89
  <em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
90
  </li>
91
  <li class="tenet">
92
  <a id="standardize-dont-abstract"></a>
93
  <strong>Standardize, Don't Abstract</strong>
94
- <p>If it's model behavior, keep it in the file; abstractions only for generic infra.</p>
95
  <em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
96
  </li>
97
  <li class="tenet">
@@ -104,14 +106,14 @@ Note that the library _evolved_ towards these principles, and that they _emerged
104
  <li class="tenet">
105
  <a id="minimal-user-api"></a>
106
  <strong>Minimal User API</strong>
107
- <p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious.</p>
108
  <em>Keep the public interface simple and predictable, users should know what to expect.</em>
109
  </li>
110
  <li class="tenet">
111
  <a id="backwards-compatibility"></a>
112
  <strong>Backwards Compatibility</strong>
113
  <p>Evolve by additive standardization, never break public APIs.</p>
114
- <p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies.
115
  <em>Once something is public, it stays public, evolution through addition, not breaking changes.</em>
116
  </li>
117
  <li class="tenet">
@@ -142,13 +144,13 @@ You can use a simple regex to look at all methods of a given name across your co
142
 
143
  We want all models to have self-contained modeling code.
144
 
145
- Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.
146
 
147
  This comes at a great cost. Enter the `# Copied from...` mechanism: for a long time, these comments indicated that some code was copied from another model, saving time both for reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
148
 
149
- We needed to separate both principles that were so far intertwined, [repetition](#do-repeat-yourself) and [hackabilty](#one-model-one-file).
150
 
151
- What was the solution to this?
152
 
153
  <div class="crumbs">
154
  Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
@@ -161,7 +163,7 @@ Transformers is an opinionated library. The previous [philosophy](https://huggingf
161
 
162
  We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.
163
 
164
- It works as follows. In order to contribute a model, say for instance define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_.
165
 
166
  <summary id="generated-modeling">Auto-generated modeling code</summary>
167
 
@@ -273,7 +275,7 @@ Sharding is configuration (<code>tp_plan</code>), not edits to <code>Linear</cod
273
 
274
  ### <a id="layers-attentions-caches"></a> Layers, attentions and caches
275
 
276
- Following the same logic, the _nature_ of attention and caching per layer of a model should not be hardcoded. We should be able to specify in a configuration-based fashion how each layer is implemented. Thus we defined a mapping that can be then
277
 
278
 
279
  ```python
@@ -355,7 +357,7 @@ Graph reading guide: nodes are models; edges are modular imports. Llama-lineage
355
 
356
  ### Many models, but not enough yet, are alike
357
 
358
- So I looked into Jaccard similarity, which we use to measure set differences. I know that code is more than a set of characters stringed together. I also used code embedding models to check out code similarities, and it yielded better results, for the needs of this blog post I will stick to Jaccard index.
359
 
360
  It is interesting, then, to look at _when_ we deployed this modular logic and what its ripple effect on the library was. You can check the [larger space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) to play around, but the gist is: adding modular allowed us to connect more and more models to solid reference points. We still have a lot of gaps to fill.
361
 
@@ -455,7 +457,7 @@ This is an overall objective: there's no `transformers` without its community.
455
 
456
  Having a framework means forcing users into it. It restrains flexibility and creativity, which are the fertile soil for new ideas to grow.
457
 
458
- Among the most valuable contributions to `transformers` is of course the addition of new models. Very recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
459
 
460
  A second one is the ability to fine-tune and pipeline these models into many other software projects. Check on the Hub how many fine-tunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
461
 
@@ -528,7 +530,7 @@ Pre-allocating GPU memory removes malloc spikes (e.g., 7× for 8B, 6× for 32B i
528
 
529
  ### Transformers-serve and continuous batching
530
 
531
- Having all these models readily available allows to use all of them with transformers-serve, and enable interfacing with them with an Open API-like pattern. As a reminder, the hub also opens access to various [inference providers](https://huggingface.co/docs/inference-providers/en/index) if you're interested in model deployment in general.
532
 
533
  ```bash
534
  transformers serve
 
36
 
37
  ## Introduction
38
 
39
+ One million lines of `Python` code. Through them, the [`transformers`](https://github.com/huggingface/transformers) library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.
40
 
41
  Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.
42
 
43
  This scale presents a monumental engineering challenge.
44
 
45
+ How do you keep such a ship afloat, made of many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates?
46
+
47
+ We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members. We continue supporting all models that come out and will continue to do so for the foreseeable future.
48
 
49
  This post dissects the design philosophy that makes this possible. It's a continuation of our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) was published; I recommend reading it if you haven't yet, as it explains in particular what makes the library faster today. Again, all of that development was only made possible thanks to these principles.
50
 
 
74
  <li class="tenet">
75
  <a id="source-of-truth"></a>
76
  <strong>Source of Truth</strong>
77
+ <p>We aim to be the [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but something that guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performance.</p>
78
  <em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
79
  </li>
80
 
 
82
  <a id="one-model-one-file"></a>
83
  <strong>One Model, One File</strong>
84
  <p>All inference and training core logic has to be visible, top‑to‑bottom, to maximize each model's hackability.</p>
85
+ <em>Every model should be understandable and hackable by reading a single file from top to bottom.</em>
86
  </li>
87
  <li class="tenet">
88
  <a id="code-is-product"></a>
89
+ <strong>Code is the Product</strong>
90
+ <p>Optimize for reading, diff-ing, and tweaking: our users are power users. Variables can be explicit, full words, even several words; readability is paramount.</p>
91
  <em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
92
  </li>
93
  <li class="tenet">
94
  <a id="standardize-dont-abstract"></a>
95
  <strong>Standardize, Don't Abstract</strong>
96
+ <p>If it's model behavior, keep it in the file; abstractions are only for generic infra.</p>
97
  <em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
98
  </li>
99
  <li class="tenet">
 
106
  <li class="tenet">
107
  <a id="minimal-user-api"></a>
108
  <strong>Minimal User API</strong>
109
+ <p>Config, model, pre-processing; from_pretrained, save_pretrained, push_to_hub. We want as few code paths as possible. Reading should be obvious, configurations should be obvious.</p>
110
  <em>Keep the public interface simple and predictable, users should know what to expect.</em>
111
  </li>
112
  <li class="tenet">
113
  <a id="backwards-compatibility"></a>
114
  <strong>Backwards Compatibility</strong>
115
  <p>Evolve by additive standardization, never break public APIs.</p>
116
+ <p>Any artifact that was once on the hub and worked with transformers should be usable indefinitely with the same interface. Further, public methods should not change, to avoid breaking dependencies.</p>
117
  <em>Once something is public, it stays public, evolution through addition, not breaking changes.</em>
118
  </li>
119
  <li class="tenet">
 
144
 
145
  We want all models to have self-contained modeling code.
146
 
147
+ Each core functionality _must_ be in the modeling code; every non-core functionality _can_ be outside of it.
148
 
149
  This comes at a great cost. Enter the `# Copied from...` mechanism: for a long time, these comments indicated that some code was copied from another model, saving time both for reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
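As an illustration, here is roughly the shape of such an annotation in a modeling file. The hypothetical `MyModelRMSNorm` and its simplified body are only illustrative; what matters is the comment, which the consistency check compares against the referenced source class:

```python
import torch
import torch.nn as nn

# The annotation below tells the consistency check (and reviewers) that this
# class must stay identical to the referenced Llama class, modulo the rename.
# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MyModel
class MyModelRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # Simplified RMSNorm body, kept short for illustration.
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states
```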
150
 
151
+ We need to separate the two principles that had so far been intertwined: [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).
152
 
153
+ What's the solution to this?
154
 
155
  <div class="crumbs">
156
  Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
 
163
 
164
  We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.
165
 
166
+ It works as follows. In order to contribute a model, define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_.
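To make this concrete, here is a minimal sketch of what such a `modular_` file could look like for a hypothetical `MyModel` that reuses Llama components (the class names and the choice of reused modules are illustrative, not a real model):

```python
# modular_my_model.py: hypothetical sketch; real files live under
# src/transformers/models/<model>/modular_<model>.py
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM


class MyModelConfig(LlamaConfig):
    # Only what differs from Llama needs to be spelled out.
    model_type = "my_model"


class MyModelAttention(LlamaAttention):
    # Inherit the Llama attention unchanged; the code generator expands the full
    # implementation into modeling_my_model.py so the final file still reads
    # top-to-bottom.
    pass


class MyModelForCausalLM(LlamaForCausalLM):
    config_class = MyModelConfig
```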
167
 
168
  <summary id="generated-modeling">Auto-generated modeling code</summary>
169
 
 
275
 
276
  ### <a id="layers-attentions-caches"></a> Layers, attentions and caches
277
 
278
+ Following the same logic, the _nature_ of attention and caching per layer of a model should not be hardcoded. We should be able to specify, in a configuration-based fashion, how each layer is implemented. Thus we define a mapping that can then be specified per layer:
279
 
280
 
281
  ```python
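# Illustrative sketch, not the article's exact snippet: the idea is that each
# layer's attention and cache flavor is declared as configuration rather than
# hardcoded, e.g. a per-layer mapping along these lines.
layer_types = [
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "full_attention",
]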
 
357
 
358
  ### Many models, but not enough yet, are alike
359
 
360
+ Next, I looked into Jaccard similarity, which we use to measure how much two sets overlap. I know that code is more than a set of characters strung together. I also used code embedding models to check code similarity, and they yielded better results, but for the needs of this blog post I will stick to the Jaccard index.
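For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch of that comparison, treating two modeling files as sets of tokens (the file paths are illustrative, not the exact pipeline used for the analysis):

```python
from pathlib import Path

def jaccard(a: set, b: set) -> float:
    # Jaccard index: |A ∩ B| / |A ∪ B|, i.e. 1.0 for identical sets, 0.0 for disjoint ones.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Compare two modeling files as sets of whitespace-separated tokens.
tokens_a = set(Path("modeling_llama.py").read_text().split())
tokens_b = set(Path("modeling_glm.py").read_text().split())
print(f"Jaccard similarity: {jaccard(tokens_a, tokens_b):.2f}")
```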
361
 
362
  It is interesting, then, to look at _when_ we deployed this modular logic and what its ripple effect on the library was. You can check the [larger space](https://huggingface.co/spaces/Molbap/transformers-modular-refactor) to play around, but the gist is: adding modular allowed us to connect more and more models to solid reference points. We still have a lot of gaps to fill.
363
 
 
457
 
458
  Having a framework means forcing users into it. It restrains flexibility and creativity, which are the fertile soil for new ideas to grow.
459
 
460
+ Among the most valuable contributions to `transformers` is of course the addition of new models. Recently, [OpenAI added GPT-OSS](https://huggingface.co/blog/welcome-openai-gpt-oss), which prompted the addition of many new features to the library in order to support [their model](https://huggingface.co/openai/gpt-oss-120b).
461
 
462
  A second one is the ability to fine-tune and pipeline these models into many other software projects. Check on the Hub how many fine-tunes are registered for [gpt-oss 120b](https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b), despite its size!
463
 
 
530
 
531
  ### Transformers-serve and continuous batching
532
 
533
+ Having all these models readily available makes it possible to use them all with `transformers-serve`, and to interface with them through an OpenAI-like API. As a reminder, the hub also opens access to various [inference providers](https://huggingface.co/docs/inference-providers/en/index) if you're interested in model deployment in general.
534
 
535
  ```bash
536
  transformers serve
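# Illustrative only: once the server is up, any OpenAI-compatible client can
# talk to it. The exact host, port and endpoint depend on your transformers
# version and the flags passed to `transformers serve`; the model name below
# is just an example.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'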