pcuenq (HF Staff) committed
Commit 83ac984 · 1 Parent(s): 81d2057

Apply to mdx

Files changed (1)
  1. app/src/content/article.mdx +54 -42
app/src/content/article.mdx CHANGED
@@ -16,48 +16,52 @@ tableOfContentsAutoCollapse: true
import HtmlEmbed from "../components/HtmlEmbed.astro";

- ## Introduction

One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

- Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.

- This post dissects the design philosophy that makes this possible today. It's a continuation of our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, and I recommend the read if it's not done yet, a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) was written, explaining in particular what makes the library faster today. Again, all of that development was only made possible thanks to these principles.

- We codify the "tenets" that guide our development, demonstrate how they are implemented in code, and show the measurable impact they have on the library's sustainability and growth.

- For any OSS maintainer, power user, or contributor, this is the map to understanding, using, and building upon `transformers`, but not only: any project of comparable size will require you to make deep choices, not only on design and choice of abstraction, but on the very mindset of the software you are building.

- [Tenets exemplified](#source-of-truth) will have their summary available on hover.

- [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.

- [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away.

<div class="crumbs">
- Throughout this post, you'll find breadcrumb boxes like this one. They summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
</div>

## The core tenets of transformers

We summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

- Note that the library _evolved_ towards these principles, and that they _emerged_ from decisions taken, and once emerged they were recognized as critical.

<div class="tenet-list">
<ol>
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
- <p>We aim to be a [source of truth for all model definitions](#https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but something that still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performances.</p>
- <em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
- </li>

<li class="tenet">
<a id="one-model-one-file"></a>
@@ -67,27 +71,27 @@ Note that the library _evolved_ towards these principles, and that they _emerged
</li>
<li class="tenet">
<a id="code-is-product"></a>
- <strong>Code is Product</strong>
- <p>Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
- <p>If it's model behavior, keep it in the file; abstractions only for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
<a id="do-repeat-yourself"></a>
<strong>DRY* (DO Repeat Yourself)</strong>
- <p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
- <p><strong>Amendment:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet.</p>
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
</li>
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
- <p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious.</p>
<em>Keep the public interface simple and predictable, users should know what to expect.</em>
</li>
<li class="tenet">
@@ -95,23 +99,27 @@ Note that the library _evolved_ towards these principles, and that they _emerged
<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
<p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies.</p>
- <em>Once something is public, it stays public, evolution through addition, not breaking changes.</em>
- </li>
<li class="tenet">
<a id="consistent-public-surface"></a>
<strong>Consistent Public Surface</strong>
- <p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests. This is a goal we have as well as a tenet.</p>
<em>All models should feel familiar - consistent interfaces reduce cognitive load.</em>
</li>
</ol>
</div>

- When a PR is merged, it is because the contribution is worthwhile, and that the `transformers` team finds the design of the contribution to be aligned with what is above.

- Does all the code in the library follow strictly these tenets? No. The library is a gigantic house with connected nooks, corridors, crannies everywhere built by thousands of different workers. We _try_ to make it so all the code added is compliant, because if we fail and merge it, we cannot change it lest we break [backwards compatibility](#backwards-compatibility).

- For instance, one function essential to the implementation of [Rotary Positional Embeddings](https://huggingface.co/papers/2104.09864) is identical in 70 `modeling_<file>.py` across `src/transformers/models/.` Why keep it? Because we want all the model logic to be [contained in the modeling file](#one-model-one-file). In order to do that, we [do repeat ourselves](#do-repeat-yourself).

```python
def rotate_half(x):
@@ -121,48 +129,52 @@ def rotate_half(x):
    return torch.cat((-x2, x1), dim=-1)
```

- You can use a simple regex to look at all methods of a given name across your codebase and look at their differences and similarities, that's what I did (+ a hash to avoid quadraticity).
-
- We want all models to have self-contained modeling code.

- Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.

- This comes as a great cost. Enter the `#Copied from...` mechanism: for a long time, these comments were indicating that some code was copied from another model, saving time both for the reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet, we could not remove them.

- We needed to separate both principles that were so far intertwined, [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).

- What was the solution to this?

<div class="crumbs">
- Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
</div>

## <a id="modular"></a> Modular transformers

- Transformers is an opinionated library. The previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, and the [blog post](https://huggingface.co/blog/transformers-design-philosophy) were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. [`modular` transformers were introduced](https://huggingface.co/docs/transformers/en/modular_transformers), allowing a form of inheritance without breaking [One model, One file](#one-model-one-file).

- We amended the principle of [DRY*](#do-repeat-yourself) by removing progressively all pieces of code that were "copied from" another file.

- It works as follows. In order to contribute a model, say for instance define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_.
- This modular file can use inheritance across models: and then, it will be unravelled into a fully functional modeling file.

<summary id="generated-modeling">Auto-generated modeling code</summary>

<HtmlEmbed src="transformers/glm-compare.html" />

- As you can see, we can now define any model as a _modular_ of another.

You might think "well that's just how inheritance works". The crucial difference is that we do _visibly_ what is essentially the _compiler_'s job: by unrolling the inheritances, we make visible all of the modeling code, keeping it [all in one piece](#one-model-one-file).

- What is the consequence? When adding a model, we do not need to go over the entire modeling file. The modular (left side above) is enough.

- When `AutoModel.from_pretrained(...)` is called, it is indeed the modeling (right side) that is ran, and all the tests are run on the modeling code.

What does that give us?

<div class="crumbs">
- A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Reviewers and contributors maintain the shard, not the repetition. <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
</div>
@@ -560,4 +572,4 @@ Being a good backend consumer requires a consistent public surface; modular shar
The next major version of `transformers` is just around the corner (and will have another blog post to its name when it comes out). When v5 is released, we aim to keep [backwards compatibility](#backwards-compatibility) as solid as possible. The changes we make now are in service of that goal.

- We will lean further into a modular toolbox, not a framework. You should not be forced to rewrite modeling code. It’s better when a model can inherit from `PreTrainedModel` and opt into Tensor Parallel, `from_pretrained`, sharding, `push_to_hub`, loss plumbing, and external stacks like PEFT/TRL/SGLang/vLLM.

import HtmlEmbed from "../components/HtmlEmbed.astro";

+ ## Preface

One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

+ Built on `PyTorch`, transformers is a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.

+ This post dissects the design philosophy that makes this possible. It's the result of a gradual evolution from our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently (and I do recommend the read), we wrote a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) with a special focus on what makes the library faster today. All of these developments were only made possible thanks to these principles.

+ This post formalizes and articulates the "tenets" that have been guiding our development, demonstrates how they are implemented in code, and shows the measurable impact they have on the library's sustainability and growth.

+ For any OSS maintainer, power user, or contributor, this is the map to understanding, using, and building upon `transformers`; but not only: any project of comparable size will require you to make deep choices, not only on design and choice of abstractions, but on the very mindset of the software you are building. These tenets may or may not be applicable to your project, but they provide a glimpse of how we work that could be helpful or inspirational.

+ Conventions used throughout this post:

+ * [Tenets exemplified](#source-of-truth) will have their summary available on hover.

+ * [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.
+
+ * [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away to explore.

<div class="crumbs">
+ * Breadcrumb boxes summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
</div>

+ We will get started by enumerating the tenets. Then we'll look at concrete examples that show how they shape our decision-making. These examples are necessarily detailed, and sometimes complex, because they illustrate the challenges of maintaining and growing a large codebase that caters to multiple collectives, has millions of users and hundreds of contributors, and always strives for simplicity and consistency.
+
## The core tenets of transformers

We summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

+ These principles were not decided in a vacuum. The library _evolved_ towards them, and once they _emerged_, they were recognized as critical.

<div class="tenet-list">
<ol>
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
+ <p>We aim to be a [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is more of a goal than a tenet, but it strongly guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original implementations. If we are successful, they should become reference baselines for the ecosystem, so they'll be easily adopted by downstream libraries and projects. It's much easier for a project to _always_ refer to the transformers implementation than to learn a different research codebase every time a new architecture is released.</p>
+ <em>This overarching guideline ensures quality and reproducibility across all models in the library, and aspires to make the community's work easier.</em>
+ </li>

<li class="tenet">
<a id="one-model-one-file"></a>
</li>
<li class="tenet">
<a id="code-is-product"></a>
+ <strong>Code is the Product</strong>
+ <p>Optimize for reading, diffing, and tweaking. Our users are power users. Variables are explicit, we use full words, even several words. Readability is primordial.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
+ <p>If it's model behavior, keep it in the file; only use abstractions for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
<a id="do-repeat-yourself"></a>
<strong>DRY* (DO Repeat Yourself)</strong>
+ <p>Copy code when it helps users; keep successors in sync without centralizing behavior.</p>
+ <p><strong>Evolution:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet, as if the code had been copied to make modeling files standalone.</p>
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
</li>
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
+ <p>Config, model, preprocessing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious. A short usage sketch follows this list.</p>
<em>Keep the public interface simple and predictable, users should know what to expect.</em>
</li>
<li class="tenet">

<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
<p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies.</p>
+ <em>Once something is public, it stays public. Evolution through addition, not breaking changes.</em>
+ </li>
<li class="tenet">
<a id="consistent-public-surface"></a>
<strong>Consistent Public Surface</strong>
+ <p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests. This is a goal as well as a tenet.</p>
<em>All models should feel familiar - consistent interfaces reduce cognitive load.</em>
</li>
</ol>
</div>
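
To make "Minimal User API" concrete, here is the kind of end-to-end flow it implies - a sketch, using `gpt2` as a stand-in for any causal LM checkpoint on the hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One config, one model, one preprocessing class, all loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The tenets of transformers are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# The same few verbs cover persistence and sharing:
model.save_pretrained("./my-model")           # serialize config + weights
# model.push_to_hub("my-username/my-model")   # upload (requires authentication)
```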

+ When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with these principles.
+
+ Does all the code in the library strictly follow these tenets? No. The library is a gigantic house with connected nooks, corridors, crannies everywhere, built by thousands of different workers. We _try_ to make it so all the code added is compliant, because if we fail and merge it, we cannot change it lest we break [backwards compatibility](#backwards-compatibility).

+ <!-- I found the transition to the following example confusing. It implied (because of the previous paragraph and the `for instance` clause) that it's not following the tenets, where in fact it's something we WANT to do. Suggesting some slight reordering. -->

+ To see what constitutes adherence to the tenets, let's take the example of code repetition.
+
+ The following function, which is essential to the implementation of [Rotary Positional Embeddings](https://huggingface.co/papers/2104.09864), can be found in 70 `modeling_<file>.py` files across `src/transformers/models/`. Why keep it? Because we want all the model logic to be [contained in the modeling file](#one-model-one-file). In order to do that, we [do repeat ourselves](#do-repeat-yourself).

```python
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)
```
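
The companion helper that consumes `rotate_half` looks like this in a typical modeling file (a representative copy; signatures vary slightly across models):

```python
def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
    """Applies rotary position embeddings to the query and key states."""
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```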

+ You can use a simple regex, like [this one](), to look at all the methods of a given name across your codebase and compare their differences and similarities.
+ <!-- I'd maybe remove the previous line altogether and just use a link in the paragraph above -->
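
One way to run such an audit (a hypothetical helper, not something shipped in the library) is to collect every definition of a given function name under the models directory and hash the bodies, so only genuinely distinct implementations need a manual diff:

```python
import hashlib
import re
from pathlib import Path

def audit_function(name: str, root: str = "src/transformers/models") -> dict:
    """Group identically-named top-level functions by a hash of their body."""
    # Lazily match from `def name(` up to the next top-level statement.
    pattern = re.compile(rf"^def {name}\(.*?(?=^\S)", re.DOTALL | re.MULTILINE)
    variants: dict[str, list[str]] = {}
    for path in Path(root).rglob("modeling_*.py"):
        for body in pattern.findall(path.read_text()):
            digest = hashlib.sha256(body.encode()).hexdigest()[:8]
            variants.setdefault(digest, []).append(path.name)
    return variants

# A single key in audit_function("rotate_half") would confirm that all
# 70 copies are byte-identical.
```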
 

+ We want all models to have self-contained modeling code. Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.

+ This comes at a great cost. For a long time we used the `#Copied from...` mechanism: we added comments documenting that some code was copied from another model, saving time both for the reviewers and for the CI, since we had tooling to ensure that the copied blocks remained in sync. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
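
For illustration, such a marker looked like this (the class pairing here is hypothetical, but the comment format is the one enforced by the repo's consistency checks); the optional `with` clause applies a systematic rename when the copy is verified:

```python
# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MyModel
class MyModelRMSNorm(nn.Module):
    ...
```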

+ We needed to separate two principles that were so far intertwined, [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).

+ What was the solution to this? Let's talk about modular transformers.

<div class="crumbs">
+ <strong>TL;DR:</strong> Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>).
+
+ <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
</div>

## <a id="modular"></a> Modular transformers

+ Transformers is an opinionated library. The previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, and the [2022 blog post](https://huggingface.co/blog/transformers-design-philosophy) were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. [`modular` transformers was introduced](https://huggingface.co/docs/transformers/en/modular_transformers) to allow a form of inheritance without breaking [One model, One file](#one-model-one-file).

+ We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.

+ It works as follows. In order to contribute a model - GLM, for instance - we define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files already available in the library_. The modular file can use inheritance across models, but then it's unravelled into a fully functional and standalone modeling file.
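
A minimal sketch of what such a modular file can look like (abridged for illustration; the real `modular_glm.py` carries more overrides):

```python
# modular_glm.py - the contributor maintains only this shard.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM
from transformers.models.phi3.modeling_phi3 import Phi3MLP


class GlmMLP(Phi3MLP):
    pass  # identical computation, expanded in the generated modeling file


class GlmAttention(LlamaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        # only the GLM-specific deltas from Llama live here


class GlmForCausalLM(LlamaForCausalLM):
    pass
```

The code generator then unrolls these inheritances into a standalone `modeling_glm.py`, which is what ships.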
 

<summary id="generated-modeling">Auto-generated modeling code</summary>

<HtmlEmbed src="transformers/glm-compare.html" />

+ As you can see, we can define a new model as a _modular_ combination of fragments taken from others.

You might think "well that's just how inheritance works". The crucial difference is that we do _visibly_ what is essentially the _compiler_'s job: by unrolling the inheritances, we make visible all of the modeling code, keeping it [all in one piece](#one-model-one-file).

+ <!-- some ideas for additional hand-holding: link to the implementation of `LlamaAttention` to show it was copied (and modified), or maybe provide a git diff view between the GlmAttention and LlamaAttention implementations -->

+ What is the consequence? When adding a model, we do not need to go over the entire modeling file. The modular (left side above) is enough.
+
+ When `AutoModel.from_pretrained(...)` is called, it is indeed the modeling (right side) that is run, and all the tests run on the modeling code. More importantly, the auto-generated modeling file is what users _read_ to understand the code, what they step through in their debuggers, and what they hack for their needs.
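
Because the generated file is a real module on disk, plain standard-library introspection is enough to jump straight to it - a small sketch:

```python
import inspect
from transformers.models.glm import modeling_glm

# The expanded, self-contained file that users read, debug, and hack:
print(inspect.getsourcefile(modeling_glm))
print(inspect.getsource(modeling_glm.GlmAttention)[:300])
```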

What does that give us?

<div class="crumbs">
+ <strong>TL;DR:</strong> A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">One Model, One File tenet preserved</a>). Reviewers and contributors maintain the shard, not the repetition.
+
+ <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
</div>
 

The next major version of `transformers` is just around the corner (and will have another blog post to its name when it comes out). When v5 is released, we aim to keep [backwards compatibility](#backwards-compatibility) as solid as possible. The changes we make now are in service of that goal.

+ We will lean further into a modular toolbox, not a framework. You should not be forced to rewrite modeling code. It’s better when a model can inherit from `PreTrainedModel` and opt into Tensor Parallel, `from_pretrained`, sharding, `push_to_hub`, loss plumbing, and external stacks like PEFT/TRL/SGLang/vLLM.
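
As a sketch of that opt-in surface (names here are illustrative; the pattern follows the documented custom-model recipe), a model that subclasses `PreTrainedModel` inherits loading, saving, and hub plumbing without re-implementing any of it:

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class TinyConfig(PretrainedConfig):
    model_type = "tiny"

    def __init__(self, hidden_size=64, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        self.layer = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states):
        return self.layer(hidden_states)


model = TinyModel(TinyConfig())
model.save_pretrained("./tiny")                  # inherited, not re-implemented
reloaded = TinyModel.from_pretrained("./tiny")   # same inherited surface
# model.push_to_hub("someone/tiny")              # hub upload, requires auth
```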