pcuenq (HF Staff) committed
Commit 83ac984 · 1 Parent(s): 81d2057

Apply to mdx

Files changed (1)
  1. app/src/content/article.mdx +54 -42
app/src/content/article.mdx CHANGED
@@ -16,48 +16,52 @@ tableOfContentsAutoCollapse: true
import HtmlEmbed from "../components/HtmlEmbed.astro";

- ## Introduction

One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

- Built on `PyTorch`, it's a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.

- This post dissects the design philosophy that makes this possible today. It's a continuation of our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently, and I recommend the read if it's not done yet, a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) was written, explaining in particular what makes the library faster today. Again, all of that development was only made possible thanks to these principles.

- We codify the "tenets" that guide our development, demonstrate how they are implemented in code, and show the measurable impact they have on the library's sustainability and growth.

- For any OSS maintainer, power user, or contributor, this is the map to understanding, using, and building upon `transformers`, but not only: any project of comparable size will require you to make deep choices, not only on design and choice of abstraction, but on the very mindset of the software you are building.

- [Tenets exemplified](#source-of-truth) will have their summary available on hover.

- [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.

- [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away.

<div class="crumbs">
- Throughout this post, you'll find breadcrumb boxes like this one. They summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
</div>

## The core tenets of transformers

We summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

- Note that the library _evolved_ towards these principles, and that they _emerged_ from decisions taken, and once emerged they were recognized as critical.

<div class="tenet-list">
<ol>
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
- <p>We aim to be a [source of truth for all model definitions](#https://huggingface.co/blog/transformers-model-definition). This is not a tenet, but something that still guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original performances.</p>
- <em>This overarching guideline ensures quality and reproducibility across all models in the library.</em>
- </li>

<li class="tenet">
<a id="one-model-one-file"></a>
@@ -67,27 +71,27 @@ Note that the library _evolved_ towards these principles, and that they _emerged
</li>
<li class="tenet">
<a id="code-is-product"></a>
- <strong>Code is Product</strong>
- <p>Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
- <p>If it's model behavior, keep it in the file; abstractions only for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
<a id="do-repeat-yourself"></a>
<strong>DRY* (DO Repeat Yourself)</strong>
- <p>Copy when it helps users; keep successors in sync without centralizing behavior.</p>
- <p><strong>Amendment:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet.</p>
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
</li>
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
- <p>Config, model, preprocessing; from_pretrained, save_pretrained, push_to_hub. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious.</p>
<em>Keep the public interface simple and predictable, users should know what to expect.</em>
</li>
<li class="tenet">
@@ -95,23 +99,27 @@ Note that the library _evolved_ towards these principles, and that they _emerged
<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
<p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies.</p>
- <em>Once something is public, it stays public, evolution through addition, not breaking changes.</em>
- </li>
<li class="tenet">
<a id="consistent-public-surface"></a>
<strong>Consistent Public Surface</strong>
- <p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests. This is a goal we have as well as a tenet.</p>
<em>All models should feel familiar - consistent interfaces reduce cognitive load.</em>
</li>
</ol>
</div>

- When a PR is merged, it is because the contribution is worthwhile, and that the `transformers` team finds the design of the contribution to be aligned with what is above.

- Does all the code in the library follow strictly these tenets? No. The library is a gigantic house with connected nooks, corridors, crannies everywhere built by thousands of different workers. We _try_ to make it so all the code added is compliant, because if we fail and merge it, we cannot change it lest we break [backwards compatibility](#backwards-compatibility).

- For instance, one function essential to the implementation of [Rotary Positional Embeddings](https://huggingface.co/papers/2104.09864) is identical in 70 `modeling_<file>.py` across `src/transformers/models/.` Why keep it? Because we want all the model logic to be [contained in the modeling file](#one-model-one-file). In order to do that, we [do repeat ourselves](#do-repeat-yourself).

```python
def rotate_half(x):
@@ -121,48 +129,52 @@ def rotate_half(x):
    return torch.cat((-x2, x1), dim=-1)
```

- You can use a simple regex to look at all methods of a given name across your codebase and look at their differences and similarities, that's what I did (+ a hash to avoid quadraticity).
-
- We want all models to have self-contained modeling code.

- Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.

- This comes as a great cost. Enter the `#Copied from...` mechanism: for a long time, these comments were indicating that some code was copied from another model, saving time both for the reviewers and for the CI. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet, we could not remove them.

- We needed to separate both principles that were so far intertwined, [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).

- What was the solution to this?

<div class="crumbs">
- Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
</div>

## <a id="modular"></a> Modular transformers

- Transformers is an opinionated library. The previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, and the [blog post](https://huggingface.co/blog/transformers-design-philosophy) were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. [`modular` transformers were introduced](https://huggingface.co/docs/transformers/en/modular_transformers), allowing a form of inheritance without breaking [One model, One file](#one-model-one-file).

- We amended the principle of [DRY*](#do-repeat-yourself) by removing progressively all pieces of code that were "copied from" another file.

- It works as follows. In order to contribute a model, say for instance define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files_.
- This modular file can use inheritance across models: and then, it will be unravelled into a fully functional modeling file.

<summary id="generated-modeling">Auto-generated modeling code</summary>

<HtmlEmbed src="transformers/glm-compare.html" />

- As you can see, we can now define any model as a _modular_ of another.

You might think "well that's just how inheritance works". The crucial difference is that we do _visibly_ what is essentially the _compiler_'s job: by unrolling the inheritances, we make visible all of the modeling code, keeping it [all in one piece](#one-model-one-file).

- What is the consequence? When adding a model, we do not need to go over the entire modeling file. The modular (left side above) is enough.

- When `AutoModel.from_pretrained(...)` is called, it is indeed the modeling (right side) that is ran, and all the tests are run on the modeling code.

What does that give us?

<div class="crumbs">
- A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Reviewers and contributors maintain the shard, not the repetition. <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
</div>
@@ -560,4 +572,4 @@ Being a good backend consumer requires a consistent public surface; modular shar
The next major version of `transformers` is just around the corner (and will have another blog post to its name when it comes out). When v5 is released, we aim to keep [backwards compatibility](#backwards-compatibility) as solid as possible. The changes we make now are in service of that goal.

- We will lean further into a modular toolbox, not a framework. You should not be forced to rewrite modeling code. It’s better when a model can inherit from `PreTrainedModel` and opt into Tensor Parallel, `from_pretrained`, sharding, `push_to_hub`, loss plumbing, and external stacks like PEFT/TRL/SGLang/vLLM.

import HtmlEmbed from "../components/HtmlEmbed.astro";

+ ## Preface

One million lines of `python` code. Through them, the `transformers` library supports more than 400 model architectures, from state-of-the-art LLMs and VLMs to specialized models for audio, video, and tables.

+ Built on `PyTorch`, transformers is a foundational tool for modern LLM usage, research, education, and tens of thousands of other open-source projects. Each AI model is added by the community, harmonized into a consistent interface, and tested daily on a CI to ensure reproducibility.

This scale presents a monumental engineering challenge.

How do you keep such a ship afloat, made of so many moving, unrelated parts, contributed to by a buzzing hivemind? Especially as the pace of ML research accelerates? We receive constant feedback on everything from function signatures with hundreds of arguments to duplicated code and optimization concerns, and we listen to all of it, or try to. The library's usage keeps on growing, and we are a small team of maintainers and contributors, backed by hundreds of open-source community members.
We continue to support all new models and expect to do so for the foreseeable future.

+ This post dissects the design philosophy that makes this possible. It's the result of a gradual evolution from our older principles, detailed on our previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, as well as its accompanying [blog post from 2022](https://huggingface.co/blog/transformers-design-philosophy). More recently (and I do recommend the read), we wrote a blog post about [recent upgrades to transformers](https://huggingface.co/blog/faster-transformers) with a special focus on what makes the library faster today. All of these developments were only made possible thanks to these principles.

+ This post formalizes and articulates the "tenets" that have been guiding our development, demonstrates how they are implemented in code, and shows the measurable impact they have on the library's sustainability and growth.

+ For any OSS maintainer, power user, or contributor, this is the map to understanding, using, and building upon `transformers`; but not only: any project of comparable size will require you to make deep choices, not only on design and choice of abstractions, but on the very mindset of the software you are building. These tenets may or may not be applicable to your project, but they provide a glimpse of how we work that could be helpful or inspirational.

+ Conventions used throughout this post:

+ * [Tenets exemplified](#source-of-truth) will have their summary available on hover.

+ * [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.
+
+ * [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away to explore.

<div class="crumbs">
+ * Breadcrumb boxes summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
</div>

+ We will get started by enumerating the tenets. Then we'll look at concrete examples that show how they shape our decision-making. These examples are necessarily detailed, and sometimes complex, because they illustrate the challenges of maintaining and growing a large codebase that caters to multiple collectives, has millions of users and hundreds of contributors, and always strives for simplicity and consistency.
+
## The core tenets of transformers

We summarize the foundations on which we've built everything, and write the "tenets" of the library. They behave like _software interfaces_, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.

+ These principles were not decided in a vacuum. The library _evolved_ towards them, and once they _emerged_, they were recognized as critical.

<div class="tenet-list">
<ol>
<li class="tenet">
<a id="source-of-truth"></a>
<strong>Source of Truth</strong>
+ <p>We aim to be a [source of truth for all model definitions](https://huggingface.co/blog/transformers-model-definition). This is more of a goal than a tenet, but it strongly guides our decisions. Model implementations should be reliable, reproducible, and faithful to the original implementations. If we are successful, they should become reference baselines for the ecosystem, so they'll be easily adopted by downstream libraries and projects. It's much easier for a project to _always_ refer to the transformers implementation than to learn a different research codebase every time a new architecture is released.</p>
+ <em>This overarching guideline ensures quality and reproducibility across all models in the library, and aspires to make the community's work easier.</em>
+ </li>

<li class="tenet">
<a id="one-model-one-file"></a>
</li>
<li class="tenet">
<a id="code-is-product"></a>
+ <strong>Code is the Product</strong>
+ <p>Optimize for reading, diffing, and tweaking. Our users are power users. Variables are explicit, we use full words, even several words. Readability is primordial.</p>
<em>Code quality matters as much as functionality - optimize for human readers, not just computers.</em>
</li>
<li class="tenet">
<a id="standardize-dont-abstract"></a>
<strong>Standardize, Don't Abstract</strong>
+ <p>If it's model behavior, keep it in the file; only use abstractions for generic infra.</p>
<em>Model-specific logic belongs in the model file, not hidden behind abstractions.</em>
</li>
<li class="tenet">
<a id="do-repeat-yourself"></a>
<strong>DRY* (DO Repeat Yourself)</strong>
+ <p>Copy code when it helps users; keep successors in sync without centralizing behavior.</p>
+ <p><strong>Evolution:</strong> With the introduction and global adoption of <a href="#modular">modular</a> transformers, we do not repeat any logic in the modular files, but end user files remain faithful to the original tenet, as if the code had been copied to make modeling files standalone.</p>
<em>Strategic duplication can improve readability and maintainability when done thoughtfully.</em>
</li>
<li class="tenet">
<a id="minimal-user-api"></a>
<strong>Minimal User API</strong>
+ <p>Config, model, preprocessing; `from_pretrained`, `save_pretrained`, `push_to_hub`. We want the least amount of codepaths. Reading should be obvious, configurations should be obvious. A short usage sketch follows this list.</p>
<em>Keep the public interface simple and predictable, users should know what to expect.</em>
</li>
<li class="tenet">

<strong>Backwards Compatibility</strong>
<p>Evolve by additive standardization, never break public APIs.</p>
<p>Any artifact that was once on the hub and loadable with transformers should be usable indefinitely with the same interface. Further, public methods should not change to avoid breaking dependencies.</p>
+ <em>Once something is public, it stays public. Evolution through addition, not breaking changes.</em>
+ </li>
<li class="tenet">
<a id="consistent-public-surface"></a>
<strong>Consistent Public Surface</strong>
+ <p>Same argument names, same outputs, hidden states and attentions exposed, enforced by tests. This is a goal as well as a tenet.</p>
<em>All models should feel familiar - consistent interfaces reduce cognitive load.</em>
</li>
</ol>
</div>
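
To make "Minimal User API" concrete, here is the kind of end-to-end flow it implies - a sketch, using `gpt2` as a stand-in for any causal LM checkpoint on the hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One config, one model, one preprocessing class, all loaded the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The tenets of transformers are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# The same few verbs cover persistence and sharing:
model.save_pretrained("./my-model")           # serialize config + weights
# model.push_to_hub("my-username/my-model")   # upload (requires authentication)
```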

+ When a PR is merged, it is because the contribution is worthwhile, and because the `transformers` team finds the design of the contribution to be aligned with these principles.
+
+ Does all the code in the library strictly follow these tenets? No. The library is a gigantic house with connected nooks, corridors, crannies everywhere, built by thousands of different workers. We _try_ to make it so all the code added is compliant, because if we fail and merge it, we cannot change it lest we break [backwards compatibility](#backwards-compatibility).

+ <!-- I found the transition to the following example confusing. It implied (because of the previous paragraph and the `for instance` clause) that it's not following the tenets, where in fact it's something we WANT to do. Suggesting some slight reordering. -->

+ To see what constitutes adherence to the tenets, let's take the example of code repetition.
+
+ The following function, which is essential to the implementation of [Rotary Positional Embeddings](https://huggingface.co/papers/2104.09864), can be found in 70 `modeling_<file>.py` files across `src/transformers/models/`. Why keep it? Because we want all the model logic to be [contained in the modeling file](#one-model-one-file). In order to do that, we [do repeat ourselves](#do-repeat-yourself).

```python
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)
```
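
The companion helper that consumes `rotate_half` looks like this in a typical modeling file (a representative copy; signatures vary slightly across models):

```python
def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
    """Applies rotary position embeddings to the query and key states."""
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```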

+ You can use a simple regex, like [this one](), to look at all the methods of a given name across your codebase and compare their differences and similarities.
+ <!-- I'd maybe remove the previous line altogether and just use a link in the paragraph above -->
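
One way to run such an audit (a hypothetical helper, not something shipped in the library) is to collect every definition of a given function name under the models directory and hash the bodies, so only genuinely distinct implementations need a manual diff:

```python
import hashlib
import re
from pathlib import Path

def audit_function(name: str, root: str = "src/transformers/models") -> dict:
    """Group identically-named top-level functions by a hash of their body."""
    # Lazily match from `def name(` up to the next top-level statement.
    pattern = re.compile(rf"^def {name}\(.*?(?=^\S)", re.DOTALL | re.MULTILINE)
    variants: dict[str, list[str]] = {}
    for path in Path(root).rglob("modeling_*.py"):
        for body in pattern.findall(path.read_text()):
            digest = hashlib.sha256(body.encode()).hexdigest()[:8]
            variants.setdefault(digest, []).append(path.name)
    return variants

# A single key in audit_function("rotate_half") would confirm that all
# 70 copies are byte-identical.
```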
 

+ We want all models to have self-contained modeling code. Every core functionality _must_ be in the modeling code, every non-core functionality _can_ be outside of it.

+ This comes at a great cost. For a long time we used the `#Copied from...` mechanism: we added comments documenting that some code was copied from another model, saving time both for the reviewers and for the CI, since we had tooling to ensure that the copied blocks remained in sync. But the LOC count kept creeping up. Each new model copied over hundreds of lines that we considered largely boilerplate, yet we could not remove them.
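
For illustration, such a marker looked like this (the class pairing here is hypothetical, but the comment format is the one enforced by the repo's consistency checks); the optional `with` clause applies a systematic rename when the copy is verified:

```python
# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MyModel
class MyModelRMSNorm(nn.Module):
    ...
```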

+ We needed to separate two principles that were so far intertwined, [repetition](#do-repeat-yourself) and [hackability](#one-model-one-file).

+ What was the solution to this? Let's talk about modular transformers.

<div class="crumbs">
+ <strong>TL;DR:</strong> Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>).
+
+ <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
</div>

## <a id="modular"></a> Modular transformers

+ Transformers is an opinionated library. The previous [philosophy](https://huggingface.co/docs/transformers/en/philosophy) page, and the [2022 blog post](https://huggingface.co/blog/transformers-design-philosophy) were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. [`modular` transformers was introduced](https://huggingface.co/docs/transformers/en/modular_transformers) to allow a form of inheritance without breaking [One model, One file](#one-model-one-file).

+ We amended the principle of [DRY*](#do-repeat-yourself) by progressively removing all pieces of code that were "copied from" another file.

+ It works as follows. In order to contribute a model - GLM, for instance - we define a `modular_` file that can inherit from _any function across all other modeling, configuration and processor files already available in the library_. The modular file can use inheritance across models, but then it's unravelled into a fully functional and standalone modeling file.
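
A minimal sketch of what such a modular file can look like (abridged for illustration; the real `modular_glm.py` carries more overrides):

```python
# modular_glm.py - the contributor maintains only this shard.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM
from transformers.models.phi3.modeling_phi3 import Phi3MLP


class GlmMLP(Phi3MLP):
    pass  # identical computation, expanded in the generated modeling file


class GlmAttention(LlamaAttention):
    def __init__(self, config, layer_idx=None):
        super().__init__(config, layer_idx)
        # only the GLM-specific deltas from Llama live here


class GlmForCausalLM(LlamaForCausalLM):
    pass
```

The code generator then unrolls these inheritances into a standalone `modeling_glm.py`, which is what ships.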
 

<summary id="generated-modeling">Auto-generated modeling code</summary>

<HtmlEmbed src="transformers/glm-compare.html" />

+ As you can see, we can define a new model as a _modular_ combination of fragments taken from others.

You might think "well that's just how inheritance works". The crucial difference is that we do _visibly_ what is essentially the _compiler_'s job: by unrolling the inheritances, we make visible all of the modeling code, keeping it [all in one piece](#one-model-one-file).

+ <!-- some ideas for additional hand-holding: link to the implementation of `LlamaAttention` to show it was copied (and modified), or maybe provide a git diff view between the GlmAttention and LlamaAttention implementations -->

+ What is the consequence? When adding a model, we do not need to go over the entire modeling file. The modular (left side above) is enough.
+
+ When `AutoModel.from_pretrained(...)` is called, it is indeed the modeling (right side) that is run, and all the tests run on the modeling code. More importantly, the auto-generated modeling file is what users _read_ to understand the code, what they step through in their debuggers, and what they hack for their needs.
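
Because the generated file is a real module on disk, plain standard-library introspection is enough to jump straight to it - a small sketch:

```python
import inspect
from transformers.models.glm import modeling_glm

# The expanded, self-contained file that users read, debug, and hack:
print(inspect.getsourcefile(modeling_glm))
print(inspect.getsource(modeling_glm.GlmAttention)[:300])
```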

What does that give us?

<div class="crumbs">
+ <strong>TL;DR:</strong> A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">One Model, One File tenet preserved</a>). Reviewers and contributors maintain the shard, not the repetition.
+
+ <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
</div>
 

The next major version of `transformers` is just around the corner (and will have another blog post to its name when it comes out). When v5 is released, we aim to keep [backwards compatibility](#backwards-compatibility) as solid as possible. The changes we make now are in service of that goal.

+ We will lean further into a modular toolbox, not a framework. You should not be forced to rewrite modeling code. It’s better when a model can inherit from `PreTrainedModel` and opt into Tensor Parallel, `from_pretrained`, sharding, `push_to_hub`, loss plumbing, and external stacks like PEFT/TRL/SGLang/vLLM.
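
As a sketch of that opt-in surface (names here are illustrative; the pattern follows the documented custom-model recipe), a model that subclasses `PreTrainedModel` inherits loading, saving, and hub plumbing without re-implementing any of it:

```python
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel


class TinyConfig(PretrainedConfig):
    model_type = "tiny"

    def __init__(self, hidden_size=64, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class TinyModel(PreTrainedModel):
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        self.layer = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states):
        return self.layer(hidden_states)


model = TinyModel(TinyConfig())
model.save_pretrained("./tiny")                  # inherited, not re-implemented
reloaded = TinyModel.from_pretrained("./tiny")   # same inherited surface
# model.push_to_hub("someone/tiny")              # hub upload, requires auth
```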