builded

dist/index.html  +13 −9

@@ -8,22 +8,22 @@
 <script src="https://d3js.org/d3.v7.min.js"></script>
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <meta charset="utf8">
-<title>
+<title>Scaling insanity: maintaining hundreds of model definitions</title>
 <link rel="stylesheet" href="style.css">
 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css">
 </head>
 <body>
 <d-front-matter>
 <script id='distill-front-matter' type="text/json">{
-"title": "
-"description": "
+"title": "Scaling insanity: maintaining hundreds of model definitions",
+"description": "A peek into software engineering for the transformers library",
 "published": "Aug 21, 2025",
 "authors": [{"author": "Pablo Montalvo", "authorURL": "https://huggingface.co/Molbap"}]
 }</script>
 </d-front-matter>
 <d-title>
-<h1>
-<p>
+<h1>Scaling insanity: maintaining hundreds of model definitions</h1>
+<p>A peek into software engineering for the transformers library</p>
 </d-title>
 <d-byline></d-byline>
 <d-article>

@@ -48,9 +48,13 @@
 </nav>
 </d-contents>
 <h2>Introduction</h2>
-<p>The <code>transformers</code> library, built with <code>PyTorch</code>, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, classical encoders, to a global count of almost 400 models
-<p>The
-<p>
+<p>The <code>transformers</code> library, built with <code>PyTorch</code>, supports all state-of-the-art LLMs, many VLMs, task-specific vision language models, video models, audio models, table models, classical encoders, to a global count of almost 400 models.</p>
+<p>The name of the library itself is mostly majority driven as many models are not even transformers architectures, like Mamba, Zamba, RWKV, and convolution-based models.</p>
+<p>Regardless, each of these is wrought by the research and engineering team that created them, then harmonized into a now famous interface, and callable with a simple <code>.from_pretrained</code> command.</p>
+<p>Inference works for all models, training is functional for most. The library is a foundation for many machine learning courses, cookbooks, and overall, several thousands other open-source libraries depend on it. All models are tested as part of a daily CI ensuring their preservation and reproducibility. Most importantly, it is <em>open-source</em> and has been written by the community for a large part.</p>
+<p>This isn’t really to brag but to set the stakes: what does it take to keep such a ship afloat, made of so many moving, unrelated parts?</p>
+<p>The ML wave has not stopped, there’s more and more models being added, at a steadily growing rate. <code>Transformers</code> is widely used, and we read the feedback that users post online. Whether it’s about a function that had 300+ keyword arguments, duplicated code and helpers, and mentions of <code>Copied from ... </code> everywhere, along with optimisation concerns. Text-only models are relatively tamed, but multimodal models remain to be harmonized.</p>
+<p>Here we will dissect what is the new design philosophy of transformers, as a continuation from the existing older <a href="https://huggingface.co/docs/transformers/en/philosophy">philosophy</a> page, and an accompanying <a href="https://huggingface.co/blog/transformers-design-philosophy">blog post from 2022</a>. Some time ago I dare not say how long, we discussed with transformers maintainers about the state of things. A lot of recent developments were satisfactory, but if we were only talking about these, self-congratulation would be the only goalpost. Reflecting on this philosophy now, as models pile up, is essential and will drive new developments.</p>
 <h3>What you will learn</h3>
 <p>Every reader, whether an OSS maintainer, power user, or casual fine-tuner, will walk away knowing how to reason about the <code>transformers</code> code base, how to use it better, how to meaningfully contribute to it.
 This will also showcase new features you might have missed so you’ll be up-to-date.</p>

@@ -537,7 +541,7 @@ machinery is the <code>attention mask</code>, cause of confusion. Thankfully, we

 // Extract tenet text for tooltips
 const tenetTooltips = {
-'source-of-truth': 'We
+'source-of-truth': 'We aim to be a source of truth for all model definitions. Model implementations should be reliable, reproducible, and faithful to the original performances.',
 'one-model-one-file': 'All inference (and most of training, loss is separate, not a part of model) logic visible, top‑to‑bottom.',
 'code-is-product': 'Optimize for reading, diffing, and tweaking, our users are power users. Variables can be explicit, full words, even several words, readability is primordial.',
 'standardize-dont-abstract': 'If it\'s model behavior, keep it in the file; abstractions only for generic infra.',