Molbap HF Staff committed on
Commit
2bc6c5c
·
1 Parent(s): 34245c7
content/article.md CHANGED
@@ -52,10 +52,14 @@ For any OSS maintainer, power user, or contributor, this is the map to understan
52
 
53
  [Tenets exemplified](#source-of-truth) will have their summary available on hover.
54
 
55
- [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.
56
 
57
  [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away.
58
 
 
 
 
 
59
  ## The core tenets of transformers
60
 
61
 
@@ -147,7 +151,7 @@ We needed to separate both principles that were so far intertwined, [repetition]
147
  What was the solution to this?
148
 
149
  <div class="crumbs">
150
- <strong>Breadcrumb</strong> — Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don’t Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). Next: how modular transformers honor these while removing boilerplate.
151
  </div>
152
 
153
 
@@ -174,7 +178,7 @@ When `AutoModel.from_pretrained(...)` is called, it is indeed the modeling (righ
174
  What does that gives us?
175
 
176
  <div class="crumbs">
177
- <strong>Breadcrumb</strong> — What changed: a small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Why it matters: reviewers and contributors maintain the shard, not the repetition. Next: the measurable effect on effective LOC and maintenance cost.
178
  </div>
179
 
180
 
@@ -203,7 +207,7 @@ The _attention computation_ itself happens at a _lower_ level of abstraction tha
203
  However, we were adding specific torch operations for each backend (sdpa, flash-attention iterations, flex attention) but it wasn't a [minimal user api](#minimal-user-api).
204
 
205
  <div class="crumbs">
206
- <strong>Breadcrumb</strong> — Evidence: effective LOC drops ~15× when counting shards instead of expanded modeling. Less to read, fewer places to break. Related cleanups: attention backends moved behind a function interface. Next: how the attention interface stays standard without hiding semantics.
207
  </div>
208
 
209
  ### <a id="attention-classes"></a> External Attention classes
@@ -234,7 +238,7 @@ MyModelOutputAnnotated = Annotated[MyModelOutput, "shape: (B, C, H, W)"]
234
 
235
 
236
  <div class="crumbs">
237
- <strong>Breadcrumb</strong> — Semantics remain in <code>eager_attention_forward</code>; faster backends are opt-in via config. We inform via types/annotations rather than enforce rigid kwargs, preserving integrations. Next: distribution concerns are declared as a plan, not model surgery.
238
  </div>
239
 
240
  ### <a id="simpler-tensor-parallelism"></a> Configurable Tensor Parallelism
@@ -264,7 +268,7 @@ Which allows a user to run with multiple processes per node, e.g. 4 GPUs:
264
  Semantics stay in the model (a Linear stays a Linear), distribution is orthogonal and declared via strings: "colwise" splits columns of weights/bias across ranks; "rowwise" splits rows; packed variants shard fused weights; The mapping keys accept glob patterns like `layers.*.mlp.down_proj` to target repeated submodules.
265
 
266
  <div class="crumbs">
267
- <strong>Breadcrumb</strong> — Sharding is configuration (<code>tp_plan</code>), not edits to <code>Linear</code>s. Glob patterns target repeated blocks; modeling semantics stay intact. Next: per-layer attention/caching schedules declared in config, not hardcoded.
268
  </div>
269
 
270
  ### <a id="layers-attentions-caches"></a> Layers, attentions and caches
@@ -297,7 +301,7 @@ and the configuration can be _explicit_ about which attention type is in which l
297
  This is [minimal](#minimal-user-api) to implement on the user side, and allows to keep the modeling untouched. It is also easy to tweak.
298
 
299
  <div class="crumbs">
300
- <strong>Breadcrumb</strong> — Allowed layer types are explicit; schedules (e.g., sliding/full alternation) live in config. This keeps the file readable and easy to tweak. Next: speedups come from kernels that don’t change semantics.
301
  </div>
302
 
303
 
@@ -317,7 +321,7 @@ Even more resources have been added, like the formidable [kernel builder](https:
317
 
318
 
319
  <div class="crumbs">
320
- <strong>Breadcrumb</strong> — Models define semantics; kernels define how to run them faster. Use annotations to borrow community forwards while keeping a consistent public surface. Next: what modularity looks like across the repo.
321
  </div>
322
 
323
  ## Modular developments
@@ -345,7 +349,7 @@ Another problem is, this is only for `modular` models. Several models do NOT hav
345
  How do we spot them, and how do we identify modularisable models?
346
 
347
  <div class="crumbs">
348
- <strong>Breadcrumb</strong> — Graph reading guide: nodes are models; edges are modular imports. Llama-lineage is a hub; several VLMs remain islands — engineering opportunity for shared parents. Next: timeline + similarity signals to spot candidates.
349
  </div>
350
 
351
 
@@ -360,7 +364,7 @@ It is interesting, for that, to look at _when_ we deployed this modular logic an
360
  If you've checked out llava, you've seen that llava_video is a red node, connected by a red edge to llava: it's a candidate, something that we can _likely_ remodularize, [not touching the actual model](#backwards-compatibility) but being much more readable with [DRY*](#do-repeat-yourself).
361
 
362
  <div class="crumbs">
363
- <strong>Breadcrumb</strong> — Similarity (Jaccard; embeddings tried separately) surfaces likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior. Next: concrete VLM choices that avoid leaky abstractions.
364
  </div>
365
 
366
  ### VLM improvements, avoiding abstraction
@@ -427,7 +431,7 @@ The following [Pull request to standardize placeholder masking](https://github.c
427
  But this is _within_ the modeling file, not in the `PreTrainedModel` base class. It will not move away from it, because it'd break the [self-contained logic](#one-model-one-file) of the model.
428
 
429
  <div class="crumbs">
430
- <strong>Breadcrumb</strong> — Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don’t migrate behavior to <code>PreTrainedModel</code>. Next: pipeline-level wins that came from PyTorch-first choices (fast processors).
431
  </div>
432
 
433
 
@@ -441,7 +445,7 @@ The gains in performance are immense, up to 20x speed for most models when compi
441
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
442
 
443
  <div class="crumbs">
444
- <strong>Breadcrumb</strong> — Torch-first lets processors assume torch/torchvision and run the whole pipeline on GPU; big per-model speedups. Next: how this lowers friction for contributors and downstream users.
445
  </div>
446
 
447
 
@@ -457,7 +461,7 @@ A second one is the ability to fine-tune and pipeline these models into many oth
457
 
458
 
459
  <div class="crumbs">
460
- <strong>Breadcrumb</strong> — The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest. Next: power tools enabled by a consistent API.
461
  </div>
462
 
463
 
@@ -475,7 +479,7 @@ In that regard, we DO want to be a modular toolbox, being [minimal](#minimal-use
475
  So, how do these design choices, these "tenets" influence development of models and overall usage of transformers?
476
 
477
  <div class="crumbs">
478
- <strong>Breadcrumb</strong> — Encoders remain critical for embeddings and retrieval; maintaining them well benefits the broader ecosystem (e.g., Sentence Transformers, FAISS). Next: dev tools that leverage unified attention APIs and PyTorch-only internals.
479
  </div>
480
 
481
 
@@ -490,7 +494,7 @@ One particular piece of machinery is the `attention mask`. Here you see the famo
490
  {{{fragment-attention-visualizer}}}
491
 
492
  <div class="crumbs">
493
- <strong>Breadcrumb</strong> — Uniform attention APIs enable cross-model diagnostics (e.g., PaliGemma prefix bidirectionality vs causal). Next: whole-model tracing for ports and regressions.
494
  </div>
495
 
496
 
@@ -504,7 +508,7 @@ It just works with PyTorch models and is especially useful when aligning outputs
504
 
505
 
506
  <div class="crumbs">
507
- <strong>Breadcrumb</strong> — Forward interception and nested JSON logging align ports to reference implementations, reinforcing “Source of Truth.” Next: CUDA warmup reduces load-time stalls without touching modeling semantics.
508
  </div>
509
 
510
 
@@ -518,7 +522,7 @@ Having a clean _external_ API allows us to work on the [true inner workings of t
518
  It's hard to overstate how much of a lifesaver that is when you're trying to load a model as fast as possible, as it's the narrowest bottleneck for your iteration speed.
519
 
520
  <div class="crumbs">
521
- <strong>Breadcrumb</strong> — Pre-allocating GPU memory removes malloc spikes (e.g., 7× for 8B, 6× for 32B in the referenced PR). Next: serving benefits directly from consistent interfaces and modularity.
522
  </div>
523
 
524
 
@@ -539,7 +543,7 @@ This provides an OpenAI-compatible API with features like [continuous batching](
539
  Continuous batching is in itself very much linked to the great work of vLLM with the `paged attention kernel`, further justifying the facilitation of [external kernels](#community-kernels).
540
 
541
  <div class="crumbs">
542
- <strong>Breadcrumb</strong> — OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable. Next: reuse across vLLM/SGLang relies on the same consistency.
543
  </div>
544
 
545
 
@@ -556,7 +560,7 @@ This cements the need even more for a [consistent public surface](#consistent-pu
556
 
557
 
558
  <div class="crumbs">
559
- <strong>Breadcrumb</strong> — Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical. Next: what changes in v5 without breaking the promise of visible semantics.
560
  </div>
561
 
562
  ## What is coming next
 
52
 
53
  [Tenets exemplified](#source-of-truth) will have their summary available on hover.
54
 
55
+ [External links](https://huggingface.co/blog/welcome-openai-gpt-oss) to articles will help you solidify your knowledge.
56
 
57
  [Several interactive visualisations](#generated-modeling) are available as you go - scroll, zoom, drag away.
58
 
59
+ <div class="crumbs">
60
+ Throughout this post, you'll find breadcrumb boxes like this one. They summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
61
+ </div>
62
+
63
  ## The core tenets of transformers
64
 
65
 
 
151
  What was the solution to this?
152
 
153
  <div class="crumbs">
154
+ Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
155
  </div>
156
 
157
 
 
178
  What does that gives us?
179
 
180
  <div class="crumbs">
181
+ A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Reviewers and contributors maintain the shard, not the repetition. <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
182
  </div>
183
 
184
 
 
207
  However, we were adding specific torch operations for each backend (sdpa, flash-attention iterations, flex attention) but it wasn't a [minimal user api](#minimal-user-api).
208
 
209
  <div class="crumbs">
210
+ Evidence: effective LOC drops ~15× when counting shards instead of expanded modeling. Less to read, fewer places to break. Related cleanups: attention backends moved behind a function interface. <strong>Next:</strong> how the attention interface stays standard without hiding semantics.
211
  </div>
212
 
213
  ### <a id="attention-classes"></a> External Attention classes
 
238
 
239
 
240
  <div class="crumbs">
241
+ Semantics remain in <code>eager_attention_forward</code>; faster backends are opt-in via config. We inform via types/annotations rather than enforce rigid kwargs, preserving integrations. <strong>Next:</strong> distribution concerns are declared as a plan, not model surgery.
242
  </div>
243
 
244
  ### <a id="simpler-tensor-parallelism"></a> Configurable Tensor Parallelism
 
268
  Semantics stay in the model (a Linear stays a Linear), distribution is orthogonal and declared via strings: "colwise" splits columns of weights/bias across ranks; "rowwise" splits rows; packed variants shard fused weights; The mapping keys accept glob patterns like `layers.*.mlp.down_proj` to target repeated submodules.
269
 
270
  <div class="crumbs">
271
+ Sharding is configuration (<code>tp_plan</code>), not edits to <code>Linear</code>s. Glob patterns target repeated blocks; modeling semantics stay intact. <strong>Next:</strong> per-layer attention/caching schedules declared in config, not hardcoded.
272
  </div>
273
 
274
  ### <a id="layers-attentions-caches"></a> Layers, attentions and caches
 
301
  This is [minimal](#minimal-user-api) to implement on the user side, and allows to keep the modeling untouched. It is also easy to tweak.
302
 
303
  <div class="crumbs">
304
+ Allowed layer types are explicit; schedules (e.g., sliding/full alternation) live in config. This keeps the file readable and easy to tweak. <strong>Next:</strong> speedups come from kernels that don't change semantics.
305
  </div>
306
 
307
 
 
321
 
322
 
323
  <div class="crumbs">
324
+ Models define semantics; kernels define how to run them faster. Use annotations to borrow community forwards while keeping a consistent public surface. <strong>Next:</strong> what modularity looks like across the repo.
325
  </div>
326
 
327
  ## Modular developments
 
349
  How do we spot them, and how do we identify modularisable models?
350
 
351
  <div class="crumbs">
352
+ Graph reading guide: nodes are models; edges are modular imports. Llama-lineage is a hub; several VLMs remain islands — engineering opportunity for shared parents. <strong>Next:</strong> timeline + similarity signals to spot candidates.
353
  </div>
354
 
355
 
 
364
  If you've checked out llava, you've seen that llava_video is a red node, connected by a red edge to llava: it's a candidate, something that we can _likely_ remodularize, [not touching the actual model](#backwards-compatibility) but being much more readable with [DRY*](#do-repeat-yourself).
365
 
366
  <div class="crumbs">
367
+ Similarity (Jaccard; embeddings tried separately) surfaces likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior. <strong>Next:</strong> concrete VLM choices that avoid leaky abstractions.
368
  </div>
369
 
370
  ### VLM improvements, avoiding abstraction
 
431
  But this is _within_ the modeling file, not in the `PreTrainedModel` base class. It will not move away from it, because it'd break the [self-contained logic](#one-model-one-file) of the model.
432
 
433
  <div class="crumbs">
434
+ Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don’t migrate behavior to <code>PreTrainedModel</code>. <strong>Next:</strong> pipeline-level wins that came from PyTorch-first choices (fast processors).
435
  </div>
436
 
437
 
 
445
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
446
 
447
  <div class="crumbs">
448
+ Torch-first lets processors assume torch/torchvision and run the whole pipeline on GPU; big per-model speedups. <strong>Next:</strong> how this lowers friction for contributors and downstream users.
449
  </div>
450
 
451
 
 
461
 
462
 
463
  <div class="crumbs">
464
+ The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest. <strong>Next:</strong> power tools enabled by a consistent API.
465
  </div>
466
 
467
 
 
479
  So, how do these design choices, these "tenets" influence development of models and overall usage of transformers?
480
 
481
  <div class="crumbs">
482
+ Encoders remain critical for embeddings and retrieval; maintaining them well benefits the broader ecosystem (e.g., Sentence Transformers, FAISS). <strong>Next:</strong> dev tools that leverage unified attention APIs and PyTorch-only internals.
483
  </div>
484
 
485
 
 
494
  {{{fragment-attention-visualizer}}}
495
 
496
  <div class="crumbs">
497
+ Uniform attention APIs enable cross-model diagnostics (e.g., PaliGemma prefix bidirectionality vs causal). <strong>Next:</strong> whole-model tracing for ports and regressions.
498
  </div>
499
 
500
 
 
508
 
509
 
510
  <div class="crumbs">
511
+ Forward interception and nested JSON logging align ports to reference implementations, reinforcing “Source of Truth.” <strong>Next:</strong> CUDA warmup reduces load-time stalls without touching modeling semantics.
512
  </div>
513
 
514
 
 
522
  It's hard to overstate how much of a lifesaver that is when you're trying to load a model as fast as possible, as it's the narrowest bottleneck for your iteration speed.
523
 
524
  <div class="crumbs">
525
+ Pre-allocating GPU memory removes malloc spikes (e.g., 7× for 8B, 6× for 32B in the referenced PR). <strong>Next:</strong> serving benefits directly from consistent interfaces and modularity.
526
  </div>
527
 
528
 
 
543
  Continuous batching is in itself very much linked to the great work of vLLM with the `paged attention kernel`, further justifying the facilitation of [external kernels](#community-kernels).
544
 
545
  <div class="crumbs">
546
+ OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable. <strong>Next:</strong> reuse across vLLM/SGLang relies on the same consistency.
547
  </div>
548
 
549
 
 
560
 
561
 
562
  <div class="crumbs">
563
+ Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical. <strong>Next:</strong> what changes in v5 without breaking the promise of visible semantics.
564
  </div>
565
 
566
  ## What is coming next
dist/index.html CHANGED
@@ -59,6 +59,9 @@
59
  <p><a href="#source-of-truth">Tenets exemplified</a> will have their summary available on hover.</p>
60
  <p><a href="https://huggingface.co/blog/welcome-openai-gpt-oss">External links</a> to articles will help you solidify your knowledge.</p>
61
  <p><a href="#generated-modeling">Several interactive visualisations</a> are available as you go - scroll, zoom, drag away.</p>
 
 
 
62
  <h2>The core tenets of transformers</h2>
63
  <p>We summarize the foundations on which we’ve built everything, and write the “tenets” of the library. They behave like <em>software interfaces</em>, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.</p>
64
  <p>Note that the library <em>evolved</em> towards these principles, and that they <em>emerged</em> from decisions taken, and once emerged they were recognized as critical.</p>
@@ -132,7 +135,7 @@
132
  <p>We needed to separate both principles that were so far intertwined, <a href="#do-repeat-yourself">repetition</a> and <a href="#one-model-one-file">hackabilty</a>.</p>
133
  <p>What was the solution to this?</p>
134
  <div class="crumbs">
135
- <strong>Breadcrumb</strong> — Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don’t Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). Next: how modular transformers honor these while removing boilerplate.
136
  </div>
137
  <h2><a id="modular"></a> Modular transformers</h2>
138
  <p>Transformers is an opiniated library. The previous <a href="https://huggingface.co/docs/transformers/en/philosophy">philosophy</a> page, and the <a href="https://huggingface.co/blog/transformers-design-philosophy">blog post</a> were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. <a href="https://huggingface.co/docs/transformers/en/modular_transformers"><code>modular</code> transformers were introduced</a>, allowing a form of inheritance without breaking <a href="#one-model-one-file">One model, One file</a>.</p>
@@ -294,7 +297,7 @@ class GlmRMSNorm(nn.Module):
294
  <p>When <code>AutoModel.from_pretrained(...)</code> is called, it is indeed the modeling (right side) that is ran, and all the tests are run on the modeling code.</p>
295
  <p>What does that gives us?</p>
296
  <div class="crumbs">
297
- <strong>Breadcrumb</strong> — What changed: a small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Why it matters: reviewers and contributors maintain the shard, not the repetition. Next: the measurable effect on effective LOC and maintenance cost.
298
  </div>
299
  <h3>A maintainable control surface</h3>
300
  <p>The effect of modular can be measured straight from git history: at every commit, we look under the model directory.
@@ -310,7 +313,7 @@ However, if a model has a modular_<em>.py and a corresponding automatically gene
310
  <p>The <em>attention computation</em> itself happens at a <em>lower</em> level of abstraction than the model itself.</p>
311
  <p>However, we were adding specific torch operations for each backend (sdpa, flash-attention iterations, flex attention) but it wasn’t a <a href="#minimal-user-api">minimal user api</a>.</p>
312
  <div class="crumbs">
313
- <strong>Breadcrumb</strong> — Evidence: effective LOC drops ~15× when counting shards instead of expanded modeling. Less to read, fewer places to break. Related cleanups: attention backends moved behind a function interface. Next: how the attention interface stays standard without hiding semantics.
314
  </div>
315
  <h3><a id="attention-classes"></a> External Attention classes</h3>
316
  <p>We moved to an <a href="https://huggingface.co/docs/transformers/en/attention_interface">attention interface</a> that allowed the following:</p>
@@ -328,7 +331,7 @@ if self.config._attn_implementation != &quot;eager&quot;:
328
  MyModelOutputAnnotated = Annotated[MyModelOutput, &quot;shape: (B, C, H, W)&quot;]
329
  </code></pre>
330
  <div class="crumbs">
331
- <strong>Breadcrumb</strong> — Semantics remain in <code>eager_attention_forward</code>; faster backends are opt-in via config. We inform via types/annotations rather than enforce rigid kwargs, preserving integrations. Next: distribution concerns are declared as a plan, not model surgery.
332
  </div>
333
  <h3><a id="simpler-tensor-parallelism"></a> Configurable Tensor Parallelism</h3>
334
  <p>If you’re not familiar with the different flavours of parallelism, I recommend checking out <a href="https://huggingface.co/blog/accelerate-nd-parallel">this blog post</a> first, and of course a full <a href="https://huggingface.co/spaces/nanotron/ultrascale-playbook">dive into the ultra-scale playbook</a> is always recommended.</p>
@@ -367,7 +370,7 @@ out = model(**inputs)</code></pre></p>
367
  <p><code>torchrun --nproc-per-node 4 demo.py</code></p>
368
  <p>Semantics stay in the model (a Linear stays a Linear), distribution is orthogonal and declared via strings: “colwise” splits columns of weights/bias across ranks; “rowwise” splits rows; packed variants shard fused weights; The mapping keys accept glob patterns like <code>layers.*.mlp.down_proj</code> to target repeated submodules.</p>
369
  <div class="crumbs">
370
- <strong>Breadcrumb</strong> — Sharding is configuration (<code>tp_plan</code>), not edits to <code>Linear</code>s. Glob patterns target repeated blocks; modeling semantics stay intact. Next: per-layer attention/caching schedules declared in config, not hardcoded.
371
  </div>
372
  <h3><a id="layers-attentions-caches"></a> Layers, attentions and caches</h3>
373
  <p>Following the same logic, the <em>nature</em> of attention and caching per layer of a model should not be hardcoded. We should be able to specify in a configuration-based fashion how each layer is implemented. Thus we defined a mapping that can be then</p>
@@ -390,7 +393,7 @@ out = model(**inputs)</code></pre></p>
390
  </code></pre>
391
  <p>This is <a href="#minimal-user-api">minimal</a> to implement on the user side, and allows to keep the modeling untouched. It is also easy to tweak.</p>
392
  <div class="crumbs">
393
- <strong>Breadcrumb</strong> — Allowed layer types are explicit; schedules (e.g., sliding/full alternation) live in config. This keeps the file readable and easy to tweak. Next: speedups come from kernels that don’t change semantics.
394
  </div>
395
  <h3><a id="community-kernels"></a>Community Kernels</h3>
396
  <p>The same principle extends to normalization, activation, and other code paths. The model defines <strong>semantics</strong>; a kernel defines <strong>how</strong> to execute them faster. We annotate the module to borrow a community‑provided forward, keeping a <a href="#consistent-public-surface">consistent public surface</a></p>
@@ -401,7 +404,7 @@ class GlmRMSNorm(nn.Module):
401
  <p>Plus, this opened another angle of contribution for the community. People who are GPU whisperers can now contribute optimized kernels. You can check on the <a href="https://huggingface.co/blog/hello-hf-kernels">kernel community blog post</a> to learn more about it!</p>
402
  <p>Even more resources have been added, like the formidable <a href="https://github.com/huggingface/kernel-builder">kernel builder</a> with its connected resources to <a href="https://github.com/huggingface/kernel-builder/blob/main/docs/writing-kernels.md">help you build kernels with it</a> and <a href="https://github.com/huggingface/kernel-builder/blob/main/docs/nix.md">with nix</a>.</p>
403
  <div class="crumbs">
404
- <strong>Breadcrumb</strong> — Models define semantics; kernels define how to run them faster. Use annotations to borrow community forwards while keeping a consistent public surface. Next: what modularity looks like across the repo.
405
  </div>
406
  <h2>Modular developments</h2>
407
  <p>Now, we have a form of inheritance in our codebase. Some models become standards, and model contributors are given the opportunity to <em>define standards</em>. Pushing the boundaries of scientific knowledge can translate into the boundaries of engineering if this effort is made, and we’re striving for it.
@@ -421,7 +424,7 @@ As you can see, there is a small DETR island, a little llava pocket, and so on,
421
  <p>Another problem is, this is only for <code>modular</code> models. Several models do NOT have a modular file.</p>
422
  <p>How do we spot them, and how do we identify modularisable models?</p>
423
  <div class="crumbs">
424
- <strong>Breadcrumb</strong> — Graph reading guide: nodes are models; edges are modular imports. Llama-lineage is a hub; several VLMs remain islands — engineering opportunity for shared parents. Next: timeline + similarity signals to spot candidates.
425
  </div>
426
  <h3>Many models, but not enough yet, are alike</h3>
427
  <p>So I looked into Jaccard similarity, which we use to measure set differences. I know that code is more than a set of characters stringed together. I also used code embedding models to check out code similarities, and it yielded better results, for the needs of this blog post I will stick to Jaccard index.</p>
@@ -429,7 +432,7 @@ As you can see, there is a small DETR island, a little llava pocket, and so on,
429
  <p> <iframe src=https://molbap-timeline-1.hf.space style="width:100%; height:680px; border:0" allow="clipboard-read; clipboard-write; fullscreen" referrerpolicy=no-referrer-when-downgrade></iframe></p>
430
  <p>If you’ve checked out llava, you’ve seen that llava_video is a red node, connected by a red edge to llava: it’s a candidate, something that we can <em>likely</em> remodularize, <a href="#backwards-compatibility">not touching the actual model</a> but being much more readable with <a href="#do-repeat-yourself">DRY*</a>.</p>
431
  <div class="crumbs">
432
- <strong>Breadcrumb</strong> — Similarity (Jaccard; embeddings tried separately) surfaces likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior. Next: concrete VLM choices that avoid leaky abstractions.
433
  </div>
434
  <h3>VLM improvements, avoiding abstraction</h3>
435
  <p>We don’t have cookbook for common VLM patterns (image token scatter, multi‑tower encoders, cross‑attn bridges). This is one of the main improvement points where we can work.</p>
@@ -483,7 +486,7 @@ As you can see, there is a small DETR island, a little llava pocket, and so on,
483
  </code></pre>
484
  <p>But this is <em>within</em> the modeling file, not in the <code>PreTrainedModel</code> base class. It will not move away from it, because it’d break the <a href="#one-model-one-file">self-contained logic</a> of the model.</p>
485
  <div class="crumbs">
486
- <strong>Breadcrumb</strong> — Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don’t migrate behavior to <code>PreTrainedModel</code>. Next: pipeline-level wins that came from PyTorch-first choices (fast processors).
487
  </div>
488
  <h3>On image processing and processors</h3>
489
  <p>Choosing to be a <code>torch</code>-first software meant relieving a tremendous amount of support from <code>jax </code> and <code>TensorFlow</code> , and it also meant that we could be more lenient into the amount of torch-dependent utilities that we were able to add. One of these is the <em>fast processing</em> of images. Where they were before assumed to be minimal ndarrays, making stronger assumptions and enforcing <code>torch</code> and <code>torchvision</code>native inputs allowed up to speed up massively the processing time for each model.</p>
@@ -491,7 +494,7 @@ As you can see, there is a small DETR island, a little llava pocket, and so on,
491
  <p><img src="static/fast_image_processors.png" alt="Fast Image Processors Performance"></p>
492
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
493
  <div class="crumbs">
494
- <strong>Breadcrumb</strong>: Torch-first lets processors assume torch/torchvision and run the whole pipeline on GPU; big per-model speedups. Next: how this lowers friction for contributors and downstream users.
495
  </div>
496
  <h2>Reduce barrier to entry/contribution</h2>
497
  <p>This is an overall objective: there's no <code>transformers</code> without its community.</p>
@@ -499,7 +502,7 @@ As you can see, there is a small DETR island, a little llava pocket, and so on,
499
  <p>Among the most valuable contributions to <code>transformers</code> is of course the addition of new models. Very recently, <a href="https://huggingface.co/blog/welcome-openai-gpt-oss">OpenAI added GPT-OSS</a>, which prompted the addition of many new features to the library in order to support <a href="https://huggingface.co/openai/gpt-oss-120b">their model</a>.</p>
500
  <p>A second one is the ability to fine-tune these models and pipeline them into many other pieces of software. Check on the Hub how many fine-tunes are registered for <a href="https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b">gpt-oss 120b</a>, despite its size!</p>
501
  <div class="crumbs">
502
- <strong>Breadcrumb</strong>: The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest. Next: power tools enabled by a consistent API.
503
  </div>
504
  <h3><a id="encoders-ftw"></a> Models popularity</h3>
505
  <p>Talking about dependencies, we can look at download counts as a proxy for model popularity. One thing we see is the prominence of encoders: their main use is producing embeddings; check out <a href="https://huggingface.co/blog/embeddinggemma">EmbeddingGemma</a> for a modern recap. Hence, it is vital to keep the encoders viable, usable, and fine-tunable.</p>
@@ -4392,7 +4395,7 @@ return Plotly;
4392
  <p>In that regard, we DO want to be a modular toolbox, being <a href="#minimal-user-api">minimal</a> enough and well documented enough so any ML/AI developer can use <code>transformers</code> without having to think about it. We aim to reduce the cognitive load brought about by model development, not increase it.</p>
4393
  <p>So, how do these design choices, these “tenets”, influence the development of models and the overall usage of transformers?</p>
4394
  <div class="crumbs">
4395
- <strong>Breadcrumb</strong>: Encoders remain critical for embeddings and retrieval; maintaining them well benefits the broader ecosystem (e.g., Sentence Transformers, FAISS). Next: dev tools that leverage unified attention APIs and PyTorch-only internals.
4396
  </div>
4397
  <h2>A surgical toolbox for model development</h2>
4398
  <h3>Attention visualisation</h3>
@@ -4444,14 +4447,14 @@ return Plotly;
4444
  </div>
4445
  </p>
4446
  <div class="crumbs">
4447
- <strong>Breadcrumb</strong>: Uniform attention APIs enable cross-model diagnostics (e.g., PaliGemma prefix bidirectionality vs causal). Next: whole-model tracing for ports and regressions.
4448
  </div>
4449
  <h3>Logging entire model activations</h3>
4450
  <p>Further, because it is all PyTorch (and it is even more now that we support only PyTorch), we can easily <a href="https://huggingface.co/docs/transformers/internal/model_debugging_utils">debug any model</a> when we want to add it to transformers. We now have a power-user tool for porting or adding models, that wraps a forward pass, intercepts every submodule call, and logs shapes, dtypes, and sample statistics of inputs/outputs to nested JSON.</p>
4451
  <p>It just works with PyTorch models and is especially useful when aligning outputs with a reference implementation, aligned with our <a href="#source-of-truth">core guideline</a>.</p>
4452
  <p><img src="static/model_debugger.png" alt="Model debugger interface"></p>
4453
  <div class="crumbs">
4454
- <strong>Breadcrumb</strong>: Forward interception and nested JSON logging align ports to reference implementations, reinforcing “Source of Truth.” Next: CUDA warmup reduces load-time stalls without touching modeling semantics.
4455
  </div>
4456
  <h3>Cooking faster CUDA warmups</h3>
4457
  <p>Having a clean <em>external</em> API allows us to work on the <a href="#code-is-product">true inner workings of transformers</a>. One recent addition is the <em>CUDA warmup</em> via <code>caching_allocator_warmup</code>, which massively improved the loading footprint by pre-allocating GPU memory to avoid malloc bottlenecks during model loading, achieving a 7x factor for an 8B model and 6x for a 32B one. You can check out <a href="https://github.com/huggingface/transformers/pull/36380">the source</a>!</p>
@@ -4508,7 +4511,7 @@ return Plotly;
4508
  <script>let animationSpeed=1/2.4,isRunning=!1,totalLayers=10;function startDemo(){isRunning||(isRunning=!0,document.getElementById("startBtn").disabled=!0,document.getElementById("resetBtn").disabled=!0,Promise.all([animateNoWarmup(),animateWithWarmup()]).then(()=>{isRunning=!1,document.getElementById("startBtn").disabled=!1,document.getElementById("resetBtn").disabled=!1}))}function resetDemo(){isRunning||(document.getElementById("noWarmupArea").innerHTML="",document.getElementById("warmupLayers").innerHTML="",document.getElementById("warmupFill").style.width="0%",document.getElementById("warmupContainer").classList.remove("allocated"),document.getElementById("noWarmupTime").textContent="0.00s",document.getElementById("warmupTime").textContent="0.00s",document.getElementById("noWarmupCounter").textContent="Layers loaded: 0/10",document.getElementById("warmupCounter").textContent="Layers loaded: 0/10",document.getElementById("noWarmupPhase").textContent="",document.getElementById("warmupPhase").textContent="")}async function animateNoWarmup(){let e=document.getElementById("noWarmupArea"),t=document.getElementById("noWarmupTime"),n=document.getElementById("noWarmupCounter"),a=document.getElementById("noWarmupPhase"),m=0,o=200/animationSpeed;a.textContent="Loading model layers...";for(let a=0;a<10;a++){let d=document.createElement("div");d.className="layer-box",e.appendChild(d),await sleep(.3*o),d.classList.add("allocating"),t.textContent=(m+=.08).toFixed(2)+"s",await sleep(.7*o),d.classList.remove("allocating"),d.classList.add("loaded"),t.textContent=(m+=.12).toFixed(2)+"s",n.textContent=`Layers loaded: ${a+1}/10`}a.textContent="Complete!"}async function animateWithWarmup(){let 
e=document.getElementById("warmupLayers"),t=document.getElementById("warmupTime"),n=document.getElementById("warmupCounter"),a=document.getElementById("warmupPhase"),m=document.getElementById("warmupContainer"),o=document.getElementById("warmupFill"),d=0,l=200/animationSpeed;a.textContent="Warming up allocator...",await sleep(2*l),m.classList.add("allocated"),t.textContent=(d+=.3).toFixed(2)+"s",a.textContent="Loading model layers...";for(let a=0;a<10;a++){let m=document.createElement("div");m.className="layer-box loaded",m.style.width="40px",m.style.height="20px",e.appendChild(m);let i=(a+1)/10*100;o.style.width=i+"%",await sleep(.5*l),t.textContent=(d+=.08).toFixed(2)+"s",n.textContent=`Layers loaded: ${a+1}/10`}a.textContent="Complete!"}function sleep(e){return new Promise(t=>setTimeout(t,e))}</script></p>
4509
  <p>It's hard to overstate how much of a lifesaver that is when you're trying to load a model as fast as possible, as it's the narrowest bottleneck for your iteration speed.</p>
4510
  <div class="crumbs">
4511
- <strong>Breadcrumb</strong>: Pre-allocating GPU memory removes malloc spikes (e.g., 7× for 8B, 6× for 32B in the referenced PR). Next: serving benefits directly from consistent interfaces and modularity.
4512
  </div>
4513
  <h3>Transformers-serve and continuous batching</h3>
4514
  <p>Having all these models readily available lets us use all of them with transformers-serve and interface with them through an Open API-like pattern. As a reminder, the hub also opens access to various <a href="https://huggingface.co/docs/inference-providers/en/index">inference providers</a> if you're interested in model deployment in general.</p>
@@ -4521,7 +4524,7 @@ curl -X POST http://localhost:8000/v1/chat/completions \
4521
  <p>This provides an OpenAI-compatible API with features like <a href="https://github.com/huggingface/transformers/pull/38085">continuous batching</a> (also check <a href="https://github.com/huggingface/transformers/pull/40426">here</a>) for better GPU utilization.</p>
4522
  <p>Continuous batching is in itself very much linked to the great work of vLLM with the <code>paged attention kernel</code>, further justifying the facilitation of <a href="#community-kernels">external kernels</a>.</p>
4523
  <div class="crumbs">
4524
- <strong>Breadcrumb</strong>: OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable. Next: reuse across vLLM/SGLang relies on the same consistency.
4525
  </div>
4526
  <h2>Community reusability</h2>
4527
  <p>Transformers-serve is transformers-first, for sure, but the library is made first and foremost to be <em>reused</em> at large by the open-source ecosystem.</p>
@@ -4532,7 +4535,7 @@ curl -X POST http://localhost:8000/v1/chat/completions \
4532
  </ul>
4533
  <p>This cements the need even more for a <a href="#consistent-public-surface">consistent public surface</a>: we are now a backend, and there's more optimized software than us to handle serving. At the time of writing, more effort is going in that direction. We already have compatible configs for VLMs for vLLM (say that three times fast), <a href="https://github.com/huggingface/transformers/pull/40696/files">here for GLM4 video support</a>, and here for <a href="https://github.com/huggingface/transformers/pull/40132">MoE support</a> for instance.</p>
4534
  <div class="crumbs">
4535
- <strong>Breadcrumb</strong>: Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical. Next: what changes in v5 without breaking the promise of visible semantics.
4536
  </div>
4537
  <h2>What is coming next</h2>
4538
  <p>The next major version of <code>transformers</code> is just around the corner. When v5 is released, <a href="#backwards-compatibility">backwards compatibility</a> will try to stay as solid as possible. The changes we make now are meant to ensure this.</p>
 
59
  <p><a href="#source-of-truth">Tenets exemplified</a> will have their summary available on hover.</p>
60
  <p><a href="https://huggingface.co/blog/welcome-openai-gpt-oss">External links</a> to articles will help you solidify your knowledge.</p>
61
  <p><a href="#generated-modeling">Several interactive visualisations</a> are available as you go - scroll, zoom, drag away.</p>
62
+ <div class="crumbs">
63
+ Throughout this post, you'll find breadcrumb boxes like this one. They summarize what you just learned, connect it to the tenets, and point to what's coming <strong>Next</strong>. Think of them as narrative signposts to help you keep track.
64
+ </div>
65
  <h2>The core tenets of transformers</h2>
66
  <p>We summarize the foundations on which we've built everything, and write the “tenets” of the library. They behave like <em>software interfaces</em>, hence it is crucial that they are explicitly written down. However opinionated they are, they have evolved over time.</p>
67
  <p>Note that the library <em>evolved</em> towards these principles, and that they <em>emerged</em> from decisions taken, and once emerged they were recognized as critical.</p>
 
135
  <p>We needed to separate two principles that were so far intertwined, <a href="#do-repeat-yourself">repetition</a> and <a href="#one-model-one-file">hackability</a>.</p>
136
  <p>What was the solution to this?</p>
137
  <div class="crumbs">
138
+ Read the code in one place (<a href="#one-model-one-file">One Model, One File</a>). Keep semantics local (<a href="#standardize-dont-abstract">Standardize, Don't Abstract</a>). Allow strategic duplication for end users (<a href="#do-repeat-yourself">DRY*</a>). Keep the public surface minimal and stable (<a href="#minimal-user-api">Minimal API</a>, <a href="#backwards-compatibility">Backwards Compatibility</a>, <a href="#consistent-public-surface">Consistent Surface</a>). <strong>Next:</strong> how modular transformers honor these while removing boilerplate.
139
  </div>
140
  <h2><a id="modular"></a> Modular transformers</h2>
141
  <p>Transformers is an opinionated library. The previous <a href="https://huggingface.co/docs/transformers/en/philosophy">philosophy</a> page, and the <a href="https://huggingface.co/blog/transformers-design-philosophy">blog post</a> were already pointing at the drawbacks mentioned just above, which have been iteratively addressed. <a href="https://huggingface.co/docs/transformers/en/modular_transformers"><code>modular</code> transformers were introduced</a>, allowing a form of inheritance without breaking <a href="#one-model-one-file">One model, One file</a>.</p>
 
297
  <p>When <code>AutoModel.from_pretrained(...)</code> is called, it is indeed the modeling (right side) that is run, and all the tests are run on the modeling code.</p>
298
  <p>What does that give us?</p>
299
  <div class="crumbs">
300
+ A small <code>modular_*.py</code> declares reuse; the expanded modeling file stays visible (<a href="#one-model-one-file">tenet kept</a>). Reviewers and contributors maintain the shard, not the repetition. <strong>Next:</strong> the measurable effect on effective LOC and maintenance cost.
301
  </div>
302
  <h3>A maintainable control surface</h3>
303
  <p>The effect of modular can be measured straight from git history: at every commit, we look under the model directory.
 
313
  <p>The <em>attention computation</em> itself happens at a <em>lower</em> level of abstraction than the model itself.</p>
314
  <p>However, we were adding specific torch operations for each backend (sdpa, flash-attention iterations, flex attention) but it wasn't a <a href="#minimal-user-api">minimal user api</a>.</p>
315
  <div class="crumbs">
316
+ Evidence: effective LOC drops ~15× when counting shards instead of expanded modeling. Less to read, fewer places to break. Related cleanups: attention backends moved behind a function interface. <strong>Next:</strong> how the attention interface stays standard without hiding semantics.
317
  </div>
318
  <h3><a id="attention-classes"></a> External Attention classes</h3>
319
  <p>We moved to an <a href="https://huggingface.co/docs/transformers/en/attention_interface">attention interface</a> that allowed the following:</p>
 
331
  MyModelOutputAnnotated = Annotated[MyModelOutput, &quot;shape: (B, C, H, W)&quot;]
332
  </code></pre>
333
  <div class="crumbs">
334
+ Semantics remain in <code>eager_attention_forward</code>; faster backends are opt-in via config. We inform via types/annotations rather than enforce rigid kwargs, preserving integrations. <strong>Next:</strong> distribution concerns are declared as a plan, not model surgery.
335
  </div>
336
  <h3><a id="simpler-tensor-parallelism"></a> Configurable Tensor Parallelism</h3>
337
  <p>If you're not familiar with the different flavours of parallelism, I recommend checking out <a href="https://huggingface.co/blog/accelerate-nd-parallel">this blog post</a> first, and of course a full <a href="https://huggingface.co/spaces/nanotron/ultrascale-playbook">dive into the ultra-scale playbook</a> is always recommended.</p>
 
370
  <p><code>torchrun --nproc-per-node 4 demo.py</code></p>
371
  <p>Semantics stay in the model (a Linear stays a Linear), distribution is orthogonal and declared via strings: “colwise” splits columns of weights/bias across ranks; “rowwise” splits rows; packed variants shard fused weights. The mapping keys accept glob patterns like <code>layers.*.mlp.down_proj</code> to target repeated submodules.</p>
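  <p>To make the glob-keyed plan concrete, here is a minimal sketch of how pattern keys could resolve to a sharding style for a given submodule name. The plan contents, the <code>resolve_style</code> helper, and the use of Python's <code>fnmatch</code> are assumptions for illustration; the library's own matching logic may differ (for instance, <code>fnmatch</code> lets <code>*</code> span dots).</p>

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional

# Hypothetical plan: glob pattern -> sharding style string.
tp_plan: Dict[str, str] = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.mlp.down_proj": "rowwise",
}


def resolve_style(module_name: str, plan: Dict[str, str]) -> Optional[str]:
    """Return the style of the first pattern matching module_name, else None."""
    for pattern, style in plan.items():
        if fnmatchcase(module_name, pattern):
            return style
    return None  # module is not sharded by this plan


# The same pattern targets every repeated block without touching the model code.
styles = {
    name: resolve_style(name, tp_plan)
    for name in (
        "layers.0.self_attn.q_proj",
        "layers.7.mlp.down_proj",
        "embed_tokens",
    )
}
```

  <p>The point of the sketch is that the model code never appears: the plan is plain data, so changing how a layer is distributed means editing a string.</p>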
372
  <div class="crumbs">
373
+ Sharding is configuration (<code>tp_plan</code>), not edits to <code>Linear</code>s. Glob patterns target repeated blocks; modeling semantics stay intact. <strong>Next:</strong> per-layer attention/caching schedules declared in config, not hardcoded.
374
  </div>
375
  <h3><a id="layers-attentions-caches"></a> Layers, attentions and caches</h3>
376
  <p>Following the same logic, the <em>nature</em> of attention and caching per layer of a model should not be hardcoded. We should be able to specify in a configuration-based fashion how each layer is implemented. Thus we defined a mapping that can be then</p>
 
393
  </code></pre>
394
  <p>This is <a href="#minimal-user-api">minimal</a> to implement on the user side, and it keeps the modeling code untouched. It is also easy to tweak.</p>
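  <p>As a rough illustration of a per-layer schedule living in configuration, the sketch below builds an alternating sliding/full list and validates it against an explicit set of allowed names. The names <code>ALLOWED_LAYER_TYPES</code>, <code>alternating_schedule</code> and <code>validate_layer_types</code>, and the two type strings, are assumptions chosen for this example rather than a statement of the library's API.</p>

```python
from typing import List

# Assumed set of allowed names for this sketch.
ALLOWED_LAYER_TYPES = ("full_attention", "sliding_attention")


def alternating_schedule(num_layers: int, full_every: int = 2) -> List[str]:
    """Every `full_every`-th layer uses full attention, the others sliding."""
    return [
        "full_attention" if (i + 1) % full_every == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def validate_layer_types(layer_types: List[str]) -> List[str]:
    """Reject any layer type that is not explicitly allowed."""
    unknown = [t for t in layer_types if t not in ALLOWED_LAYER_TYPES]
    if unknown:
        raise ValueError(f"Unknown layer types: {unknown}")
    return layer_types


schedule = validate_layer_types(alternating_schedule(4))
# schedule == ['sliding_attention', 'full_attention',
#              'sliding_attention', 'full_attention']
```

  <p>Tweaking the schedule is then a one-argument change, and a typo in a layer type fails loudly at configuration time instead of deep inside the forward pass.</p>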
395
  <div class="crumbs">
396
+ Allowed layer types are explicit; schedules (e.g., sliding/full alternation) live in config. This keeps the file readable and easy to tweak. <strong>Next:</strong> speedups come from kernels that don't change semantics.
397
  </div>
398
  <h3><a id="community-kernels"></a>Community Kernels</h3>
399
  <p>The same principle extends to normalization, activation, and other code paths. The model defines <strong>semantics</strong>; a kernel defines <strong>how</strong> to execute them faster. We annotate the module to borrow a community-provided forward, keeping a <a href="#consistent-public-surface">consistent public surface</a></p>
 
404
  <p>Plus, this opened another angle of contribution for the community. People who are GPU whisperers can now contribute optimized kernels. You can check on the <a href="https://huggingface.co/blog/hello-hf-kernels">kernel community blog post</a> to learn more about it!</p>
405
  <p>Even more resources have been added, like the formidable <a href="https://github.com/huggingface/kernel-builder">kernel builder</a> with its connected resources to <a href="https://github.com/huggingface/kernel-builder/blob/main/docs/writing-kernels.md">help you build kernels with it</a> and <a href="https://github.com/huggingface/kernel-builder/blob/main/docs/nix.md">with nix</a>.</p>
406
  <div class="crumbs">
407
+ Models define semantics; kernels define how to run them faster. Use annotations to borrow community forwards while keeping a consistent public surface. <strong>Next:</strong> what modularity looks like across the repo.
408
  </div>
409
  <h2>Modular developments</h2>
410
  <p>Now, we have a form of inheritance in our codebase. Some models become standards, and model contributors are given the opportunity to <em>define standards</em>. Pushing the boundaries of scientific knowledge can translate into the boundaries of engineering if this effort is made, and we're striving for it.
 
424
  <p>Another problem is, this is only for <code>modular</code> models. Several models do NOT have a modular file.</p>
425
  <p>How do we spot them, and how do we identify modularisable models?</p>
426
  <div class="crumbs">
427
+ Graph reading guide: nodes are models; edges are modular imports. Llama-lineage is a hub; several VLMs remain islands, an engineering opportunity for shared parents. <strong>Next:</strong> timeline + similarity signals to spot candidates.
428
  </div>
429
  <h3>Many models, but not enough yet, are alike</h3>
430
  <p>So I looked into Jaccard similarity, which measures how much two sets overlap. I know that code is more than a set of characters strung together. I also used code embedding models to check out code similarities, and they yielded better results, but for the needs of this blog post I will stick to the Jaccard index.</p>
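  <p>For reference, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it to two toy code snippets; tokenizing into identifier-like words is an assumption made for this example, not necessarily how the similarity behind the visualisation was computed, and the two snippets are invented stand-ins rather than real modeling code.</p>

```python
import re
from typing import Set


def tokenize(source: str) -> Set[str]:
    """Reduce source code to the set of its identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented snippets that differ by a single identifier.
image_model = "def forward(self, pixel_values, input_ids): return self.language_model(input_ids)"
video_model = "def forward(self, pixel_values_videos, input_ids): return self.language_model(input_ids)"

score = jaccard(tokenize(image_model), tokenize(video_model))
# 6 shared tokens out of 8 distinct ones -> 0.75
```

  <p>A score close to 1 between two modeling files suggests that one could be expressed as a small modular shard on top of the other; the index ignores order and structure, which is why embeddings can do better.</p>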
 
432
  <p> <iframe src=https://molbap-timeline-1.hf.space style="width:100%; height:680px; border:0" allow="clipboard-read; clipboard-write; fullscreen" referrerpolicy=no-referrer-when-downgrade></iframe></p>
433
  <p>If you've checked out llava, you've seen that llava_video is a red node, connected by a red edge to llava: it's a candidate, something that we can <em>likely</em> remodularize, <a href="#backwards-compatibility">not touching the actual model</a> but being much more readable with <a href="#do-repeat-yourself">DRY*</a>.</p>
434
  <div class="crumbs">
435
+ Similarity (Jaccard; embeddings tried separately) surfaces likely parents; the timeline shows consolidation after modular landed. Red nodes/edges = candidates (e.g., <code>llava_video</code> → <code>llava</code>) for refactors that preserve behavior. <strong>Next:</strong> concrete VLM choices that avoid leaky abstractions.
436
  </div>
437
  <h3>VLM improvements, avoiding abstraction</h3>
438
  <p>We don't have a cookbook for common VLM patterns (image token scatter, multi-tower encoders, cross-attn bridges). This is one of the main improvement points where we can work.</p>
 
486
  </code></pre>
487
  <p>But this is <em>within</em> the modeling file, not in the <code>PreTrainedModel</code> base class. It will not move away from it, because it'd break the <a href="#one-model-one-file">self-contained logic</a> of the model.</p>
488
  <div class="crumbs">
489
+ Keep VLM embedding mix in the modeling file (semantics), standardize safe helpers (e.g., placeholder masking), don't migrate behavior to <code>PreTrainedModel</code>. <strong>Next:</strong> pipeline-level wins that came from PyTorch-first choices (fast processors).
490
  </div>
491
  <h3>On image processing and processors</h3>
492
  <p>Choosing to be <code>torch</code>-first meant dropping a tremendous amount of support work for <code>jax</code> and <code>TensorFlow</code>, and it also meant that we could be more lenient about the number of torch-dependent utilities we add. One of these is the <em>fast processing</em> of images. Where inputs were previously assumed to be minimal ndarrays, making stronger assumptions and enforcing <code>torch</code>- and <code>torchvision</code>-native inputs allowed us to massively speed up the processing time for each model.</p>
 
494
  <p><img src="static/fast_image_processors.png" alt="Fast Image Processors Performance"></p>
495
  <p class="figure-legend">Thanks <a href="https://huggingface.co/yonigozlan">Yoni Gozlan</a> for the great work!</p>
496
  <div class="crumbs">
497
+ Torch-first lets processors assume torch/torchvision and run the whole pipeline on GPU; big per-model speedups. <strong>Next:</strong> how this lowers friction for contributors and downstream users.
498
  </div>
499
  <h2>Reduce barrier to entry/contribution</h2>
500
  <p>This is an overall objective: there's no <code>transformers</code> without its community.</p>
 
502
  <p>Among the most valuable contributions to <code>transformers</code> is of course the addition of new models. Very recently, <a href="https://huggingface.co/blog/welcome-openai-gpt-oss">OpenAI added GPT-OSS</a>, which prompted the addition of many new features to the library in order to support <a href="https://huggingface.co/openai/gpt-oss-120b">their model</a>.</p>
503
  <p>A second one is the ability to fine-tune these models and pipeline them into many other pieces of software. Check on the Hub how many fine-tunes are registered for <a href="https://huggingface.co/models?other=base_model:finetune:openai/gpt-oss-120b">gpt-oss 120b</a>, despite its size!</p>
504
  <div class="crumbs">
505
+ The shape of a contribution: add a model (or variant) with a small modular shard; the community and serving stacks pick it up immediately. Popularity trends (encoders/embeddings) guide where we invest. <strong>Next:</strong> power tools enabled by a consistent API.
506
  </div>
507
  <h3><a id="encoders-ftw"></a> Models popularity</h3>
508
  <p>Talking about dependencies, we can look at download counts as a proxy for model popularity. One thing we see is the prominence of encoders: their main use is producing embeddings; check out <a href="https://huggingface.co/blog/embeddinggemma">EmbeddingGemma</a> for a modern recap. Hence, it is vital to keep the encoders viable, usable, and fine-tunable.</p>
 
4395
  <p>In that regard, we DO want to be a modular toolbox, being <a href="#minimal-user-api">minimal</a> enough and well documented enough so any ML/AI developer can use <code>transformers</code> without having to think about it. We aim to reduce the cognitive load brought about by model development, not increase it.</p>
4396
  <p>So, how do these design choices, these “tenets”, influence the development of models and the overall usage of transformers?</p>
4397
  <div class="crumbs">
4398
+ Encoders remain critical for embeddings and retrieval; maintaining them well benefits the broader ecosystem (e.g., Sentence Transformers, FAISS). <strong>Next:</strong> dev tools that leverage unified attention APIs and PyTorch-only internals.
4399
  </div>
4400
  <h2>A surgical toolbox for model development</h2>
4401
  <h3>Attention visualisation</h3>
 
4447
  </div>
4448
  </p>
4449
  <div class="crumbs">
4450
+ Uniform attention APIs enable cross-model diagnostics (e.g., PaliGemma prefix bidirectionality vs causal). <strong>Next:</strong> whole-model tracing for ports and regressions.
4451
  </div>
4452
  <h3>Logging entire model activations</h3>
4453
  <p>Further, because it is all PyTorch (and it is even more now that we support only PyTorch), we can easily <a href="https://huggingface.co/docs/transformers/internal/model_debugging_utils">debug any model</a> when we want to add it to transformers. We now have a power-user tool for porting or adding models, that wraps a forward pass, intercepts every submodule call, and logs shapes, dtypes, and sample statistics of inputs/outputs to nested JSON.</p>
4454
  <p>It just works with PyTorch models and is especially useful when aligning outputs with a reference implementation, aligned with our <a href="#source-of-truth">core guideline</a>.</p>
4455
  <p><img src="static/model_debugger.png" alt="Model debugger interface"></p>
4456
  <div class="crumbs">
4457
+ Forward interception and nested JSON logging align ports to reference implementations, reinforcing “Source of Truth.” <strong>Next:</strong> CUDA warmup reduces load-time stalls without touching modeling semantics.
4458
  </div>
4459
  <h3>Cooking faster CUDA warmups</h3>
4460
  <p>Having a clean <em>external</em> API allows us to work on the <a href="#code-is-product">true inner workings of transformers</a>. One recent addition is the <em>CUDA warmup</em> via <code>caching_allocator_warmup</code>, which massively improved the loading footprint by pre-allocating GPU memory to avoid malloc bottlenecks during model loading, achieving a 7x factor for an 8B model and 6x for a 32B one. You can check out <a href="https://github.com/huggingface/transformers/pull/36380">the source</a>!</p>
 
4511
  <script>let animationSpeed=1/2.4,isRunning=!1,totalLayers=10;function startDemo(){isRunning||(isRunning=!0,document.getElementById("startBtn").disabled=!0,document.getElementById("resetBtn").disabled=!0,Promise.all([animateNoWarmup(),animateWithWarmup()]).then(()=>{isRunning=!1,document.getElementById("startBtn").disabled=!1,document.getElementById("resetBtn").disabled=!1}))}function resetDemo(){isRunning||(document.getElementById("noWarmupArea").innerHTML="",document.getElementById("warmupLayers").innerHTML="",document.getElementById("warmupFill").style.width="0%",document.getElementById("warmupContainer").classList.remove("allocated"),document.getElementById("noWarmupTime").textContent="0.00s",document.getElementById("warmupTime").textContent="0.00s",document.getElementById("noWarmupCounter").textContent="Layers loaded: 0/10",document.getElementById("warmupCounter").textContent="Layers loaded: 0/10",document.getElementById("noWarmupPhase").textContent="",document.getElementById("warmupPhase").textContent="")}async function animateNoWarmup(){let e=document.getElementById("noWarmupArea"),t=document.getElementById("noWarmupTime"),n=document.getElementById("noWarmupCounter"),a=document.getElementById("noWarmupPhase"),m=0,o=200/animationSpeed;a.textContent="Loading model layers...";for(let a=0;a<10;a++){let d=document.createElement("div");d.className="layer-box",e.appendChild(d),await sleep(.3*o),d.classList.add("allocating"),t.textContent=(m+=.08).toFixed(2)+"s",await sleep(.7*o),d.classList.remove("allocating"),d.classList.add("loaded"),t.textContent=(m+=.12).toFixed(2)+"s",n.textContent=`Layers loaded: ${a+1}/10`}a.textContent="Complete!"}async function animateWithWarmup(){let 
e=document.getElementById("warmupLayers"),t=document.getElementById("warmupTime"),n=document.getElementById("warmupCounter"),a=document.getElementById("warmupPhase"),m=document.getElementById("warmupContainer"),o=document.getElementById("warmupFill"),d=0,l=200/animationSpeed;a.textContent="Warming up allocator...",await sleep(2*l),m.classList.add("allocated"),t.textContent=(d+=.3).toFixed(2)+"s",a.textContent="Loading model layers...";for(let a=0;a<10;a++){let m=document.createElement("div");m.className="layer-box loaded",m.style.width="40px",m.style.height="20px",e.appendChild(m);let i=(a+1)/10*100;o.style.width=i+"%",await sleep(.5*l),t.textContent=(d+=.08).toFixed(2)+"s",n.textContent=`Layers loaded: ${a+1}/10`}a.textContent="Complete!"}function sleep(e){return new Promise(t=>setTimeout(t,e))}</script></p>
4512
  <p>It's hard to overstate how much of a lifesaver that is when you're trying to load a model as fast as possible, as it's the narrowest bottleneck for your iteration speed.</p>
4513
  <div class="crumbs">
4514
+ Pre-allocating GPU memory removes malloc spikes (e.g., 7× for 8B, 6× for 32B in the referenced PR). <strong>Next:</strong> serving benefits directly from consistent interfaces and modularity.
4515
  </div>
4516
  <h3>Transformers-serve and continuous batching</h3>
4517
  <p>Having all these models readily available lets us use all of them with transformers-serve and interface with them through an Open API-like pattern. As a reminder, the hub also opens access to various <a href="https://huggingface.co/docs/inference-providers/en/index">inference providers</a> if you're interested in model deployment in general.</p>
 
4524
  <p>This provides an OpenAI-compatible API with features like <a href="https://github.com/huggingface/transformers/pull/38085">continuous batching</a> (also check <a href="https://github.com/huggingface/transformers/pull/40426">here</a>) for better GPU utilization.</p>
4525
  <p>Continuous batching is in itself very much linked to the great work of vLLM with the <code>paged attention kernel</code>, further justifying the facilitation of <a href="#community-kernels">external kernels</a>.</p>
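  <p>To see why continuous batching helps GPU utilization, here is a toy simulation, not the scheduler used by transformers-serve or vLLM. It assumes each request needs a known, positive number of decode steps and that the batch has a fixed number of slots; the function names are made up for this sketch. Static batching waits for the longest request in a batch, while continuous batching refills a slot as soon as a request finishes.</p>

```python
from collections import deque
from typing import List


def static_batching_steps(lengths: List[int], max_batch: int) -> int:
    """Each batch runs until its longest request is done."""
    return sum(
        max(lengths[i : i + max_batch])
        for i in range(0, len(lengths), max_batch)
    )


def continuous_batching_steps(lengths: List[int], max_batch: int) -> int:
    """Free slots are refilled from the queue before every decode step."""
    waiting = deque(lengths)
    active: List[int] = []  # remaining decode steps per running request
    steps = 0
    while waiting or active:
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step for every active request; drop the finished ones.
        active = [remaining - 1 for remaining in active if remaining > 1]
        steps += 1
    return steps


requests = [8, 2, 2, 2]  # decode steps needed by each request
static_steps = static_batching_steps(requests, max_batch=2)          # 8 + 2 = 10
continuous_steps = continuous_batching_steps(requests, max_batch=2)  # 8
```

  <p>In this toy case the short requests ride along with the long one instead of queueing behind it, so the same work finishes in fewer decode steps. A real implementation also has to manage the KV cache per request, which is where paged attention comes in.</p>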
4526
  <div class="crumbs">
4527
+ OpenAI-compatible surface + continuous batching; kernels/backends slot in because the modeling API stayed stable. <strong>Next:</strong> reuse across vLLM/SGLang relies on the same consistency.
4528
  </div>
4529
  <h2>Community reusability</h2>
4530
  <p>Transformers-serve is transformers-first, for sure, but the library is made first and foremost to be <em>reused</em> at large by the open-source ecosystem.</p>
 
4535
  </ul>
4536
  <p>This cements the need even more for a <a href="#consistent-public-surface">consistent public surface</a>: we are now a backend, and there's more optimized software than us to handle serving. At the time of writing, more effort is going in that direction. We already have compatible configs for VLMs for vLLM (say that three times fast), <a href="https://github.com/huggingface/transformers/pull/40696/files">here for GLM4 video support</a>, and here for <a href="https://github.com/huggingface/transformers/pull/40132">MoE support</a> for instance.</p>
  <div class="crumbs">
+ Being a good backend consumer requires a consistent public surface; modular shards and configs make that stability practical. <strong>Next:</strong> what changes in v5 without breaking the promise of visible semantics.
  </div>
  <h2>What is coming next</h2>
  <p>The next major version of <code>transformers</code> is just around the corner. When v5 is released, <a href="#backwards-compatibility">backwards compatibility</a> will try to stay as solid as possible. The changes we make now are meant to ensure this.</p>
dist/main.bundle.js CHANGED
@@ -1780,6 +1780,37 @@ a[data-tooltip]:hover:before {
  visibility: visible;
  }
 
  /* Improve blockquote styling */
  d-article blockquote {
  font-size: 19px;
@@ -1853,7 +1884,7 @@ d-article .memory-chart-container {
  }
  }
 
- `, "",{"version":3,"sources":["webpack://./src/transformers-custom.css"],"names":[],"mappings":"AAAA,4CAA4C;;AAE5C,2BAA2B;AAC3B;IACI,aAAa;IACb,8BAA8B;IAC9B,WAAW;IACX,cAAc;IACd,kBAAkB;AACtB;;AAEA;IACI,mBAAmB;IACnB,yBAAyB;IACzB,kBAAkB;IAClB,gBAAgB;IAChB,wCAAwC;AAC5C;;AAEA;IACI,mBAAmB;IACnB,qBAAqB;IACrB,gBAAgB;IAChB,cAAc;IACd,gCAAgC;IAChC,gBAAgB;AACpB;;AAEA;IACI,SAAS;IACT,aAAa;IACb,mBAAmB;IACnB,gBAAgB;IAChB,iBAAiB;IACjB,gBAAgB;AACpB;;AAEA;IACI,cAAc;AAClB;;AAEA,8CAA8C;AAC9C;IACI;QACI,0BAA0B;QAC1B,SAAS;IACb;AACJ;;AAEA,+DAA+D;AAC/D;IACI,cAAc;AAClB;;AAEA;IACI,+BAA+B,EAAE,iBAAiB;IAClD,gBAAgB;IAChB,eAAe;IACf,aAAa;IACb,0BAA0B;IAC1B,WAAW;IACX,gBAAgB;IAChB,cAAc;AAClB;;AAEA;IACI,gCAAgC;IAChC,6DAA6D;IAC7D,yBAAyB;IACzB,mBAAmB;IACnB,4BAA4B;IAC5B,SAAS;IACT,kBAAkB;IAClB,2CAA2C;IAC3C,yBAAyB;IACzB,eAAe;AACnB;;AAEA;IACI,uCAAuC;IACvC,2CAA2C;IAC3C,oCAAoC;IACpC,6DAA6D;AACjE;;AAEA,8BAA8B;AAC9B,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;;AAE1G;IACI,+BAA+B;IAC/B,kBAAkB;IAClB,UAAU;IACV,WAAW;IACX,YAAY;IACZ,WAAW;IACX,YAAY;IACZ,kBAAkB;IAClB,aAAa;IACb,mBAAmB;IACnB,uBAAuB;IACvB,gBAAgB;IAChB,iBAAiB;IACjB,0CAA0C;IAC1C,uBAAuB;AAC3B;;AAEA;IACI,cAAc;IACd,gBAAgB;IAChB,cAAc;IACd,qBAAqB;AACzB;;AAEA;IACI,cAAc;IACd,iBAAiB;IACjB,kBAAkB;IAClB,cAAc;IACd,mBAAmB;IACnB,aAAa;IACb,+BAA+B;IAC/B,kBAAkB;IAClB,8BAA8B;AAClC;;AAEA;IACI,cAAc;IACd,gBAAgB;IAChB,gBAAgB;AACpB;;AAEA,iDAAiD;AACjD;IACI,KAAK,0CAA0C,EAAE;IACjD,MAAM,0CAA0C,EAAE;IAClD,OAAO,0CAA0C,EAAE;AACvD;;AAEA;IACI,6CAA6C;AACjD;;AAEA,kCAAkC;AAClC;IACI,yBAAyB;IACzB,mBAAmB;IACnB,mBAAmB;IACnB,cAAc;IACd,gBAAgB;IAChB,yCAAyC;AAC7C;;AAEA,yCAAyC;AACzC;IACI,6BAA6B;IAC7B,mCAAmC;AACvC;;AAEA;IACI,6DAA6D;IAC7D,YAAY;IACZ,oBAAoB;IACpB,gBAAgB;AACpB;;AAEA;IACI,eAAe;AACnB;;AAEA;IACI,mBAAmB;IACnB,oBAAoB;IACpB,6BAA6B;IAC7B,cAAc;IACd,gBAAgB;AACpB;;AAEA,4CAA4C;AAC5C;IACI,6DAA6D;IAC7D,YAAY;IACZ,YAAY;IACZ,uBAAuB;
IACvB,kBAAkB;IAClB,gBAAgB;IAChB,eAAe;IACf,2CAA2C;AAC/C;;AAEA;IACI,2BAA2B;IAC3B,+CAA+C;AACnD;;AAEA;IACI,YAAY;IACZ,mBAAmB;IACnB,eAAe;IACf,gBAAgB;AACpB;;AAEA,qBAAqB;AACrB;IACI,mBAAmB;IACnB,kBAAkB;IAClB,aAAa;IACb,cAAc;IACd,wDAAwD;IACxD,gBAAgB;AACpB;;AAEA;IACI,mBAAmB;IACnB,yBAAyB;IACzB,cAAc;IACd,eAAe;IACf,kBAAkB;IAClB,WAAW;IACX,oBAAoB;AACxB;;AAEA;IACI,mBAAmB;IACnB,aAAa;IACb,kBAAkB;IAClB,qBAAqB;IACrB,qBAAqB;IACrB,iBAAiB;IACjB,iBAAiB;IACjB,gBAAgB;AACpB;;AAEA,oCAAoC;AACpC;IACI,sBAAsB;IACtB,gBAAgB;IAChB,yBAAyB;IACzB,cAAc;AAClB;;AAEA;IACI,sBAAsB;IACtB,gBAAgB;IAChB,kBAAkB;IAClB,eAAe;AACnB;;AAEA,yBAAyB;AACzB;IACI,mBAAmB;IACnB,yBAAyB;IACzB,kBAAkB;IAClB,aAAa;IACb,cAAc;AAClB;;AAEA,+BAA+B;AAC/B;IACI,eAAe;IACf,YAAY;IACZ,kBAAkB;IAClB,yCAAyC;IACzC,gBAAgB;AACpB;;AAEA,kEAAkE;AAClE;IACI;QACI,4BAA4B;IAChC;;IAEA;QACI,4BAA4B;QAC5B,4BAA4B;QAC5B,+BAA+B;QAC/B,6BAA6B;QAC7B,kCAAkC;QAClC,4BAA4B;QAC5B,0BAA0B;QAC1B,6BAA6B;QAC7B,4BAA4B;QAC5B,mCAAmC,EAAE,eAAe;QACpD,2BAA2B;QAC3B,oBAAoB;QACpB,2BAA2B;QAC3B,qCAAqC;QACrC,gCAAgC;QAChC,+CAA+C;QAC/C,wBAAwB;QACxB,yBAAyB;QACzB,8BAA8B;IAClC;AACJ;;AAEA;IACI;QACI,wBAAwB;QACxB,4BAA4B;QAC5B,8BAA8B;QAC9B,4BAA4B;QAC5B,gCAAgC;QAChC,6BAA6B;QAC7B,+BAA+B;QAC/B,sDAAsD;QACtD,6BAA6B;QAC7B,qCAAqC;QACrC,gCAAgC;QAChC,wBAAwB;IAC5B;AACJ;;AAEA,0DAA0D;AAC1D;IACI,yBAAyB;IACzB,8BAA8B;IAC9B,qBAAqB;AACzB;;AAEA,2BAA2B;AAC3B;IACI,qBAAqB;IACrB,gCAAgC;IAChC,sBAAsB;AAC1B;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,WAAW;AACf;;AAEA;IACI,yBAAyB;IACzB,qBAAqB;IACrB,mBAAmB;IACnB,cAAc;IACd,iBAAiB;IACjB,gBAAgB;IAChB,gBAAgB;IAChB,2BAA2B;AAC/B;;AAEA;IACI,cAAc;IACd,qBAAqB;AACzB;;AAEA;IACI,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,qBAAqB;AACzB;;AAEA,qBAAqB;AACrB;IACI,qBAAqB;IACrB,mDAAmD;AACvD;;AAEA;IACI,UAAU;AACd;;AAEA;IACI,uBAAuB;AAC3B;;AAEA;IACI,kCAAkC;IAClC,kBAAkB;AACtB;;AAEA;IACI,kCAAkC;AACtC;;AAEA,2CAA2C;AAC3C;IACI,kBAAkB;IAClB,YAAY;AAChB;;AAEA;IACI,cAAc;AAClB;;AAEA,8DAA8D;AAC9D;IACI,oBAAoB;IACpB,kBAAkB;IAClB,UAAU;IACV,QAAQ;IACR,2BAA2B;IAC3B,mBAAmB;IACnB,YAAY;IACZ,qBAAqB;IACrB,kBAAkB;IAClB,iBAAiB;IACjB,mBAAmB;I
ACnB,YAAY;IACZ,gBAAgB;IAChB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;IACnD,oBAAoB;IACpB,yCAAyC;AAC7C;;AAEA;IACI,WAAW;IACX,kBAAkB;IAClB,UAAU;IACV,QAAQ;IACR,gCAAgC;IAChC,6BAA6B;IAC7B,2BAA2B;IAC3B,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;AACvD;;AAEA;;IAEI,UAAU;IACV,mBAAmB;AACvB;;AAEA,+BAA+B;AAC/B;IACI;QACI,UAAU;QACV,WAAW;QACX,kBAAkB;QAClB,YAAY;IAChB;;IAEA;QACI,UAAU;QACV,WAAW;QACX,+BAA+B;QAC/B,+BAA+B;QAC/B,0BAA0B;IAC9B;AACJ;;AAEA,gDAAgD;AAChD;IACI,8BAA8B;IAC9B,oCAAoC;IACpC,6BAA6B;IAC7B,0BAA0B;IAC1B,2BAA2B;IAC3B,2BAA2B;IAC3B,2BAA2B;IAC3B,2BAA2B;AAC/B;;AAEA;IACI,2BAA2B;IAC3B,kFAAkF;IAClF,yBAAyB;AAC7B;;AAEA,gBAAgB;AAChB;IACI,8BAA8B;IAC9B,+BAA+B;IAC/B,6BAA6B;IAC7B,2BAA2B;IAC3B,yBAAyB;AAC7B;;AAEA,iCAAiC;AACjC;IACI,eAAe;IACf,eAAe;IACf,2BAA2B;IAC3B,cAAc;IACd,4BAA4B;IAC5B,0BAA0B;AAC9B;;AAEA;IACI,8BAA8B;IAC9B,eAAe;AACnB;;AAEA,qCAAqC;AACrC;IACI;QACI,uCAAuC;QACvC,eAAe;IACnB;AACJ;;AAEA,kCAAkC;AAClC;IACI,eAAe;IACf,gBAAgB;IAChB,wBAAwB;IACxB,cAAc;AAClB;;AAEA,0BAA0B;AAC1B;IACI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;IACrB,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,qCAAqC;IACrC,iCAAiC;IACjC,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;IACrB,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,uBAAuB;IACvB,cAAc;IACd,gBAAgB;AACpB;;AAEA,6BAA6B;AAC7B;;IAEI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;AACzB;;AAEA,0DAA0D;AAC1D;;;;;;;;;IASI,kBAAkB;IAClB,cAAc;IACd,gBAAgB;IAChB,0BAA0B;IAC1B,+CAA+C;IAC/C,yBAAyB;AAC7B;;AAEA;;;;;;;;;IASI,cAAc;IACd,8BAA8B;IAC9B,oCAAoC;IACpC,gBAAgB;IAChB,kBAAkB;AACtB;;AAEA,gDAAgD;AAChD;IACI,2BAA2B;IAC3B,kBAAkB;IAClB,YAAY;IACZ,SAAS;IACT,2BAA2B;IAC3B,mBAAmB;IACnB,YAAY;IACZ,qBAAqB;IACrB,kBAAkB;IAClB,iBAAiB;IACjB,gBAAgB;IAChB,mBAAmB;IACnB,YAAY;IACZ,gBAAgB;IAChB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;IACnD,oBAAoB;IACpB,yCAAyC;IACzC,kBAAkB;AACtB;;AAEA;IACI,WAAW;IACX,kBAAkB;IAClB,YAAY;IACZ,SAAS;IACT,2BAA2B;IAC3B,6BAA6B;IAC7B,yBAAyB;IACzB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;AACvD;;AAEA;;IAEI,UAAU;IACV,mBAAmB;AACvB;;AAEA,+BAA+B;AAC/B;IACI,eAAe;IACf,gBAAgB
;IAChB,oBAAoB;IACpB,cAAc;IACd,8BAA8B;IAC9B,4DAA4D;IAC5D,0BAA0B;IAC1B,kBAAkB;IAClB,cAAc;AAClB;;AAEA,2DAA2D;AAC3D;;IAEI,6DAA6D;IAC7D,cAAc;IACd,qBAAqB;IACrB,qBAAqB;IACrB,mBAAmB;IACnB,yBAAyB;IACzB,qBAAqB;IACrB,yBAAyB;IACzB,gBAAgB;IAChB,8CAA8C;AAClD;;AAEA;;IAEI,6DAA6D;IAC7D,YAAY;IACZ,qBAAqB;IACrB,2BAA2B;IAC3B,8CAA8C;AAClD;;AAEA;;IAEI,wBAAwB;IACxB,6CAA6C;AACjD;;AAEA,wBAAwB;AACxB;;;IAGI,eAAe;IACf,WAAW;IACX,cAAc;IACd,eAAe;AACnB;;AAEA,mCAAmC;AACnC;IACI;;QAEI,cAAc;QACd,iBAAiB;QACjB,kBAAkB;IACtB;AACJ;;AAEA;IACI;QACI,aAAa;IACjB;;IAEA;QACI,aAAa;IACjB;AACJ","sourcesContent":["/* Transformers-specific styling additions */\n\n/* Code comparison layout */\n.code-compare {\n display: grid;\n grid-template-columns: 1fr 1fr;\n gap: 1.5rem;\n margin: 2rem 0;\n align-items: start;\n}\n\n.code-compare .code-column {\n background: #ffffff;\n border: 1px solid #e2e8f0;\n border-radius: 8px;\n overflow: hidden;\n box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1);\n}\n\n.code-compare .code-header {\n background: #f8f9fa;\n padding: 0.75rem 1rem;\n font-weight: 600;\n color: #495057;\n border-bottom: 1px solid #e2e8f0;\n font-size: 0.9em;\n}\n\n.code-compare pre {\n margin: 0;\n padding: 1rem;\n background: #ffffff;\n overflow-x: auto;\n font-size: 0.85em;\n line-height: 1.4;\n}\n\n.code-compare pre code {\n color: #374151;\n}\n\n/* Mobile responsiveness for code comparison */\n@media (max-width: 768px) {\n .code-compare {\n grid-template-columns: 1fr;\n gap: 1rem;\n }\n}\n\n/* Tenet styling - special highlighting for design principles */\n.tenet-list {\n margin: 3rem 0;\n}\n\n.tenet-list ol {\n counter-reset: tenet-counter -1; /* Start from 0 */\n list-style: none;\n padding-left: 0;\n display: grid;\n grid-template-columns: 1fr;\n gap: 2.5rem;\n max-width: 900px;\n margin: 0 auto;\n}\n\n.tenet-list li.tenet {\n counter-increment: tenet-counter;\n background: linear-gradient(135deg, #ffffff 0%, #f8f9fa 100%);\n border: 2px solid #e2e8f0;\n border-radius: 16px;\n padding: 2rem 2rem 2rem 4rem;\n 
margin: 0;\n position: relative;\n box-shadow: 0 12px 35px rgba(0, 0, 0, 0.12);\n transition: all 0.3s ease;\n cursor: pointer;\n}\n\n.tenet-list li.tenet:hover {\n transform: translateY(-8px) scale(1.02);\n box-shadow: 0 20px 50px rgba(0, 0, 0, 0.25);\n border-color: rgba(0, 123, 255, 0.5);\n background: linear-gradient(135deg, #ffffff 0%, #f0f8ff 100%);\n}\n\n/* Colorful numbering system */\n.tenet-list li.tenet:nth-child(1):before { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); }\n.tenet-list li.tenet:nth-child(2):before { background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); }\n.tenet-list li.tenet:nth-child(3):before { background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); }\n.tenet-list li.tenet:nth-child(4):before { background: linear-gradient(135deg, #43e97b 0%, #38f9d7 100%); }\n.tenet-list li.tenet:nth-child(5):before { background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); }\n.tenet-list li.tenet:nth-child(6):before { background: linear-gradient(135deg, #a8edea 0%, #fed6e3 100%); }\n.tenet-list li.tenet:nth-child(7):before { background: linear-gradient(135deg, #ff9a9e 0%, #fecfef 100%); }\n.tenet-list li.tenet:nth-child(8):before { background: linear-gradient(135deg, #a18cd1 0%, #fbc2eb 100%); }\n.tenet-list li.tenet:nth-child(9):before { background: linear-gradient(135deg, #ffecd2 0%, #fcb69f 100%); }\n\n.tenet-list li.tenet:before {\n content: counter(tenet-counter);\n position: absolute;\n top: -12px;\n left: -12px;\n color: white;\n width: 48px;\n height: 48px;\n border-radius: 50%;\n display: flex;\n align-items: center;\n justify-content: center;\n font-size: 1.2em;\n font-weight: bold;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);\n border: 3px solid white;\n}\n\n.tenet-list li.tenet strong {\n color: #1a202c;\n font-size: 1.1em;\n display: block;\n margin-bottom: 0.5rem;\n}\n\n.tenet-list li.tenet em {\n color: #4a5568;\n font-size: 0.95em;\n font-style: italic;\n display: block;\n margin-top: 0.75rem;\n 
padding: 1rem;\n background: rgba(0, 0, 0, 0.03);\n border-radius: 8px;\n border-left: 3px solid #e2e8f0;\n}\n\n.tenet-list li.tenet p {\n color: #2d3748;\n line-height: 1.6;\n margin: 0.5rem 0;\n}\n\n/* Add a subtle pulse animation for the numbers */\n@keyframes pulse-glow {\n 0% { box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); }\n 50% { box-shadow: 0 4px 20px rgba(0, 0, 0, 0.25); }\n 100% { box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); }\n}\n\n.tenet-list li.tenet:hover:before {\n animation: pulse-glow 2s ease-in-out infinite;\n}\n\n/* Interactive component styling */\n.interactive-demo {\n border: 1px solid #e2e8f0;\n border-radius: 12px;\n background: #ffffff;\n margin: 2rem 0;\n overflow: hidden;\n box-shadow: 0 4px 6px rgba(0, 0, 0, 0.07);\n}\n\n/* Model visualization fragment styling */\n[id*=\"plot-model-visualisation\"] {\n margin: 1rem -2rem !important;\n width: calc(100% + 4rem) !important;\n}\n\n.interactive-demo .demo-header {\n background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\n color: white;\n padding: 1rem 1.5rem;\n font-weight: 600;\n}\n\n.interactive-demo .demo-content {\n padding: 1.5rem;\n}\n\n.interactive-demo .demo-footer {\n background: #f8f9fa;\n padding: 1rem 1.5rem;\n border-top: 1px solid #e2e8f0;\n color: #6c757d;\n font-size: 0.9em;\n}\n\n/* Button styling for interactive elements */\n.btn-primary {\n background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\n border: none;\n color: white;\n padding: 0.75rem 1.5rem;\n border-radius: 6px;\n font-weight: 500;\n cursor: pointer;\n transition: transform 0.2s, box-shadow 0.2s;\n}\n\n.btn-primary:hover {\n transform: translateY(-1px);\n box-shadow: 0 4px 12px rgba(102, 126, 234, 0.3);\n}\n\n.btn-primary:disabled {\n opacity: 0.6;\n cursor: not-allowed;\n transform: none;\n box-shadow: none;\n}\n\n/* Terminal styling */\n.terminal-container {\n background: #1a202c;\n border-radius: 8px;\n padding: 1rem;\n color: #e2e8f0;\n font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', 
monospace;\n font-size: 0.9em;\n}\n\n.terminal-input {\n background: #2d3748;\n border: 1px solid #4a5568;\n color: #e2e8f0;\n padding: 0.5rem;\n border-radius: 4px;\n width: 100%;\n font-family: inherit;\n}\n\n.terminal-output {\n background: #0a0e1a;\n padding: 1rem;\n border-radius: 4px;\n white-space: pre-wrap;\n word-break: break-all;\n min-height: 100px;\n max-height: 300px;\n overflow-y: auto;\n}\n\n/* Attention visualization styling */\n.attention-matrix {\n font-family: monospace;\n font-size: 0.8em;\n border-collapse: collapse;\n margin: 1rem 0;\n}\n\n.attention-matrix td {\n border: 1px solid #ddd;\n padding: 4px 8px;\n text-align: center;\n min-width: 50px;\n}\n\n/* Memory chart styling */\n.memory-chart-container {\n background: #f8f9fa;\n border: 2px solid #e9ecef;\n border-radius: 8px;\n padding: 1rem;\n margin: 1rem 0;\n}\n\n/* Image styling improvements */\nimg {\n max-width: 100%;\n height: auto;\n border-radius: 8px;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);\n margin: 1.5rem 0;\n}\n\n/* Table of contents styling - Fixed positioning like ultrascale */\n@media (min-width: 1200px) {\n d-article {\n overflow: visible !important;\n }\n \n d-contents {\n align-self: start !important;\n background: white !important;\n grid-column-start: 1 !important;\n grid-column-end: 4 !important;\n grid-row: auto / span 6 !important;\n justify-self: end !important;\n margin-top: 0em !important;\n padding-right: 3em !important;\n padding-left: 2em !important;\n position: -webkit-sticky !important; /* For Safari */\n position: sticky !important;\n top: 10px !important;\n overflow-y: auto !important;\n height: calc(100vh - 40px) !important;\n scrollbar-width: none !important;\n transition: max-height 0.3s ease-out !important;\n z-index: -100 !important;\n display: block !important;\n visibility: visible !important;\n }\n}\n\n@media (max-width: 1199px) {\n d-contents {\n display: none !important;\n background: white !important;\n justify-self: start !important;\n 
align-self: start !important;\n padding-bottom: 0.5em !important;\n margin-bottom: 1em !important;\n padding-left: 0.25em !important;\n border-bottom: 1px solid rgba(0, 0, 0, 0.1) !important;\n overflow-y: scroll !important;\n height: calc(100vh - 40px) !important;\n scrollbar-width: none !important;\n z-index: -100 !important;\n }\n}\n\n/* Force TOC to be visible and override distill defaults */\nd-contents {\n display: block !important;\n visibility: visible !important;\n opacity: 1 !important;\n}\n\n/* TOC Navigation styling */\nd-contents .toc-header {\n margin-bottom: 1.5rem;\n border-bottom: 2px solid #007bff;\n padding-bottom: 0.5rem;\n}\n\nd-contents .toc-title {\n font-weight: bold;\n font-size: 1.2em;\n color: #333;\n}\n\nd-contents nav a {\n color: rgba(0, 0, 0, 0.7);\n text-decoration: none;\n border-bottom: none;\n display: block;\n padding: 0.3rem 0;\n font-size: 0.9em;\n line-height: 1.4;\n transition: color 0.2s ease;\n}\n\nd-contents nav a:hover {\n color: #007bff;\n text-decoration: none;\n}\n\nd-contents nav a.active {\n color: #007bff;\n font-weight: 600;\n}\n\nd-contents nav div {\n margin-bottom: 0.2rem;\n}\n\n/* Smooth scrollbar */\nd-contents {\n scrollbar-width: thin;\n scrollbar-color: rgba(0, 123, 255, 0.3) transparent;\n}\n\nd-contents::-webkit-scrollbar {\n width: 6px;\n}\n\nd-contents::-webkit-scrollbar-track {\n background: transparent;\n}\n\nd-contents::-webkit-scrollbar-thumb {\n background: rgba(0, 123, 255, 0.3);\n border-radius: 3px;\n}\n\nd-contents::-webkit-scrollbar-thumb:hover {\n background: rgba(0, 123, 255, 0.5);\n}\n\n/* Custom tooltip styling for tenet links */\nd-contents nav a[title] {\n position: relative;\n cursor: help;\n}\n\nd-contents nav a[title]:hover {\n color: #667eea;\n}\n\n/* Enhanced tooltip using CSS (fallback for title attribute) */\nd-contents nav a[title]:after {\n content: attr(title);\n position: absolute;\n left: 100%;\n top: 50%;\n transform: translateY(-50%);\n background: #1a202c;\n color: 
white;\n padding: 0.75rem 1rem;\n border-radius: 8px;\n font-size: 0.85em;\n white-space: normal;\n width: 300px;\n line-height: 1.4;\n z-index: 1001;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n pointer-events: none;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);\n}\n\nd-contents nav a[title]:before {\n content: '';\n position: absolute;\n left: 100%;\n top: 50%;\n transform: translate(-8px, -50%);\n border: 8px solid transparent;\n border-right-color: #1a202c;\n z-index: 1002;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n}\n\nd-contents nav a[title]:hover:after,\nd-contents nav a[title]:hover:before {\n opacity: 1;\n visibility: visible;\n}\n\n/* Adjust for smaller screens */\n@media (max-width: 1400px) {\n d-contents nav a[title]:after {\n left: auto;\n right: 100%;\n margin-right: 1rem;\n width: 250px;\n }\n \n d-contents nav a[title]:before {\n left: auto;\n right: 100%;\n transform: translate(8px, -50%);\n border-right-color: transparent;\n border-left-color: #1a202c;\n }\n}\n\n/* Improve code syntax highlighting with Prism */\npre[class*=\"language-\"] {\n background: #f8f9fa !important;\n border: 1px solid #e9ecef !important;\n border-radius: 8px !important;\n padding: 1.5rem !important;\n margin: 1.5rem 0 !important;\n overflow-x: auto !important;\n font-size: 0.9em !important;\n line-height: 1.5 !important;\n}\n\ncode[class*=\"language-\"] {\n background: none !important;\n font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', 'Courier New', monospace !important;\n color: #383a42 !important;\n}\n\n/* Inline code */\np code, li code {\n background: #f1f3f4 !important;\n padding: 0.2em 0.4em !important;\n border-radius: 3px !important;\n font-size: 0.9em !important;\n color: #d73a49 !important;\n}\n\n/* Distill article improvements */\nd-article {\n max-width: none;\n font-size: 19px;\n line-height: 1.7 !important;\n color: #1a1a1a;\n padding-top: 1rem !important;\n 
grid-row-gap: 0 !important;\n}\n\nd-article > * {\n grid-column: middle !important;\n max-width: none;\n}\n\n/* Adjust for TOC on larger screens */\n@media (min-width: 1200px) {\n d-article > * {\n grid-column: text / page-end !important;\n max-width: none;\n }\n}\n\n/* Improve paragraph readability */\nd-article p {\n font-size: 19px;\n line-height: 1.5;\n margin-top: 0 !important;\n color: #1a1a1a;\n}\n\n/* Improve heading sizes */\nd-article h1 {\n font-size: 3rem;\n line-height: 1.2;\n margin: 3rem 0 2rem 0;\n color: #1a202c;\n font-weight: 700;\n}\n\nd-article h2 {\n font-size: 2.5rem;\n line-height: 1.3;\n margin: 1.5rem 0 0.75rem 0 !important;\n padding-bottom: 0.5rem !important;\n color: #1a202c;\n font-weight: 650;\n}\n\nd-article h3 {\n font-size: 2rem;\n line-height: 1.4;\n margin: 2rem 0 1rem 0;\n color: #1a202c;\n font-weight: 600;\n}\n\nd-article h4 {\n font-size: 1.5rem;\n line-height: 1.4;\n margin: 1.5rem 0 1rem 0;\n color: #2d3748;\n font-weight: 600;\n}\n\n/* Improve list readability */\nd-article ul li,\nd-article ol li {\n font-size: 18px;\n line-height: 1.7;\n margin-bottom: 0.5rem;\n}\n\n/* Enhanced tenet reference styling with custom tooltips */\na[href^=\"#source-of-truth\"],\na[href^=\"#one-model-one-file\"],\na[href^=\"#code-is-product\"],\na[href^=\"#standardize-dont-abstract\"],\na[href^=\"#do-repeat-yourself\"],\na[href^=\"#minimal-user-api\"],\na[href^=\"#backwards-compatibility\"],\na[href^=\"#consistent-public-surface\"],\na[href^=\"#modular-toolbox\"] {\n position: relative;\n color: #667eea;\n font-weight: 600;\n text-decoration: underline;\n text-decoration-color: rgba(102, 126, 234, 0.3);\n transition: all 0.3s 
ease;\n}\n\na[href^=\"#source-of-truth\"]:hover,\na[href^=\"#one-model-one-file\"]:hover,\na[href^=\"#code-is-product\"]:hover,\na[href^=\"#standardize-dont-abstract\"]:hover,\na[href^=\"#do-repeat-yourself\"]:hover,\na[href^=\"#minimal-user-api\"]:hover,\na[href^=\"#backwards-compatibility\"]:hover,\na[href^=\"#consistent-public-surface\"]:hover,\na[href^=\"#modular-toolbox\"]:hover {\n color: #4c51bf;\n text-decoration-color: #4c51bf;\n background: rgba(102, 126, 234, 0.1);\n padding: 2px 4px;\n border-radius: 4px;\n}\n\n/* Custom tooltip using data-tooltip attribute */\na[data-tooltip]:after {\n content: attr(data-tooltip);\n position: absolute;\n bottom: 100%;\n left: 50%;\n transform: translateX(-50%);\n background: #1a202c;\n color: white;\n padding: 0.75rem 1rem;\n border-radius: 8px;\n font-size: 0.85em;\n font-weight: 400;\n white-space: normal;\n width: 320px;\n line-height: 1.4;\n z-index: 1001;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n pointer-events: none;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);\n margin-bottom: 8px;\n}\n\na[data-tooltip]:before {\n content: '';\n position: absolute;\n bottom: 100%;\n left: 50%;\n transform: translateX(-50%);\n border: 8px solid transparent;\n border-top-color: #1a202c;\n z-index: 1002;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n}\n\na[data-tooltip]:hover:after,\na[data-tooltip]:hover:before {\n opacity: 1;\n visibility: visible;\n}\n\n/* Improve blockquote styling */\nd-article blockquote {\n font-size: 19px;\n line-height: 1.8;\n padding: 1.5rem 2rem;\n margin: 2rem 0;\n border-left: 4px solid #667eea;\n background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 50%);\n border-radius: 0 8px 8px 0;\n font-style: italic;\n color: #4a5568;\n}\n\n/* Link capsule styling - only for external HTTP(S) links */\nd-article a[href^=\"http://\"],\nd-article a[href^=\"https://\"] {\n background: linear-gradient(135deg, #e3f2fd 
0%, #bbdefb 100%);\n color: #1565c0;\n text-decoration: none;\n padding: 0.15em 0.5em;\n border-radius: 12px;\n border: 1px solid #90caf9;\n display: inline-block;\n transition: all 0.3s ease;\n font-weight: 500;\n box-shadow: 0 1px 3px rgba(21, 101, 192, 0.15);\n}\n\nd-article a[href^=\"http://\"]:hover,\nd-article a[href^=\"https://\"]:hover {\n background: linear-gradient(135deg, #2196f3 0%, #1976d2 100%);\n color: white;\n border-color: #1565c0;\n transform: translateY(-1px);\n box-shadow: 0 4px 12px rgba(21, 101, 192, 0.3);\n}\n\nd-article a[href^=\"http://\"]:active,\nd-article a[href^=\"https://\"]:active {\n transform: translateY(0);\n box-shadow: 0 1px 3px rgba(21, 101, 192, 0.2);\n}\n\n/* Full width elements */\nd-article .code-compare,\nd-article .interactive-demo,\nd-article .memory-chart-container {\n max-width: none;\n width: 100%;\n margin-left: 0;\n margin-right: 0;\n}\n\n/* Responsive design improvements */\n@media (max-width: 1200px) {\n d-article .code-compare,\n d-article .interactive-demo {\n max-width: 95%;\n margin-left: auto;\n margin-right: auto;\n }\n}\n\n@media (max-width: 768px) {\n .tenet-list li.tenet {\n padding: 1rem;\n }\n\n .interactive-demo .demo-content {\n padding: 1rem;\n }\n}\n\n"],"sourceRoot":""}]);
  // Exports
  /* harmony default export */ const __WEBPACK_DEFAULT_EXPORT__ = (___CSS_LOADER_EXPORT___);
 
 
  visibility: visible;
  }
 
+ /* Breadcrumb navigation styling */
+ .crumbs {
+ background: linear-gradient(135deg, #f0f4ff 0%, #e6eeff 100%);
+ border-left: 5px solid #667eea;
+ padding: 1.25rem 1.75rem;
+ margin: 2.5rem 0;
+ border-radius: 0 8px 8px 0;
+ box-shadow: 0 2px 8px rgba(102, 126, 234, 0.12);
+ font-size: 0.95em;
+ line-height: 1.6;
+ color: #4a5568;
+ }
+
+ .crumbs strong {
+ color: #667eea;
+ font-weight: 700;
+ }
+
+ .crumbs code {
+ background: rgba(102, 126, 234, 0.1);
+ padding: 0.15em 0.4em;
+ border-radius: 3px;
+ font-size: 0.9em;
+ color: #4c51bf;
+ }
+
+ .crumbs a {
+ color: #667eea;
+ font-weight: 500;
+ }
+
  /* Improve blockquote styling */
  d-article blockquote {
  font-size: 19px;
 
  }
  }
 
+ `, "",{"version":3,"sources":["webpack://./src/transformers-custom.css"],"names":[],"mappings":"AAAA,4CAA4C;;AAE5C,2BAA2B;AAC3B;IACI,aAAa;IACb,8BAA8B;IAC9B,WAAW;IACX,cAAc;IACd,kBAAkB;AACtB;;AAEA;IACI,mBAAmB;IACnB,yBAAyB;IACzB,kBAAkB;IAClB,gBAAgB;IAChB,wCAAwC;AAC5C;;AAEA;IACI,mBAAmB;IACnB,qBAAqB;IACrB,gBAAgB;IAChB,cAAc;IACd,gCAAgC;IAChC,gBAAgB;AACpB;;AAEA;IACI,SAAS;IACT,aAAa;IACb,mBAAmB;IACnB,gBAAgB;IAChB,iBAAiB;IACjB,gBAAgB;AACpB;;AAEA;IACI,cAAc;AAClB;;AAEA,8CAA8C;AAC9C;IACI;QACI,0BAA0B;QAC1B,SAAS;IACb;AACJ;;AAEA,+DAA+D;AAC/D;IACI,cAAc;AAClB;;AAEA;IACI,+BAA+B,EAAE,iBAAiB;IAClD,gBAAgB;IAChB,eAAe;IACf,aAAa;IACb,0BAA0B;IAC1B,WAAW;IACX,gBAAgB;IAChB,cAAc;AAClB;;AAEA;IACI,gCAAgC;IAChC,6DAA6D;IAC7D,yBAAyB;IACzB,mBAAmB;IACnB,4BAA4B;IAC5B,SAAS;IACT,kBAAkB;IAClB,2CAA2C;IAC3C,yBAAyB;IACzB,eAAe;AACnB;;AAEA;IACI,uCAAuC;IACvC,2CAA2C;IAC3C,oCAAoC;IACpC,6DAA6D;AACjE;;AAEA,8BAA8B;AAC9B,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;AAC1G,2CAA2C,6DAA6D,EAAE;;AAE1G;IACI,+BAA+B;IAC/B,kBAAkB;IAClB,UAAU;IACV,WAAW;IACX,YAAY;IACZ,WAAW;IACX,YAAY;IACZ,kBAAkB;IAClB,aAAa;IACb,mBAAmB;IACnB,uBAAuB;IACvB,gBAAgB;IAChB,iBAAiB;IACjB,0CAA0C;IAC1C,uBAAuB;AAC3B;;AAEA;IACI,cAAc;IACd,gBAAgB;IAChB,cAAc;IACd,qBAAqB;AACzB;;AAEA;IACI,cAAc;IACd,iBAAiB;IACjB,kBAAkB;IAClB,cAAc;IACd,mBAAmB;IACnB,aAAa;IACb,+BAA+B;IAC/B,kBAAkB;IAClB,8BAA8B;AAClC;;AAEA;IACI,cAAc;IACd,gBAAgB;IAChB,gBAAgB;AACpB;;AAEA,iDAAiD;AACjD;IACI,KAAK,0CAA0C,EAAE;IACjD,MAAM,0CAA0C,EAAE;IAClD,OAAO,0CAA0C,EAAE;AACvD;;AAEA;IACI,6CAA6C;AACjD;;AAEA,kCAAkC;AAClC;IACI,yBAAyB;IACzB,mBAAmB;IACnB,mBAAmB;IACnB,cAAc;IACd,gBAAgB;IAChB,yCAAyC;AAC7C;;AAEA,yCAAyC;AACzC;IACI,6BAA6B;IAC7B,mCAAmC;AACvC;;AAEA;IACI,6DAA6D;IAC7D,YAAY;IACZ,oBAAoB;IACpB,gBAAgB;AACpB;;AAEA;IACI,eAAe;AACnB;;AAEA;IACI,mBAAmB;IACnB,oBAAoB;IACpB,6BAA6B;IAC7B,cAAc;IACd,gBAAgB;AACpB;;AAEA,4CAA4C;AAC5C;IACI,6DAA6D;IAC7D,YAAY;IACZ,YAAY;IACZ,uBAAuB;
IACvB,kBAAkB;IAClB,gBAAgB;IAChB,eAAe;IACf,2CAA2C;AAC/C;;AAEA;IACI,2BAA2B;IAC3B,+CAA+C;AACnD;;AAEA;IACI,YAAY;IACZ,mBAAmB;IACnB,eAAe;IACf,gBAAgB;AACpB;;AAEA,qBAAqB;AACrB;IACI,mBAAmB;IACnB,kBAAkB;IAClB,aAAa;IACb,cAAc;IACd,wDAAwD;IACxD,gBAAgB;AACpB;;AAEA;IACI,mBAAmB;IACnB,yBAAyB;IACzB,cAAc;IACd,eAAe;IACf,kBAAkB;IAClB,WAAW;IACX,oBAAoB;AACxB;;AAEA;IACI,mBAAmB;IACnB,aAAa;IACb,kBAAkB;IAClB,qBAAqB;IACrB,qBAAqB;IACrB,iBAAiB;IACjB,iBAAiB;IACjB,gBAAgB;AACpB;;AAEA,oCAAoC;AACpC;IACI,sBAAsB;IACtB,gBAAgB;IAChB,yBAAyB;IACzB,cAAc;AAClB;;AAEA;IACI,sBAAsB;IACtB,gBAAgB;IAChB,kBAAkB;IAClB,eAAe;AACnB;;AAEA,yBAAyB;AACzB;IACI,mBAAmB;IACnB,yBAAyB;IACzB,kBAAkB;IAClB,aAAa;IACb,cAAc;AAClB;;AAEA,+BAA+B;AAC/B;IACI,eAAe;IACf,YAAY;IACZ,kBAAkB;IAClB,yCAAyC;IACzC,gBAAgB;AACpB;;AAEA,kEAAkE;AAClE;IACI;QACI,4BAA4B;IAChC;;IAEA;QACI,4BAA4B;QAC5B,4BAA4B;QAC5B,+BAA+B;QAC/B,6BAA6B;QAC7B,kCAAkC;QAClC,4BAA4B;QAC5B,0BAA0B;QAC1B,6BAA6B;QAC7B,4BAA4B;QAC5B,mCAAmC,EAAE,eAAe;QACpD,2BAA2B;QAC3B,oBAAoB;QACpB,2BAA2B;QAC3B,qCAAqC;QACrC,gCAAgC;QAChC,+CAA+C;QAC/C,wBAAwB;QACxB,yBAAyB;QACzB,8BAA8B;IAClC;AACJ;;AAEA;IACI;QACI,wBAAwB;QACxB,4BAA4B;QAC5B,8BAA8B;QAC9B,4BAA4B;QAC5B,gCAAgC;QAChC,6BAA6B;QAC7B,+BAA+B;QAC/B,sDAAsD;QACtD,6BAA6B;QAC7B,qCAAqC;QACrC,gCAAgC;QAChC,wBAAwB;IAC5B;AACJ;;AAEA,0DAA0D;AAC1D;IACI,yBAAyB;IACzB,8BAA8B;IAC9B,qBAAqB;AACzB;;AAEA,2BAA2B;AAC3B;IACI,qBAAqB;IACrB,gCAAgC;IAChC,sBAAsB;AAC1B;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,WAAW;AACf;;AAEA;IACI,yBAAyB;IACzB,qBAAqB;IACrB,mBAAmB;IACnB,cAAc;IACd,iBAAiB;IACjB,gBAAgB;IAChB,gBAAgB;IAChB,2BAA2B;AAC/B;;AAEA;IACI,cAAc;IACd,qBAAqB;AACzB;;AAEA;IACI,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,qBAAqB;AACzB;;AAEA,qBAAqB;AACrB;IACI,qBAAqB;IACrB,mDAAmD;AACvD;;AAEA;IACI,UAAU;AACd;;AAEA;IACI,uBAAuB;AAC3B;;AAEA;IACI,kCAAkC;IAClC,kBAAkB;AACtB;;AAEA;IACI,kCAAkC;AACtC;;AAEA,2CAA2C;AAC3C;IACI,kBAAkB;IAClB,YAAY;AAChB;;AAEA;IACI,cAAc;AAClB;;AAEA,8DAA8D;AAC9D;IACI,oBAAoB;IACpB,kBAAkB;IAClB,UAAU;IACV,QAAQ;IACR,2BAA2B;IAC3B,mBAAmB;IACnB,YAAY;IACZ,qBAAqB;IACrB,kBAAkB;IAClB,iBAAiB;IACjB,mBAAmB;I
ACnB,YAAY;IACZ,gBAAgB;IAChB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;IACnD,oBAAoB;IACpB,yCAAyC;AAC7C;;AAEA;IACI,WAAW;IACX,kBAAkB;IAClB,UAAU;IACV,QAAQ;IACR,gCAAgC;IAChC,6BAA6B;IAC7B,2BAA2B;IAC3B,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;AACvD;;AAEA;;IAEI,UAAU;IACV,mBAAmB;AACvB;;AAEA,+BAA+B;AAC/B;IACI;QACI,UAAU;QACV,WAAW;QACX,kBAAkB;QAClB,YAAY;IAChB;;IAEA;QACI,UAAU;QACV,WAAW;QACX,+BAA+B;QAC/B,+BAA+B;QAC/B,0BAA0B;IAC9B;AACJ;;AAEA,gDAAgD;AAChD;IACI,8BAA8B;IAC9B,oCAAoC;IACpC,6BAA6B;IAC7B,0BAA0B;IAC1B,2BAA2B;IAC3B,2BAA2B;IAC3B,2BAA2B;IAC3B,2BAA2B;AAC/B;;AAEA;IACI,2BAA2B;IAC3B,kFAAkF;IAClF,yBAAyB;AAC7B;;AAEA,gBAAgB;AAChB;IACI,8BAA8B;IAC9B,+BAA+B;IAC/B,6BAA6B;IAC7B,2BAA2B;IAC3B,yBAAyB;AAC7B;;AAEA,iCAAiC;AACjC;IACI,eAAe;IACf,eAAe;IACf,2BAA2B;IAC3B,cAAc;IACd,4BAA4B;IAC5B,0BAA0B;AAC9B;;AAEA;IACI,8BAA8B;IAC9B,eAAe;AACnB;;AAEA,qCAAqC;AACrC;IACI;QACI,uCAAuC;QACvC,eAAe;IACnB;AACJ;;AAEA,kCAAkC;AAClC;IACI,eAAe;IACf,gBAAgB;IAChB,wBAAwB;IACxB,cAAc;AAClB;;AAEA,0BAA0B;AAC1B;IACI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;IACrB,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,qCAAqC;IACrC,iCAAiC;IACjC,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;IACrB,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,iBAAiB;IACjB,gBAAgB;IAChB,uBAAuB;IACvB,cAAc;IACd,gBAAgB;AACpB;;AAEA,6BAA6B;AAC7B;;IAEI,eAAe;IACf,gBAAgB;IAChB,qBAAqB;AACzB;;AAEA,0DAA0D;AAC1D;;;;;;;;;IASI,kBAAkB;IAClB,cAAc;IACd,gBAAgB;IAChB,0BAA0B;IAC1B,+CAA+C;IAC/C,yBAAyB;AAC7B;;AAEA;;;;;;;;;IASI,cAAc;IACd,8BAA8B;IAC9B,oCAAoC;IACpC,gBAAgB;IAChB,kBAAkB;AACtB;;AAEA,gDAAgD;AAChD;IACI,2BAA2B;IAC3B,kBAAkB;IAClB,YAAY;IACZ,SAAS;IACT,2BAA2B;IAC3B,mBAAmB;IACnB,YAAY;IACZ,qBAAqB;IACrB,kBAAkB;IAClB,iBAAiB;IACjB,gBAAgB;IAChB,mBAAmB;IACnB,YAAY;IACZ,gBAAgB;IAChB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;IACnD,oBAAoB;IACpB,yCAAyC;IACzC,kBAAkB;AACtB;;AAEA;IACI,WAAW;IACX,kBAAkB;IAClB,YAAY;IACZ,SAAS;IACT,2BAA2B;IAC3B,6BAA6B;IAC7B,yBAAyB;IACzB,aAAa;IACb,UAAU;IACV,kBAAkB;IAClB,mDAAmD;AACvD;;AAEA;;IAEI,UAAU;IACV,mBAAmB;AACvB;;AAEA,kCAAkC;AAClC;IACI,6DAA6D;IAC7D,8BA
A8B;IAC9B,wBAAwB;IACxB,gBAAgB;IAChB,0BAA0B;IAC1B,+CAA+C;IAC/C,iBAAiB;IACjB,gBAAgB;IAChB,cAAc;AAClB;;AAEA;IACI,cAAc;IACd,gBAAgB;AACpB;;AAEA;IACI,oCAAoC;IACpC,qBAAqB;IACrB,kBAAkB;IAClB,gBAAgB;IAChB,cAAc;AAClB;;AAEA;IACI,cAAc;IACd,gBAAgB;AACpB;;AAEA,+BAA+B;AAC/B;IACI,eAAe;IACf,gBAAgB;IAChB,oBAAoB;IACpB,cAAc;IACd,8BAA8B;IAC9B,4DAA4D;IAC5D,0BAA0B;IAC1B,kBAAkB;IAClB,cAAc;AAClB;;AAEA,2DAA2D;AAC3D;;IAEI,6DAA6D;IAC7D,cAAc;IACd,qBAAqB;IACrB,qBAAqB;IACrB,mBAAmB;IACnB,yBAAyB;IACzB,qBAAqB;IACrB,yBAAyB;IACzB,gBAAgB;IAChB,8CAA8C;AAClD;;AAEA;;IAEI,6DAA6D;IAC7D,YAAY;IACZ,qBAAqB;IACrB,2BAA2B;IAC3B,8CAA8C;AAClD;;AAEA;;IAEI,wBAAwB;IACxB,6CAA6C;AACjD;;AAEA,wBAAwB;AACxB;;;IAGI,eAAe;IACf,WAAW;IACX,cAAc;IACd,eAAe;AACnB;;AAEA,mCAAmC;AACnC;IACI;;QAEI,cAAc;QACd,iBAAiB;QACjB,kBAAkB;IACtB;AACJ;;AAEA;IACI;QACI,aAAa;IACjB;;IAEA;QACI,aAAa;IACjB;AACJ","sourcesContent":["/* Transformers-specific styling additions */\n\n/* Code comparison layout */\n.code-compare {\n display: grid;\n grid-template-columns: 1fr 1fr;\n gap: 1.5rem;\n margin: 2rem 0;\n align-items: start;\n}\n\n.code-compare .code-column {\n background: #ffffff;\n border: 1px solid #e2e8f0;\n border-radius: 8px;\n overflow: hidden;\n box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1);\n}\n\n.code-compare .code-header {\n background: #f8f9fa;\n padding: 0.75rem 1rem;\n font-weight: 600;\n color: #495057;\n border-bottom: 1px solid #e2e8f0;\n font-size: 0.9em;\n}\n\n.code-compare pre {\n margin: 0;\n padding: 1rem;\n background: #ffffff;\n overflow-x: auto;\n font-size: 0.85em;\n line-height: 1.4;\n}\n\n.code-compare pre code {\n color: #374151;\n}\n\n/* Mobile responsiveness for code comparison */\n@media (max-width: 768px) {\n .code-compare {\n grid-template-columns: 1fr;\n gap: 1rem;\n }\n}\n\n/* Tenet styling - special highlighting for design principles */\n.tenet-list {\n margin: 3rem 0;\n}\n\n.tenet-list ol {\n counter-reset: tenet-counter -1; /* Start from 0 */\n list-style: none;\n padding-left: 0;\n display: grid;\n grid-template-columns: 
1fr;\n gap: 2.5rem;\n max-width: 900px;\n margin: 0 auto;\n}\n\n.tenet-list li.tenet {\n counter-increment: tenet-counter;\n background: linear-gradient(135deg, #ffffff 0%, #f8f9fa 100%);\n border: 2px solid #e2e8f0;\n border-radius: 16px;\n padding: 2rem 2rem 2rem 4rem;\n margin: 0;\n position: relative;\n box-shadow: 0 12px 35px rgba(0, 0, 0, 0.12);\n transition: all 0.3s ease;\n cursor: pointer;\n}\n\n.tenet-list li.tenet:hover {\n transform: translateY(-8px) scale(1.02);\n box-shadow: 0 20px 50px rgba(0, 0, 0, 0.25);\n border-color: rgba(0, 123, 255, 0.5);\n background: linear-gradient(135deg, #ffffff 0%, #f0f8ff 100%);\n}\n\n/* Colorful numbering system */\n.tenet-list li.tenet:nth-child(1):before { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); }\n.tenet-list li.tenet:nth-child(2):before { background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); }\n.tenet-list li.tenet:nth-child(3):before { background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); }\n.tenet-list li.tenet:nth-child(4):before { background: linear-gradient(135deg, #43e97b 0%, #38f9d7 100%); }\n.tenet-list li.tenet:nth-child(5):before { background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); }\n.tenet-list li.tenet:nth-child(6):before { background: linear-gradient(135deg, #a8edea 0%, #fed6e3 100%); }\n.tenet-list li.tenet:nth-child(7):before { background: linear-gradient(135deg, #ff9a9e 0%, #fecfef 100%); }\n.tenet-list li.tenet:nth-child(8):before { background: linear-gradient(135deg, #a18cd1 0%, #fbc2eb 100%); }\n.tenet-list li.tenet:nth-child(9):before { background: linear-gradient(135deg, #ffecd2 0%, #fcb69f 100%); }\n\n.tenet-list li.tenet:before {\n content: counter(tenet-counter);\n position: absolute;\n top: -12px;\n left: -12px;\n color: white;\n width: 48px;\n height: 48px;\n border-radius: 50%;\n display: flex;\n align-items: center;\n justify-content: center;\n font-size: 1.2em;\n font-weight: bold;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);\n 
border: 3px solid white;\n}\n\n.tenet-list li.tenet strong {\n color: #1a202c;\n font-size: 1.1em;\n display: block;\n margin-bottom: 0.5rem;\n}\n\n.tenet-list li.tenet em {\n color: #4a5568;\n font-size: 0.95em;\n font-style: italic;\n display: block;\n margin-top: 0.75rem;\n padding: 1rem;\n background: rgba(0, 0, 0, 0.03);\n border-radius: 8px;\n border-left: 3px solid #e2e8f0;\n}\n\n.tenet-list li.tenet p {\n color: #2d3748;\n line-height: 1.6;\n margin: 0.5rem 0;\n}\n\n/* Add a subtle pulse animation for the numbers */\n@keyframes pulse-glow {\n 0% { box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); }\n 50% { box-shadow: 0 4px 20px rgba(0, 0, 0, 0.25); }\n 100% { box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); }\n}\n\n.tenet-list li.tenet:hover:before {\n animation: pulse-glow 2s ease-in-out infinite;\n}\n\n/* Interactive component styling */\n.interactive-demo {\n border: 1px solid #e2e8f0;\n border-radius: 12px;\n background: #ffffff;\n margin: 2rem 0;\n overflow: hidden;\n box-shadow: 0 4px 6px rgba(0, 0, 0, 0.07);\n}\n\n/* Model visualization fragment styling */\n[id*=\"plot-model-visualisation\"] {\n margin: 1rem -2rem !important;\n width: calc(100% + 4rem) !important;\n}\n\n.interactive-demo .demo-header {\n background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\n color: white;\n padding: 1rem 1.5rem;\n font-weight: 600;\n}\n\n.interactive-demo .demo-content {\n padding: 1.5rem;\n}\n\n.interactive-demo .demo-footer {\n background: #f8f9fa;\n padding: 1rem 1.5rem;\n border-top: 1px solid #e2e8f0;\n color: #6c757d;\n font-size: 0.9em;\n}\n\n/* Button styling for interactive elements */\n.btn-primary {\n background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\n border: none;\n color: white;\n padding: 0.75rem 1.5rem;\n border-radius: 6px;\n font-weight: 500;\n cursor: pointer;\n transition: transform 0.2s, box-shadow 0.2s;\n}\n\n.btn-primary:hover {\n transform: translateY(-1px);\n box-shadow: 0 4px 12px rgba(102, 126, 234, 
0.3);\n}\n\n.btn-primary:disabled {\n opacity: 0.6;\n cursor: not-allowed;\n transform: none;\n box-shadow: none;\n}\n\n/* Terminal styling */\n.terminal-container {\n background: #1a202c;\n border-radius: 8px;\n padding: 1rem;\n color: #e2e8f0;\n font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;\n font-size: 0.9em;\n}\n\n.terminal-input {\n background: #2d3748;\n border: 1px solid #4a5568;\n color: #e2e8f0;\n padding: 0.5rem;\n border-radius: 4px;\n width: 100%;\n font-family: inherit;\n}\n\n.terminal-output {\n background: #0a0e1a;\n padding: 1rem;\n border-radius: 4px;\n white-space: pre-wrap;\n word-break: break-all;\n min-height: 100px;\n max-height: 300px;\n overflow-y: auto;\n}\n\n/* Attention visualization styling */\n.attention-matrix {\n font-family: monospace;\n font-size: 0.8em;\n border-collapse: collapse;\n margin: 1rem 0;\n}\n\n.attention-matrix td {\n border: 1px solid #ddd;\n padding: 4px 8px;\n text-align: center;\n min-width: 50px;\n}\n\n/* Memory chart styling */\n.memory-chart-container {\n background: #f8f9fa;\n border: 2px solid #e9ecef;\n border-radius: 8px;\n padding: 1rem;\n margin: 1rem 0;\n}\n\n/* Image styling improvements */\nimg {\n max-width: 100%;\n height: auto;\n border-radius: 8px;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);\n margin: 1.5rem 0;\n}\n\n/* Table of contents styling - Fixed positioning like ultrascale */\n@media (min-width: 1200px) {\n d-article {\n overflow: visible !important;\n }\n \n d-contents {\n align-self: start !important;\n background: white !important;\n grid-column-start: 1 !important;\n grid-column-end: 4 !important;\n grid-row: auto / span 6 !important;\n justify-self: end !important;\n margin-top: 0em !important;\n padding-right: 3em !important;\n padding-left: 2em !important;\n position: -webkit-sticky !important; /* For Safari */\n position: sticky !important;\n top: 10px !important;\n overflow-y: auto !important;\n height: calc(100vh - 40px) !important;\n scrollbar-width: none !important;\n 
transition: max-height 0.3s ease-out !important;\n z-index: -100 !important;\n display: block !important;\n visibility: visible !important;\n }\n}\n\n@media (max-width: 1199px) {\n d-contents {\n display: none !important;\n background: white !important;\n justify-self: start !important;\n align-self: start !important;\n padding-bottom: 0.5em !important;\n margin-bottom: 1em !important;\n padding-left: 0.25em !important;\n border-bottom: 1px solid rgba(0, 0, 0, 0.1) !important;\n overflow-y: scroll !important;\n height: calc(100vh - 40px) !important;\n scrollbar-width: none !important;\n z-index: -100 !important;\n }\n}\n\n/* Force TOC to be visible and override distill defaults */\nd-contents {\n display: block !important;\n visibility: visible !important;\n opacity: 1 !important;\n}\n\n/* TOC Navigation styling */\nd-contents .toc-header {\n margin-bottom: 1.5rem;\n border-bottom: 2px solid #007bff;\n padding-bottom: 0.5rem;\n}\n\nd-contents .toc-title {\n font-weight: bold;\n font-size: 1.2em;\n color: #333;\n}\n\nd-contents nav a {\n color: rgba(0, 0, 0, 0.7);\n text-decoration: none;\n border-bottom: none;\n display: block;\n padding: 0.3rem 0;\n font-size: 0.9em;\n line-height: 1.4;\n transition: color 0.2s ease;\n}\n\nd-contents nav a:hover {\n color: #007bff;\n text-decoration: none;\n}\n\nd-contents nav a.active {\n color: #007bff;\n font-weight: 600;\n}\n\nd-contents nav div {\n margin-bottom: 0.2rem;\n}\n\n/* Smooth scrollbar */\nd-contents {\n scrollbar-width: thin;\n scrollbar-color: rgba(0, 123, 255, 0.3) transparent;\n}\n\nd-contents::-webkit-scrollbar {\n width: 6px;\n}\n\nd-contents::-webkit-scrollbar-track {\n background: transparent;\n}\n\nd-contents::-webkit-scrollbar-thumb {\n background: rgba(0, 123, 255, 0.3);\n border-radius: 3px;\n}\n\nd-contents::-webkit-scrollbar-thumb:hover {\n background: rgba(0, 123, 255, 0.5);\n}\n\n/* Custom tooltip styling for tenet links */\nd-contents nav a[title] {\n position: relative;\n cursor: 
help;\n}\n\nd-contents nav a[title]:hover {\n color: #667eea;\n}\n\n/* Enhanced tooltip using CSS (fallback for title attribute) */\nd-contents nav a[title]:after {\n content: attr(title);\n position: absolute;\n left: 100%;\n top: 50%;\n transform: translateY(-50%);\n background: #1a202c;\n color: white;\n padding: 0.75rem 1rem;\n border-radius: 8px;\n font-size: 0.85em;\n white-space: normal;\n width: 300px;\n line-height: 1.4;\n z-index: 1001;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n pointer-events: none;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);\n}\n\nd-contents nav a[title]:before {\n content: '';\n position: absolute;\n left: 100%;\n top: 50%;\n transform: translate(-8px, -50%);\n border: 8px solid transparent;\n border-right-color: #1a202c;\n z-index: 1002;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n}\n\nd-contents nav a[title]:hover:after,\nd-contents nav a[title]:hover:before {\n opacity: 1;\n visibility: visible;\n}\n\n/* Adjust for smaller screens */\n@media (max-width: 1400px) {\n d-contents nav a[title]:after {\n left: auto;\n right: 100%;\n margin-right: 1rem;\n width: 250px;\n }\n \n d-contents nav a[title]:before {\n left: auto;\n right: 100%;\n transform: translate(8px, -50%);\n border-right-color: transparent;\n border-left-color: #1a202c;\n }\n}\n\n/* Improve code syntax highlighting with Prism */\npre[class*=\"language-\"] {\n background: #f8f9fa !important;\n border: 1px solid #e9ecef !important;\n border-radius: 8px !important;\n padding: 1.5rem !important;\n margin: 1.5rem 0 !important;\n overflow-x: auto !important;\n font-size: 0.9em !important;\n line-height: 1.5 !important;\n}\n\ncode[class*=\"language-\"] {\n background: none !important;\n font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', 'Courier New', monospace !important;\n color: #383a42 !important;\n}\n\n/* Inline code */\np code, li code {\n background: #f1f3f4 !important;\n 
padding: 0.2em 0.4em !important;\n border-radius: 3px !important;\n font-size: 0.9em !important;\n color: #d73a49 !important;\n}\n\n/* Distill article improvements */\nd-article {\n max-width: none;\n font-size: 19px;\n line-height: 1.7 !important;\n color: #1a1a1a;\n padding-top: 1rem !important;\n grid-row-gap: 0 !important;\n}\n\nd-article > * {\n grid-column: middle !important;\n max-width: none;\n}\n\n/* Adjust for TOC on larger screens */\n@media (min-width: 1200px) {\n d-article > * {\n grid-column: text / page-end !important;\n max-width: none;\n }\n}\n\n/* Improve paragraph readability */\nd-article p {\n font-size: 19px;\n line-height: 1.5;\n margin-top: 0 !important;\n color: #1a1a1a;\n}\n\n/* Improve heading sizes */\nd-article h1 {\n font-size: 3rem;\n line-height: 1.2;\n margin: 3rem 0 2rem 0;\n color: #1a202c;\n font-weight: 700;\n}\n\nd-article h2 {\n font-size: 2.5rem;\n line-height: 1.3;\n margin: 1.5rem 0 0.75rem 0 !important;\n padding-bottom: 0.5rem !important;\n color: #1a202c;\n font-weight: 650;\n}\n\nd-article h3 {\n font-size: 2rem;\n line-height: 1.4;\n margin: 2rem 0 1rem 0;\n color: #1a202c;\n font-weight: 600;\n}\n\nd-article h4 {\n font-size: 1.5rem;\n line-height: 1.4;\n margin: 1.5rem 0 1rem 0;\n color: #2d3748;\n font-weight: 600;\n}\n\n/* Improve list readability */\nd-article ul li,\nd-article ol li {\n font-size: 18px;\n line-height: 1.7;\n margin-bottom: 0.5rem;\n}\n\n/* Enhanced tenet reference styling with custom tooltips */\na[href^=\"#source-of-truth\"],\na[href^=\"#one-model-one-file\"],\na[href^=\"#code-is-product\"],\na[href^=\"#standardize-dont-abstract\"],\na[href^=\"#do-repeat-yourself\"],\na[href^=\"#minimal-user-api\"],\na[href^=\"#backwards-compatibility\"],\na[href^=\"#consistent-public-surface\"],\na[href^=\"#modular-toolbox\"] {\n position: relative;\n color: #667eea;\n font-weight: 600;\n text-decoration: underline;\n text-decoration-color: rgba(102, 126, 234, 0.3);\n transition: all 0.3s 
ease;\n}\n\na[href^=\"#source-of-truth\"]:hover,\na[href^=\"#one-model-one-file\"]:hover,\na[href^=\"#code-is-product\"]:hover,\na[href^=\"#standardize-dont-abstract\"]:hover,\na[href^=\"#do-repeat-yourself\"]:hover,\na[href^=\"#minimal-user-api\"]:hover,\na[href^=\"#backwards-compatibility\"]:hover,\na[href^=\"#consistent-public-surface\"]:hover,\na[href^=\"#modular-toolbox\"]:hover {\n color: #4c51bf;\n text-decoration-color: #4c51bf;\n background: rgba(102, 126, 234, 0.1);\n padding: 2px 4px;\n border-radius: 4px;\n}\n\n/* Custom tooltip using data-tooltip attribute */\na[data-tooltip]:after {\n content: attr(data-tooltip);\n position: absolute;\n bottom: 100%;\n left: 50%;\n transform: translateX(-50%);\n background: #1a202c;\n color: white;\n padding: 0.75rem 1rem;\n border-radius: 8px;\n font-size: 0.85em;\n font-weight: 400;\n white-space: normal;\n width: 320px;\n line-height: 1.4;\n z-index: 1001;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n pointer-events: none;\n box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2);\n margin-bottom: 8px;\n}\n\na[data-tooltip]:before {\n content: '';\n position: absolute;\n bottom: 100%;\n left: 50%;\n transform: translateX(-50%);\n border: 8px solid transparent;\n border-top-color: #1a202c;\n z-index: 1002;\n opacity: 0;\n visibility: hidden;\n transition: opacity 0.3s ease, visibility 0.3s ease;\n}\n\na[data-tooltip]:hover:after,\na[data-tooltip]:hover:before {\n opacity: 1;\n visibility: visible;\n}\n\n/* Breadcrumb navigation styling */\n.crumbs {\n background: linear-gradient(135deg, #f0f4ff 0%, #e6eeff 100%);\n border-left: 5px solid #667eea;\n padding: 1.25rem 1.75rem;\n margin: 2.5rem 0;\n border-radius: 0 8px 8px 0;\n box-shadow: 0 2px 8px rgba(102, 126, 234, 0.12);\n font-size: 0.95em;\n line-height: 1.6;\n color: #4a5568;\n}\n\n.crumbs strong {\n color: #667eea;\n font-weight: 700;\n}\n\n.crumbs code {\n background: rgba(102, 126, 234, 0.1);\n padding: 0.15em 0.4em;\n 
border-radius: 3px;\n font-size: 0.9em;\n color: #4c51bf;\n}\n\n.crumbs a {\n color: #667eea;\n font-weight: 500;\n}\n\n/* Improve blockquote styling */\nd-article blockquote {\n font-size: 19px;\n line-height: 1.8;\n padding: 1.5rem 2rem;\n margin: 2rem 0;\n border-left: 4px solid #667eea;\n background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 50%);\n border-radius: 0 8px 8px 0;\n font-style: italic;\n color: #4a5568;\n}\n\n/* Link capsule styling - only for external HTTP(S) links */\nd-article a[href^=\"http://\"],\nd-article a[href^=\"https://\"] {\n background: linear-gradient(135deg, #e3f2fd 0%, #bbdefb 100%);\n color: #1565c0;\n text-decoration: none;\n padding: 0.15em 0.5em;\n border-radius: 12px;\n border: 1px solid #90caf9;\n display: inline-block;\n transition: all 0.3s ease;\n font-weight: 500;\n box-shadow: 0 1px 3px rgba(21, 101, 192, 0.15);\n}\n\nd-article a[href^=\"http://\"]:hover,\nd-article a[href^=\"https://\"]:hover {\n background: linear-gradient(135deg, #2196f3 0%, #1976d2 100%);\n color: white;\n border-color: #1565c0;\n transform: translateY(-1px);\n box-shadow: 0 4px 12px rgba(21, 101, 192, 0.3);\n}\n\nd-article a[href^=\"http://\"]:active,\nd-article a[href^=\"https://\"]:active {\n transform: translateY(0);\n box-shadow: 0 1px 3px rgba(21, 101, 192, 0.2);\n}\n\n/* Full width elements */\nd-article .code-compare,\nd-article .interactive-demo,\nd-article .memory-chart-container {\n max-width: none;\n width: 100%;\n margin-left: 0;\n margin-right: 0;\n}\n\n/* Responsive design improvements */\n@media (max-width: 1200px) {\n d-article .code-compare,\n d-article .interactive-demo {\n max-width: 95%;\n margin-left: auto;\n margin-right: auto;\n }\n}\n\n@media (max-width: 768px) {\n .tenet-list li.tenet {\n padding: 1rem;\n }\n\n .interactive-demo .demo-content {\n padding: 1rem;\n }\n}\n\n"],"sourceRoot":""}]);
  // Exports
  /* harmony default export */ const __WEBPACK_DEFAULT_EXPORT__ = (___CSS_LOADER_EXPORT___);
 
dist/main.bundle.js.map CHANGED
The diff for this file is too large to render. See raw diff
 
src/transformers-custom.css CHANGED
@@ -635,6 +635,37 @@ a[data-tooltip]:hover:before {
     visibility: visible;
 }
 
+/* Breadcrumb navigation styling */
+.crumbs {
+    background: linear-gradient(135deg, #f0f4ff 0%, #e6eeff 100%);
+    border-left: 5px solid #667eea;
+    padding: 1.25rem 1.75rem;
+    margin: 2.5rem 0;
+    border-radius: 0 8px 8px 0;
+    box-shadow: 0 2px 8px rgba(102, 126, 234, 0.12);
+    font-size: 0.95em;
+    line-height: 1.6;
+    color: #4a5568;
+}
+
+.crumbs strong {
+    color: #667eea;
+    font-weight: 700;
+}
+
+.crumbs code {
+    background: rgba(102, 126, 234, 0.1);
+    padding: 0.15em 0.4em;
+    border-radius: 3px;
+    font-size: 0.9em;
+    color: #4c51bf;
+}
+
+.crumbs a {
+    color: #667eea;
+    font-weight: 500;
+}
+
 /* Improve blockquote styling */
 d-article blockquote {
     font-size: 19px;