Pablo Montalvo‑Leroux · ML Engineer @ Hugging Face
Static graphs and the existence of transformers are inversely correlated.
torch.compile ≈ sweet spot: author in dynamic, ship in static.
Research cadence ≈ hours; any friction kills momentum.
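A minimal sketch of that sweet spot: author and debug the model eagerly, then hand the very same module to torch.compile for deployment (checkpoint name is illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Author in dynamic, ordinary PyTorch...
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# ...then ship the same module as a (mostly) static graph.
compiled = torch.compile(model)

inputs = tokenizer("Hello", return_tensors="pt")
with torch.no_grad():
    logits = compiled(**inputs).logits  # first call traces; later calls reuse the graph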
class BertEmbeddings(nn.Module):
    …

class BertModel(BertPreTrainedModel):
    …
Atomic PRs → faster reviews → community velocity.
Compose new blocks via subclass & override.
class LlamaRotaryLoRA(LlamaAttention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.q_proj = LoRA(self.q_proj)  # wrap the query projection with a LoRA adapter
        self.apply_rotary()              # keep rotary position embeddings
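The LoRA class used above is not defined in the snippet; here is a minimal sketch of such a wrapper (the name, rank, and scaling are assumptions, not the library's implementation):

import torch.nn as nn

class LoRA(nn.Module):
    # Hypothetical low-rank adapter around an existing linear layer.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))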
tp_plan keeps module code intact.
Zero-copy weight partitioning · 15 % RAM cut on A100.
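A sketch of loading with a tensor-parallel plan, assuming a multi-GPU launch via torchrun (checkpoint name is illustrative):

import torch
from transformers import AutoModelForCausalLM

# Run with e.g.: torchrun --nproc-per-node 4 tp_demo.py
# The plan shipped in the model's config shards weights across ranks;
# the modeling code itself is untouched.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    tp_plan="auto",
    torch_dtype=torch.bfloat16,
)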
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
Same API across text · vision · audio.
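The same two Auto* calls work unchanged for a vision-language checkpoint; a sketch (checkpoint name is illustrative):

from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")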
Mitigations: Triton, compiled custom ops, compile‑time fallback, and callable kernels!
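One way to keep a custom op compile-friendly, sketched with torch.library (op name and body are hypothetical; requires PyTorch ≥ 2.4): register the op plus a fake implementation so torch.compile can trace through it instead of graph-breaking.

import torch

@torch.library.custom_op("demo::scale", mutates_args=())
def scale(x: torch.Tensor, factor: float) -> torch.Tensor:
    return x * factor  # real eager implementation

@scale.register_fake
def _(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Shape/dtype-only rule: lets the compiler trace without running the kernel.
    return torch.empty_like(x)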
New initiative
https://huggingface.co/kernels-community
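A short sketch of pulling a community kernel straight from the Hub with the kernels library (repo and function names follow its published examples):

import torch
from kernels import get_kernel

# Downloads a pre-compiled kernel from the Hub; no local build step.
activation = get_kernel("kernels-community/activation")

x = torch.randn(16, 128, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # writes the activation into y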
We want to facilitate adoption. How does a radio work? Would you know how to tune one?
How does a computer work? Do you need to know how it works to browse the web?