attention-mask-visualizer
  ATTN MASK — GPT-2 (causal)
  Tokens: [The, cat, sat, on, the, mat]
  Legend: x = can attend, . = masked (future)
  
           The cat sat on  the mat
  The       x   .   .   .   .   .
  cat       x   x   .   .   .   .
  sat       x   x   x   .   .   .
  on        x   x   x   x   .   .
  the       x   x   x   x   x   .
  mat       x   x   x   x   x   x
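
The lower-triangular pattern above follows from one rule: query position i may attend to key position j only when j <= i. A minimal sketch in plain Python, assuming a boolean mask where True means "can attend". The helper names `causal_mask` and `render` are hypothetical and are not taken from the visualizer or from GPT-2 itself.

```python
def causal_mask(n):
    """Return an n x n list of booleans.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. when j is at or before i.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def render(tokens, mask, yes="x", no="."):
    """Draw one row per query token, using the legend symbols from the grid."""
    pad = max(len(t) for t in tokens)
    return "\n".join(
        tok.ljust(pad) + "  " + " ".join(yes if ok else no for ok in row)
        for tok, row in zip(tokens, mask)
    )


tokens = ["The", "cat", "sat", "on", "the", "mat"]
print(render(tokens, causal_mask(len(tokens))))
```

In a real model the same boolean matrix is usually converted to an additive mask (0 where allowed, a large negative value where masked) before the softmax; that step is omitted here.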
  
  
  ATTN MASK — PaliGemma-style (bidirectional prefix + causal suffix)
  Prefix:  [<i0> <i1> <i2> <i3> <i4> What is this]
  Suffix:  [A great duck]
  Legend: ✓ = can attend, ✗ = cannot

             <i0><i1><i2><i3><i4> What  is  this  |   A   great  duck
  <i0>        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  <i1>        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  <i2>        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  <i3>        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  <i4>        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  What        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  is          ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  this        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✗     ✗      ✗
  --------------------------------------------------------------------
  A           ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✓     ✗      ✗
  great       ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✓     ✓      ✗
  duck        ✓   ✓   ✓   ✓   ✓    ✓     ✓    ✓        ✓     ✓      ✓
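
The grid above combines two rules: every position sees the whole prefix, and suffix positions additionally see earlier suffix positions (and themselves). A minimal sketch in plain Python, assuming True means "can attend". The function name `prefix_lm_mask` is hypothetical, and this is an illustration of the pattern shown, not PaliGemma's actual implementation.

```python
def prefix_lm_mask(prefix_len, suffix_len):
    """Return a square boolean mask for a prefix-LM layout.

    mask[i][j] is True when query position i may attend to key position j:
      - any key inside the prefix (j < prefix_len) is visible to every query;
      - a key in the suffix is visible only to queries at or after it (j <= i).
    """
    n = prefix_len + suffix_len
    return [[j < prefix_len or j <= i for j in range(n)] for i in range(n)]


prefix = ["<i0>", "<i1>", "<i2>", "<i3>", "<i4>", "What", "is", "this"]
suffix = ["A", "great", "duck"]
mask = prefix_lm_mask(len(prefix), len(suffix))

# One row per query token, using the same symbols as the legend above.
for tok, row in zip(prefix + suffix, mask):
    print(tok.ljust(6), " ".join("✓" if ok else "✗" for ok in row))
```

Setting `prefix_len` to 0 reduces this to the purely causal mask from the GPT-2 grid, which is one way to see the prefix-LM mask as a generalization of it.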