ATTN MASK — GPT-2 (causal)
Tokens: [The, cat, sat, on, the, mat]
Legend: x = can attend, . = masked (future); rows = query token, columns = key token
      The  cat  sat  on   the  mat
The   x    .    .    .    .    .
cat   x    x    .    .    .    .
sat   x    x    x    .    .    .
on    x    x    x    x    .    .
the   x    x    x    x    x    .
mat   x    x    x    x    x    x
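The causal pattern above can be reproduced in a few lines. This is a minimal sketch in plain Python, not GPT-2's actual code: the names causal_mask and render are hypothetical, and the mask is kept as nested lists of booleans rather than a tensor. The only rule is that query position q may see key position k when k <= q.

```python
def causal_mask(n):
    """Return an n x n grid; mask[q][k] is True when query q may attend to key k."""
    # A token sees itself and everything before it, never the future.
    return [[k <= q for k in range(n)] for q in range(n)]


def render(mask, allowed="x", blocked="."):
    """Turn each boolean row into a string using the legend symbols."""
    return [" ".join(allowed if cell else blocked for cell in row) for row in mask]


tokens = ["The", "cat", "sat", "on", "the", "mat"]
mask = causal_mask(len(tokens))
for tok, line in zip(tokens, render(mask)):
    print(f"{tok:<5}{line}")
```

In a real model the same pattern is usually applied to the attention scores before the softmax, so that masked positions receive zero weight.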
ATTN MASK — PaliGemma-style (bidirectional prefix + causal suffix)
Prefix: [<i0> <i1> <i2> <i3> <i4> What is this]
Suffix: [A great duck]
Legend: ✓ = can attend, ✗ = cannot; rows = query token, columns = key token
      <i0>  <i1>  <i2>  <i3>  <i4>  What  is    this  | A     great duck
<i0>  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
<i1>  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
<i2>  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
<i3>  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
<i4>  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
What  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
is    ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
this  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✗     ✗     ✗
------------------------------------------------------------------------
A     ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✓     ✗     ✗
great ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✓     ✓     ✗
duck  ✓     ✓     ✓     ✓     ✓     ✓     ✓     ✓     | ✓     ✓     ✓
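The prefix-plus-suffix pattern above can be sketched the same way. This is an illustrative plain-Python version, not PaliGemma's implementation: prefix_lm_mask is a hypothetical name, and the lengths 8 and 3 are simply the prefix and suffix sizes from the example. A key is visible when it lies in the prefix (every query sees the whole prefix) or when it is not in the query's future (ordinary causal rule for the suffix).

```python
def prefix_lm_mask(prefix_len, suffix_len):
    """Return a grid where mask[q][k] is True when query q may attend to key k."""
    n = prefix_len + suffix_len
    # k < prefix_len : prefix keys are visible to every query (bidirectional block)
    # k <= q         : suffix keys are visible only to the same or later suffix queries
    return [[k < prefix_len or k <= q for k in range(n)] for q in range(n)]


prefix = ["<i0>", "<i1>", "<i2>", "<i3>", "<i4>", "What", "is", "this"]
suffix = ["A", "great", "duck"]
mask = prefix_lm_mask(len(prefix), len(suffix))
for tok, row in zip(prefix + suffix, mask):
    print(f"{tok:<6}" + " ".join("✓" if cell else "✗" for cell in row))
```

Setting prefix_len to 0 reduces this to the purely causal mask, which is one way to see the prefix block as the only difference between the two diagrams.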