for each block b:
    s_b = q_t · K_bᵀ
    o += softmax_online(s_b) · V_b
(K_b and V_b come from the same physical block, one block at a time)
Block Table:
LB 0 → Phys P3 · T1–T4
LB 1 → Phys P7 · T5–T8
[Figure: Kᵀ (dₖ × 8) is processed in two iterations, b=0 and b=1, one physical block per iteration.]
✓ K and V from the same physical block are loaded and processed together, one block at a time.
There is no need to load all of K before touching V: online softmax keeps a running maximum and a running normalizer across blocks, and rescales the partial output whenever the maximum changes.
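
The block-at-a-time loop can be sketched in NumPy as follows. This is a minimal illustration, not a real kernel. The function names, the cache layout (num_phys_blocks, block_size, d) and the representation of the block table as a list of physical block ids are assumptions made for the example. The usual 1/sqrt(dₖ) scaling is left out to match the pseudocode above.

```python
import numpy as np

def paged_attention_single_query(q, k_cache, v_cache, block_table):
    """Attention for one query over K/V held in non-contiguous physical blocks.

    q:           (d_k,) query vector q_t
    k_cache:     (num_phys_blocks, block_size, d_k)  (assumed layout)
    v_cache:     (num_phys_blocks, block_size, d_v)  (assumed layout)
    block_table: physical block ids, listed in logical-block order
    """
    m = -np.inf                       # running maximum of all scores seen so far
    l = 0.0                           # running sum of exp(score - m)
    o = np.zeros(v_cache.shape[-1])   # running unnormalized output

    for phys in block_table:
        k_b = k_cache[phys]           # K and V come from the same physical block
        v_b = v_cache[phys]
        s = k_b @ q                   # scores for this block, shape (block_size,)

        m_new = max(m, s.max())
        scale = np.exp(m - m_new)     # rescales earlier blocks; 0.0 on the first block
        p = np.exp(s - m_new)

        l = l * scale + p.sum()
        o = o * scale + p @ v_b
        m = m_new

    return o / l

def reference_attention(q, k_cache, v_cache, block_table):
    """Same result computed the ordinary way: gather all of K and V first."""
    k = np.concatenate([k_cache[p] for p in block_table])
    v = np.concatenate([v_cache[p] for p in block_table])
    s = k @ q
    w = np.exp(s - s.max())
    return (w / w.sum()) @ v
```

With the block table shown above (LB 0 → P3, LB 1 → P7), the call would be paged_attention_single_query(q, k_cache, v_cache, [3, 7]), and the result should agree with reference_attention up to floating-point error.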