Paged Attention — Block-Wise Computation

for each block b:    s_b = q_t · K_bᵀ  ;  o += softmax_online(s_b) · V_b    (K and V come from the same physical block, one block at a time)
Block Table:
LB 0 → Phys P3  ·  T1–T4
LB 1 → Phys P7  ·  T5–T8
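The block-table lookup above can be sketched in a few lines. This is a minimal illustration, not any particular library's API: the names `BLOCK_SIZE`, `block_table`, and `locate` are hypothetical, and the block size of 4 is inferred from the T1–T4 / T5–T8 ranges shown.

```python
# Assumption: 4 tokens per block, inferred from T1-T4 / T5-T8 above.
BLOCK_SIZE = 4

# Logical block index -> physical block id, copied from the block table above.
block_table = {0: 3, 1: 7}


def locate(token_pos, table, block_size=BLOCK_SIZE):
    """Map a 0-based token position to (physical block id, offset inside the block)."""
    logical_block, offset = divmod(token_pos, block_size)
    return table[logical_block], offset


# T1 is position 0 and T6 is position 5.
print(locate(0, block_table))  # (3, 0)
print(locate(5, block_table))  # (7, 1)
```

Logical blocks are contiguous from the sequence's point of view, while the physical ids (P3, P7) can sit anywhere in memory; the table is the only thing that ties them together.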
q_t (1 × dₖ)  ×  Kᵀ (dₖ × 8)  =  s (1 × 8)
α = softmax(s / √dₖ)  (1 × 8)
α (1 × 8)  ×  V (8 × dₖ)  =  o_t (1 × dₖ)
Kᵀ columns and V rows by iteration: b=0 reads P3 (LB 0), b=1 reads P7 (LB 1).
✓ K and V from the same physical block are loaded and processed together — one block at a time.
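There is no need to load all of K before touching V: online softmax keeps a running maximum and a running normalizer across blocks, and rescales the partial output whenever a new block raises the maximum.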
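The block-wise loop can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a production kernel: `paged_attention` and `reference_attention` are hypothetical names, the K/V values are made-up demo data, and the physical block ids 3 and 7 stand in for P3 and P7. The running maximum `m`, running normalizer `denom`, and unnormalized output `o` are the statistics carried across blocks.

```python
import math


def paged_attention(q, k_blocks, v_blocks, block_order):
    """Attention for one query, visiting K/V one physical block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(v_blocks[block_order[0]][0])
    m = float("-inf")   # running max of the scaled scores seen so far
    denom = 0.0         # running sum of exp(score - m)
    o = [0.0] * d_v     # running unnormalized output

    for phys in block_order:
        K_b, V_b = k_blocks[phys], v_blocks[phys]  # same physical block
        # Scaled scores for this block only.
        s = [scale * sum(a * b for a, b in zip(q, k)) for k in K_b]
        m_new = max(m, max(s))
        # Rescale what was accumulated under the old max.
        correction = math.exp(m - m_new)
        denom *= correction
        o = [x * correction for x in o]
        for s_j, v in zip(s, V_b):
            p = math.exp(s_j - m_new)
            denom += p
            o = [x + p * v_i for x, v_i in zip(o, v)]
        m = m_new

    return [x / denom for x in o]


def reference_attention(q, keys, values):
    """Ordinary softmax attention over all keys at once, for comparison."""
    scale = 1.0 / math.sqrt(len(q))
    s = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(s)
    e = [math.exp(x - top) for x in s]
    z = sum(e)
    return [sum(e_j * v[i] for e_j, v in zip(e, values)) / z
            for i in range(len(values[0]))]


# Made-up demo data: d_k = 2, two physical blocks of 4 tokens each.
k_blocks = {
    3: [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.6]],
    7: [[1.0, 0.0], [0.0, 1.5], [-0.7, 0.2], [0.9, 0.9]],
}
v_blocks = {
    3: [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]],
    7: [[0.5, 0.5], [4.0, -1.0], [0.0, 2.0], [1.0, 1.0]],
}
q = [0.8, -0.3]
order = [3, 7]  # LB 0 -> P3, LB 1 -> P7

out = paged_attention(q, k_blocks, v_blocks, order)
ref = reference_attention(
    q,
    [k for p in order for k in k_blocks[p]],
    [v for p in order for v in v_blocks[p]],
)
print(out)
print(ref)
```

The two printed vectors should agree up to floating-point rounding, since the rescaling step makes the block-wise result mathematically identical to softmax over all 8 scores.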