Distributed Execution — Tensor Parallelism
Coordinator
Master Worker
Schedules requests and manages the KV cache block table. Broadcasts block mappings to every GPU worker before each forward pass.
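The coordination rule above is "broadcast first, then compute": every worker must hold the current block mapping before the forward pass starts. A minimal in-process sketch of that ordering follows. The class names `MasterWorker` and `GpuWorker`, the method names, and the physical block ids are all hypothetical. It also assumes R1's 4 tokens fit in two logical blocks (a block size of 2), which the page does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical row key: (request id, logical block index).
BlockKey = Tuple[str, int]

@dataclass
class GpuWorker:
    rank: int
    # This GPU's column of the block table, refreshed before every step.
    local_table: Dict[BlockKey, int] = field(default_factory=dict)

    def receive_block_table(self, table: Dict[BlockKey, Dict[int, int]]) -> None:
        # Keep only the physical block ids that live on this GPU.
        self.local_table = {key: per_gpu[self.rank] for key, per_gpu in table.items()}

    def forward(self, request_id: str) -> List[int]:
        # Stand-in for the real forward pass: resolve which local physical
        # blocks hold this request's KV cache, in logical-block order.
        return [self.local_table[key] for key in sorted(self.local_table) if key[0] == request_id]

@dataclass
class MasterWorker:
    workers: List[GpuWorker]
    # (request, logical block) -> {gpu rank: physical block id}
    block_table: Dict[BlockKey, Dict[int, int]] = field(default_factory=dict)

    def step(self, request_id: str) -> Dict[int, List[int]]:
        # 1) Broadcast the current mapping to every worker first...
        for worker in self.workers:
            worker.receive_block_table(self.block_table)
        # 2) ...then run the forward pass on all of them.
        return {worker.rank: worker.forward(request_id) for worker in self.workers}

master = MasterWorker(workers=[GpuWorker(rank) for rank in range(3)])
# Hypothetical physical block ids for R1's two logical blocks.
master.block_table[("R1", 0)] = {0: 5, 1: 2, 2: 7}
master.block_table[("R1", 1)] = {0: 6, 1: 3, 2: 8}
result = master.step("R1")
print(result)  # {0: [5, 6], 1: [2, 3], 2: [7, 8]}
```

Each worker copies only its own column, so a later edit by the master cannot change the mapping a worker is already using mid-step.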
Active request: R1 · 4 tokens (T1–T4)
Block Table: logical → physical block, per GPU
Req · LB (logical block) · GPU 0 · GPU 1 · GPU 2
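Each row of the table maps one (request, logical block) pair to one physical block on each GPU, and each GPU draws its physical blocks from its own free pool. The sketch below shows that allocation under assumed numbers: a block size of 2 tokens, 3 GPUs, and 8 blocks per pool. The names `allocate`, `free_pools`, and `block_table` are illustrative and do not come from any particular engine.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Tuple

BLOCK_SIZE = 2       # tokens per KV block (assumed for illustration)
NUM_GPUS = 3
BLOCKS_PER_GPU = 8   # size of each GPU's free pool (assumed)

# Each GPU owns an independent pool of physical block ids.
free_pools: Dict[int, Deque[int]] = {g: deque(range(BLOCKS_PER_GPU)) for g in range(NUM_GPUS)}

# Block table rows: (request, logical block) -> [physical block on GPU 0, 1, 2].
block_table: Dict[Tuple[str, int], List[int]] = {}

def allocate(request_id: str, num_tokens: int) -> None:
    """Give the request enough logical blocks to hold num_tokens."""
    have = sum(1 for req, _ in block_table if req == request_id)
    need = math.ceil(num_tokens / BLOCK_SIZE)
    for lb in range(have, need):
        # One physical block per GPU; each GPU stores only its head shard.
        block_table[(request_id, lb)] = [free_pools[g].popleft() for g in range(NUM_GPUS)]

allocate("R1", 4)  # R1 holds tokens T1-T4
for (req, lb), phys in block_table.items():
    print(req, lb, *phys)
# R1 0 0 0 0
# R1 1 1 1 1
```

Because all three pools start identical and are drawn from in lockstep here, the three GPU columns end up equal. The per-GPU columns only differ if the pools diverge.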
GPU 0 · Heads 0–1 · IDLE
Free Pool
KV Cache (heads 0–1 only)
GPU 1 · Heads 2–3 · IDLE
Free Pool
KV Cache (heads 2–3 only)
GPU 2 · Heads 4–5 · IDLE
Free Pool
KV Cache (heads 4–5 only)
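With 6 attention heads split across 3 GPUs, each GPU owns a contiguous pair of heads and caches K and V only for those heads. The sketch below derives the head ranges shown above and compares per-token KV cache size for the full model against one shard. The model shape (head_dim 128, fp16, 32 layers) is an assumption chosen only to make the arithmetic concrete. The function names are illustrative.

```python
def heads_for_rank(num_heads: int, tp_size: int, rank: int) -> range:
    """Contiguous slice of attention heads owned by one tensor-parallel rank."""
    if num_heads % tp_size:
        raise ValueError("num_heads must be divisible by tp_size")
    per_rank = num_heads // tp_size
    return range(rank * per_rank, (rank + 1) * per_rank)

def kv_bytes_per_token(local_heads: int, head_dim: int, dtype_bytes: int, num_layers: int) -> int:
    # K and V each store local_heads * head_dim values per layer.
    return 2 * local_heads * head_dim * dtype_bytes * num_layers

NUM_HEADS, TP = 6, 3
for rank in range(TP):
    heads = heads_for_rank(NUM_HEADS, TP, rank)
    print(f"GPU {rank}: heads {heads.start}-{heads.stop - 1}")
# GPU 0: heads 0-1
# GPU 1: heads 2-3
# GPU 2: heads 4-5

# Assumed model shape: head_dim=128, fp16 (2 bytes), 32 layers.
full = kv_bytes_per_token(NUM_HEADS, 128, 2, 32)
shard = kv_bytes_per_token(len(heads_for_rank(NUM_HEADS, TP, 0)), 128, 2, 32)
print(full, shard)  # 98304 32768
```

Each GPU therefore needs one third of the per-token KV memory, which is what lets the same free pool hold more tokens than an unsharded cache would.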
All-Reduce
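The page labels this step only "All-Reduce". In the common Megatron-style layout (assumed here, not stated on the page), each GPU multiplies its own heads' attention outputs by its slice of rows of the output projection W_O. This yields a partial result, and an all-reduce sums the partials so every GPU holds the full output. The sketch simulates that with small integer matrices. The all-reduce is an in-process stand-in for a real collective, and the shapes and values are arbitrary.

```python
from typing import List

Matrix = List[List[int]]

def matvec(vec: List[int], mat: Matrix) -> List[int]:
    """Row vector times matrix: out[j] = sum_i vec[i] * mat[i][j]."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def all_reduce_sum(partials: List[List[int]]) -> List[int]:
    """Simulated all-reduce: the element-wise sum every rank would receive."""
    return [sum(col) for col in zip(*partials)]

NUM_HEADS, HEAD_DIM, HIDDEN, TP = 6, 2, 4, 3   # assumed toy sizes
rows_per_rank = (NUM_HEADS // TP) * HEAD_DIM   # 2 heads * 2 dims per GPU

attn_out = list(range(1, NUM_HEADS * HEAD_DIM + 1))  # concatenated head outputs
w_o = [[(i + 2 * j) % 5 - 2 for j in range(HIDDEN)] for i in range(NUM_HEADS * HEAD_DIM)]

partials = []
for rank in range(TP):
    # Rank r owns heads 2r and 2r+1, i.e. these rows of W_O.
    rows = slice(rank * rows_per_rank, (rank + 1) * rows_per_rank)
    partials.append(matvec(attn_out[rows], w_o[rows]))

reduced = all_reduce_sum(partials)
reference = matvec(attn_out, w_o)  # single-GPU computation for comparison
print(reduced == reference)  # True
```

The equality holds because the projection is linear: splitting its input rows across GPUs and summing the partial products gives the same result as one unsharded multiply.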