SLIDE 20 Tor M. Aamodt
Hardware Details
Threading: SMT processor with 2, 4, or 8 hardware contexts
Pipelining: In-order 12-stage pipeline
Fetch: 2 bundles from 1, or 1 bundle from 2 threads, prioritizing main thread; helpers ICOUNT
I-prefetch: Next-line prefetch (triggered on miss); stream prefetch triggered by compiler hints (max. 4 outstanding prefetches per context)
Branch Pred.: 2k-entry gshare; 256-entry 4-way assoc. BTB; helper threads use oracle branch prediction (always follow correct path)
Issue: 2 bundles from 1, or 1 bundle from 2 threads; prioritize main thread, helpers round-robin
Caches:
  L1 (separate I&D): 16 KB, 4-way, 1 cycle
  L2 (shared): 256 KB, 4-way, 14 cycles
  L3 (shared): 3072 KB, 12-way, 30 cycles
Memory: 230 cycles; TLB miss = 30 cycles
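The "2k-entry gshare" predictor can be illustrated with a minimal sketch: 2048 two-bit saturating counters indexed by the branch address XORed with a global history register. Per the slide, this predictor would apply to the main thread only, since helper threads follow the correct path via oracle prediction. The class name `Gshare`, the 11-bit history length, and dropping the low 4 address bits (assuming 16-byte bundles) are assumptions for illustration, not details from the slide.

```python
class Gshare:
    """Sketch of a gshare predictor: 2-bit saturating counters indexed by
    (branch address XOR global history). Hypothetical illustration only."""

    def __init__(self, entries: int = 2048) -> None:
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.history_bits = entries.bit_length() - 1  # 11 history bits for 2k entries
        self.counters = [1] * entries  # each counter starts weakly not-taken
        self.history = 0  # global branch history register

    def _index(self, pc: int) -> int:
        # Assumption: drop the low 4 bits (16-byte bundles) before hashing.
        return ((pc >> 4) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the history, keeping history_bits bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.history_bits) - 1)
```

For an always-taken branch, the history saturates to all ones after 11 outcomes, after which a single counter trains toward "strongly taken".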
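The fetch policy ("2 bundles from 1, or 1 bundle from 2 threads, prioritizing main thread; helpers ICOUNT") might be modeled as below. ICOUNT favors the thread with the fewest instructions in the pre-issue stages. The function name `select_fetch`, the use of context 0 for the main thread, and the rule that two ready threads always split the bandwidth 1+1 are assumptions; the slide does not say when one thread receives both bundles.

```python
from typing import Dict, List, Tuple

MAIN = 0  # hypothetical: context 0 runs the main thread


def select_fetch(ready: List[int], icount: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (thread_id, bundles) pairs for one fetch cycle (sketch).

    Fetch bandwidth is 2 bundles: 2 from one thread, or 1 each from two.
    The main thread is picked first when ready; helper threads are ranked
    by ICOUNT (fewest in-flight instructions first, ties broken by id).
    """
    order = [MAIN] if MAIN in ready else []
    helpers = sorted((t for t in ready if t != MAIN), key=lambda t: (icount[t], t))
    order.extend(helpers)
    chosen = order[:2]
    if len(chosen) == 1:
        return [(chosen[0], 2)]  # a lone ready thread gets both bundles
    return [(t, 1) for t in chosen]  # otherwise split 1 + 1 (or nothing if none ready)
```

For example, with the main thread and three helpers ready, the main thread and the helper with the lowest ICOUNT each receive one bundle. The issue stage would look similar, except that the slide ranks helpers round-robin rather than by ICOUNT.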
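The I-prefetch row combines a next-line prefetch on each miss with compiler-hinted stream prefetches capped at 4 outstanding per context. The following is a minimal sketch under stated assumptions: the class `IPrefetcher`, the 64-byte line size, and dropping any stream lines beyond the cap (rather than queuing them) are all hypothetical choices, and the cap is applied only to stream prefetches, as the slide states.

```python
from collections import defaultdict
from typing import Dict, List, Set

MAX_OUTSTANDING = 4  # per-context cap on stream prefetches (from the slide)


class IPrefetcher:
    """Hypothetical model of next-line and hint-triggered stream I-prefetch."""

    def __init__(self, line_bytes: int = 64) -> None:  # line size is an assumption
        self.line_bytes = line_bytes
        self.outstanding: Dict[int, Set[int]] = defaultdict(set)  # ctx -> line numbers

    def on_miss(self, addr: int) -> int:
        """Next-line prefetch: on a miss to addr, return the following line's address."""
        return (addr // self.line_bytes + 1) * self.line_bytes

    def stream_hint(self, ctx: int, start: int, n_lines: int) -> List[int]:
        """Issue sequential prefetches for a compiler hint, respecting the cap."""
        issued = []
        first = start // self.line_bytes
        for line in range(first, first + n_lines):
            if len(self.outstanding[ctx]) >= MAX_OUTSTANDING:
                break  # cap reached: this sketch drops the remaining lines
            if line not in self.outstanding[ctx]:
                self.outstanding[ctx].add(line)
                issued.append(line * self.line_bytes)
        return issued

    def complete(self, ctx: int, addr: int) -> None:
        """A prefetch returned from memory, freeing one slot for this context."""
        self.outstanding[ctx].discard(addr // self.line_bytes)
```

Because the cap is tracked per context, one thread's stream of prefetches cannot exhaust another thread's slots.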
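The cache rows give capacity, associativity, and hit latency but not the line size. The number of sets follows from sets = capacity / (ways × line size). The sketch below assumes a hypothetical 64-byte line, and that value is not on the slide. With that assumption, L1 has 64 sets, L2 has 1024, and L3 has 4096.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumption: the slide does not state the line size


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_kb: int
    ways: int
    latency_cycles: int

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_kb * 1024 // (self.ways * LINE_BYTES)


# Parameters taken from the Caches row of the slide.
HIERARCHY = [
    CacheLevel("L1 (each of I and D)", 16, 4, 1),
    CacheLevel("L2 (shared)", 256, 4, 14),
    CacheLevel("L3 (shared)", 3072, 12, 30),
]

for level in HIERARCHY:
    print(f"{level.name}: {level.sets} sets x {level.ways} ways, "
          f"{level.latency_cycles}-cycle hit")
```

Changing `LINE_BYTES` scales every set count inversely. The 12-way L3 is why its set count is a power of two even though its 3072 KB capacity is not.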