SLIDE 9
[Figure: two timelines drawn against a time axis. Legend: load latency, execution.]
Panel: Instruction concurrency (intra-warp concurrency)
Panel: Warp concurrency (inter-warp concurrency)
Repeated row pattern (16 rows): LOAD, four independent instructions, then DEPENDENCY
Annotations:
Fewer independent operations
Impractically large number of warps required to completely hide latency
Higher load latency due to congestion
The Case of Limited Parallelism
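As a rough illustration of why the warp count becomes impractical, the sketch below estimates how many warps one scheduler needs to keep issuing while a load is in flight. It is a simplified model, not a description of any specific GPU: it assumes one instruction issued per cycle, and that each warp contributes its LOAD plus a fixed number of independent instructions before stalling on the DEPENDENCY. The function name, the latency values, and the instruction counts are all hypothetical.

```python
def warps_needed(load_latency_cycles: int, independent_per_load: int) -> int:
    """Estimate warps required to fully hide one load's latency.

    Simplified model (assumption): one instruction issues per cycle, and
    each warp supplies 1 + independent_per_load issue slots (the LOAD plus
    its independent instructions) before it stalls on the DEPENDENCY.
    """
    slots_per_warp = 1 + independent_per_load
    # Ceiling division: enough warps to cover every cycle of the latency.
    needed = (load_latency_cycles + slots_per_warp - 1) // slots_per_warp
    return max(1, needed)


if __name__ == "__main__":
    # Hypothetical latencies; the second models congestion doubling the latency.
    for latency in (400, 800):
        for independent in (4, 1, 0):
            n = warps_needed(latency, independent)
            print(f"latency={latency:4d} cycles, "
                  f"independent ops per load={independent}: {n} warps")
```

Under these assumed numbers, a 400-cycle load with four independent instructions per load needs 80 warps, and the requirement grows as independent operations become fewer or as congestion raises the latency.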
GPU Architecture