MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency
Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu
GPU 2 (Virginia EF), Tuesday 2PM-3PM

Enabling GPU Sharing with Address Translation
[Figure labels: Virtual Address; GPU Core (x4); Page Table Walkers; App 1; App 2; Page Table (in main memory); High latency page walks]
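
The "high latency page walks" label can be made concrete with a toy model: on a miss, a page table walker follows a chain of dependent reads through a page table that lives in main memory. The sketch below is only an illustration. The slides do not state the table depth, index width, or page size, so the 4-level, 9-bit, 4 KiB layout and the names `split_vaddr`, `map_page`, and `walk` are assumptions.

```python
# Toy radix page table walk. All parameters below are illustrative assumptions.
LEVELS = 4          # assumed table depth; the slides do not state one
BITS_PER_LEVEL = 9  # assumed index width per level
PAGE_BITS = 12      # assumed 4 KiB pages


def split_vaddr(vaddr):
    """Return (per-level indices from root to leaf, page offset)."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    vpn = vaddr >> PAGE_BITS
    mask = (1 << BITS_PER_LEVEL) - 1
    indices = [(vpn >> (BITS_PER_LEVEL * level)) & mask
               for level in reversed(range(LEVELS))]
    return indices, offset


def map_page(root, vaddr, frame):
    """Install a translation for the page that contains vaddr."""
    indices, _ = split_vaddr(vaddr)
    node = root
    for idx in indices[:-1]:
        node = node.setdefault(idx, {})
    node[indices[-1]] = frame


def walk(root, vaddr):
    """Translate vaddr; return (physical address, number of memory reads)."""
    indices, offset = split_vaddr(vaddr)
    node = root
    accesses = 0
    for idx in indices:
        accesses += 1  # each level costs one dependent memory read
        if idx not in node:
            raise KeyError(f"page fault at {vaddr:#x}")
        node = node[idx]
    return (node << PAGE_BITS) | offset, accesses


if __name__ == "__main__":
    app1_table = {}  # one table per application, as in the figure
    map_page(app1_table, 0x7F0012345000, frame=0x42)
    paddr, accesses = walk(app1_table, 0x7F0012345ABC)
    print(hex(paddr), accesses)
```

Under these assumptions every translation that reaches the walker pays `LEVELS` serialized memory reads before the original data access can even start.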

State-of-the-Art Translation Support in GPUs
[Figure labels: Virtual Address; GPU Core (x4); Private TLB (x4, private); Shared TLB (shared); Page Table Walkers; App 1; App 2; Page Table (in main memory); High latency page walks]
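
The lookup path in this figure (private TLB, then shared TLB, then page table walk) can be sketched as follows. This is a minimal model, not the hardware design: the latency constants, the LRU policy, the fully associative organization, and the names `LruTlb` and `translate` are all assumptions chosen for illustration.

```python
from collections import OrderedDict

# Illustrative latencies in cycles; assumptions, not measured values.
PRIVATE_HIT, SHARED_HIT, PAGE_WALK = 1, 10, 400


class LruTlb:
    """Tiny fully associative TLB with LRU replacement (assumed policy)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # (app_id, vpn) -> frame

    def lookup(self, key):
        if key in self.map:
            self.map.move_to_end(key)  # mark as most recently used
            return self.map[key]
        return None

    def fill(self, key, frame):
        self.map[key] = frame
        self.map.move_to_end(key)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict least recently used


def translate(private, shared, page_table, app_id, vpn):
    """Return (frame, latency) for one access issued by one core."""
    key = (app_id, vpn)
    frame = private.lookup(key)
    if frame is not None:
        return frame, PRIVATE_HIT
    frame = shared.lookup(key)
    if frame is not None:
        private.fill(key, frame)
        return frame, PRIVATE_HIT + SHARED_HIT
    frame = page_table[key]  # stands in for a full page table walk
    shared.fill(key, frame)
    private.fill(key, frame)
    return frame, PRIVATE_HIT + SHARED_HIT + PAGE_WALK
```

Because entries from every application compete for the same shared TLB in this model, a second application that touches many pages can evict the first application's entries and push its accesses back onto the slow page-walk path.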

Three Sources of Inefficiency in Translation
1. High TLB contention
2. Inefficient caching (figure label: Bypass)
3. Address translation is latency-sensitive
MASK: A Translation-aware Memory Hierarchy

Our Solution
MASK: A Translation-aware Memory Hierarchy

Three Components of MASK
1. TLB-fill Tokens (at the shared TLB): reduces TLB contention
2. Translation-aware L2 Bypass (at the L2 data cache, for translation data): improves L2 cache utilization
3. Address-space-aware Memory Scheduler (at main memory, for translation data): lowers address translation latency
MASK improves performance by 57.8%
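
To give a feel for the first component, here is a hedged sketch of token-gated fills into a shared TLB. The slides name the mechanism but do not spell out its policy, so everything below is an assumption made for illustration: the rule that the first N warps of an application hold its N tokens, the small side cache for fills from warps without a token, and the names `Lru`, `TokenGatedSharedTlb`, and `has_token`. It should not be read as the authors' implementation.

```python
from collections import OrderedDict


class Lru:
    """Small LRU map used for both the shared TLB and the side cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


class TokenGatedSharedTlb:
    """Shared TLB whose fills are gated by per-application tokens (hypothetical)."""

    def __init__(self, tlb_entries, bypass_entries, tokens_per_app):
        self.tlb = Lru(tlb_entries)
        self.bypass = Lru(bypass_entries)  # assumed side cache for token-less fills
        self.tokens_per_app = dict(tokens_per_app)

    def has_token(self, app_id, warp_id):
        # Assumption: the first N warps of an application hold its N tokens.
        return warp_id < self.tokens_per_app.get(app_id, 0)

    def lookup(self, app_id, vpn):
        key = (app_id, vpn)
        frame = self.tlb.get(key)
        return frame if frame is not None else self.bypass.get(key)

    def fill(self, app_id, warp_id, vpn, frame):
        # Only token holders may insert into (and evict from) the shared TLB.
        target = self.tlb if self.has_token(app_id, warp_id) else self.bypass
        target.put((app_id, vpn), frame)
```

In this sketch, an application with few or no tokens can still reuse recent translations through the side cache, but it cannot evict another application's entries from the shared TLB, which is one plausible way to reduce the contention named on the slide.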