
MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu. GPU 2


  1. MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu. GPU 2 (Virginia EF), Tuesday 2PM-3PM

  2. Enabling GPU Sharing with Address Translation. [Diagram labels: App 1, App 2; Virtual Address; GPU Core (x4); Page Table Walkers; Page Table (in main memory)]

  3. Enabling GPU Sharing with Address Translation. [Diagram labels: App 1, App 2; Virtual Address; GPU Core (x4); Page Table Walkers; High latency page walks; Page Table (in main memory)]

  4. State-of-the-Art Translation Support in GPUs. [Diagram labels: App 1, App 2; Virtual Address; GPU Core (x4); Private TLB (x4); Shared TLB; Page Table Walkers; High latency page walks; Page Table (in main memory)]

  5. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching [diagram label: Bypass]; address translation is latency-sensitive. MASK: A Translation-aware Memory Hierarchy

  6. Three Sources of Inefficiency in Translation: high TLB contention

  7. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching [diagram label: Bypass]

  8. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching [diagram label: Bypass]; address translation is latency-sensitive

  9. Our Solution. MASK: A Translation-aware Memory Hierarchy

  10. Three Components of MASK

  11. Three Components of MASK: (1) TLB-fill Tokens [Shared TLB]: reduces TLB contention

  12. Three Components of MASK: (1) TLB-fill Tokens [Shared TLB]: reduces TLB contention; (2) Translation-aware L2 Bypass [L2 Data Cache; Translation Data]: improves L2 cache utilization

  13. Three Components of MASK: (1) TLB-fill Tokens [Shared TLB]: reduces TLB contention; (2) Translation-aware L2 Bypass [L2 Data Cache; Translation Data]: improves L2 cache utilization; (3) Address-space-aware Memory Scheduler [Main Memory; Translation Data]: lowers address translation latency

  14. Three Components of MASK: (1) TLB-fill Tokens [Shared TLB]: reduces TLB contention; (2) Translation-aware L2 Bypass [L2 Data Cache; Translation Data]: improves L2 cache utilization; (3) Address-space-aware Memory Scheduler [Main Memory; Translation Data]: lowers address translation latency. MASK improves performance by 57.8%

  15. MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu. GPU 2 (Virginia EF), Tuesday 2PM-3PM
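
The slides name "TLB-fill Tokens" as the component that reduces contention in the shared TLB but do not spell out the mechanism. The toy model below is only a sketch of one way token-gated fills could behave, under assumptions that are not taken from the slides: an LRU shared TLB, a fixed token assignment per application, and a policy where a requester without a token still receives its translation but does not insert it into the shared TLB. All names (SharedTLB, translate, run_demo) are hypothetical.

```python
from collections import OrderedDict


class SharedTLB:
    """Toy shared TLB with LRU replacement and token-gated fills."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (app_id, virtual_page) -> physical_frame

    def lookup(self, app_id, vpage):
        key = (app_id, vpage)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def fill(self, app_id, vpage, frame, has_token):
        # Assumed policy: only token holders may insert into the shared TLB.
        if not has_token:
            return False
        key = (app_id, vpage)
        self.entries[key] = frame
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return True


def translate(tlb, page_table, app_id, vpage, has_token):
    """Return (frame, hit). On a miss, consult the page table and try to fill."""
    frame = tlb.lookup(app_id, vpage)
    if frame is not None:
        return frame, True
    frame = page_table[(app_id, vpage)]  # stands in for a high-latency page walk
    tlb.fill(app_id, vpage, frame, has_token)
    return frame, False


def run_demo(streaming_app_has_token):
    """Count shared-TLB hits for app 1 while app 2 streams through new pages."""
    # App 1 reuses 4 pages; app 2 touches 256 distinct pages exactly once each.
    page_table = {(1, p): 100 + p for p in range(4)}
    page_table.update({(2, p): 200 + p for p in range(256)})
    tlb = SharedTLB(capacity=8)
    app1_hits = 0
    for step in range(64):
        _, hit = translate(tlb, page_table, 1, step % 4, has_token=True)
        app1_hits += hit
        for k in range(4):  # app 2 touches 4 new pages per step
            translate(tlb, page_table, 2, step * 4 + k, streaming_app_has_token)
    return app1_hits
```

In this toy setup, run_demo(True) returns 0 because app 2's streaming fills evict app 1's entries before they are reused, while run_demo(False) returns 60: once app 2 can no longer fill, app 1 misses only on its first four accesses. The numbers describe this sketch only and say nothing about the 57.8% result reported on the slides.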
