
MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency



  1. MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu. GPU 2 (Virginia EF), Tuesday 2PM-3PM

  2. Enabling GPU Sharing with Address Translation. Figure: GPU cores (virtual addresses), page table walkers, and the page tables of App 1 and App 2 (in main memory).

  3. Enabling GPU Sharing with Address Translation. Figure: GPU cores (virtual addresses), page table walkers, and the page tables of App 1 and App 2 (in main memory); page walks have high latency.

  4. State-of-the-Art Translation Support in GPUs. Figure: each GPU core has a private TLB; the cores share a shared TLB and page table walkers, which access the page tables of App 1 and App 2 (in main memory); page walks have high latency.

  5. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (figure label: Bypass); address translation is latency-sensitive. MASK: A Translation-aware Memory Hierarchy

  6. Three Sources of Inefficiency in Translation: high TLB contention

  7. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (figure label: Bypass)

  8. Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (figure label: Bypass); address translation is latency-sensitive

  9. Our Solution. MASK: A Translation-aware Memory Hierarchy

  10. Three Components of MASK

  11. Three Components of MASK. TLB-fill Tokens (shared TLB): reduce TLB contention.

  12. Three Components of MASK. TLB-fill Tokens (shared TLB): reduce TLB contention. Translation-aware L2 Bypass (L2 data cache; translation and data requests): improves L2 cache utilization.

  13. Three Components of MASK. TLB-fill Tokens (shared TLB): reduce TLB contention. Translation-aware L2 Bypass (L2 data cache; translation and data requests): improves L2 cache utilization. Address-space-aware Memory Scheduler (main memory; translation and data requests): lowers address translation latency.

  14. Three Components of MASK. TLB-fill Tokens (shared TLB): reduce TLB contention. Translation-aware L2 Bypass (L2 data cache; translation and data requests): improves L2 cache utilization. Address-space-aware Memory Scheduler (main memory; translation and data requests): lowers address translation latency. MASK improves performance by 57.8%.

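The baseline translation path on slides 2 to 4 (a private TLB per core, a shared TLB, then a page walk through the requesting application's page table in main memory) can be sketched as a small Python model. This is a minimal illustration under assumptions that do not come from the slides: 4 KB pages, a four-level radix page table with 9 index bits per level, fully associative LRU TLBs, and TLB entries tagged with an application ID. The names `LRUTLB`, `PageTable`, `level_index`, and `translate` are hypothetical. The model counts one dependent memory access per page-table level, which is why a miss in both TLBs is expensive.

```python
from collections import OrderedDict

# Assumed parameters (not from the slides): 4 KB pages, 4-level radix table, 9 bits per level.
PAGE_SHIFT = 12
LEVEL_BITS = 9
LEVELS = 4
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def level_index(vpn, level):
    """Index into page-table level `level` (0 = root) for virtual page number `vpn`."""
    return (vpn >> (LEVEL_BITS * (LEVELS - 1 - level))) & LEVEL_MASK


class LRUTLB:
    """Hypothetical fully associative TLB with LRU replacement, keyed by (app_id, vpn)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = OrderedDict()

    def lookup(self, key):
        if key not in self.table:
            return None
        self.table.move_to_end(key)  # mark as most recently used
        return self.table[key]

    def fill(self, key, pfn):
        self.table[key] = pfn
        self.table.move_to_end(key)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict the least recently used entry


class PageTable:
    """Hypothetical per-application radix page table built from nested dicts."""

    def __init__(self):
        self.root = {}

    def map(self, vpn, pfn):
        node = self.root
        for level in range(LEVELS - 1):
            node = node.setdefault(level_index(vpn, level), {})
        node[level_index(vpn, LEVELS - 1)] = pfn

    def walk(self, vpn):
        """Return (pfn, accesses): one dependent memory read per level.
        An unmapped page raises KeyError (a page fault in a real system)."""
        node, accesses = self.root, 0
        for level in range(LEVELS):
            node = node[level_index(vpn, level)]
            accesses += 1
        return node, accesses


def translate(app_id, vaddr, private_tlb, shared_tlb, page_tables):
    """Return (paddr, where_resolved, page_walk_accesses)."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    key = (app_id, vpn)  # tag by application so address spaces stay separate
    pfn = private_tlb.lookup(key)
    if pfn is not None:
        return (pfn << PAGE_SHIFT) | offset, "private", 0
    pfn = shared_tlb.lookup(key)
    if pfn is not None:
        private_tlb.fill(key, pfn)
        return (pfn << PAGE_SHIFT) | offset, "shared", 0
    pfn, accesses = page_tables[app_id].walk(vpn)  # high-latency page walk
    shared_tlb.fill(key, pfn)
    private_tlb.fill(key, pfn)
    return (pfn << PAGE_SHIFT) | offset, "walk", accesses


if __name__ == "__main__":
    tables = {1: PageTable(), 2: PageTable()}
    tables[1].map(0x12345, 0x777)
    tables[2].map(0x12345, 0x999)  # same virtual page, different application
    private, shared = LRUTLB(4), LRUTLB(16)
    for app in (1, 1, 2):
        paddr, where, walks = translate(app, 0x12345ABC, private, shared, tables)
        print(f"app {app}: paddr={paddr:#x} resolved in {where}, {walks} walk accesses")
```

Tagging entries with the application ID lets two applications map the same virtual page to different frames without flushing the TLBs, but both applications then compete for the same shared-TLB capacity, which is one plausible way the contention named on slide 6 arises.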

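The slides name the three MASK components and their goals but not their exact policies. The following Python sketch is a hypothetical, simplified reading of each, for illustration only. The token assignment (warps 0 to N-1 of an application hold its N tokens), the bypass rule (a page-table level skips the L2 cache when its translation hit rate falls below the hit rate of data requests), and the scheduler (translation requests served before data requests, oldest first within each class) are all assumptions, as are the names `FillTokens`, `L2BypassPolicy`, and `TranslationFirstScheduler`. The scheduler models only translation-first prioritization, not the address-space awareness the slide names.

```python
import heapq
from collections import defaultdict
from itertools import count


class FillTokens:
    """Hypothetical TLB-fill tokens: only warps holding a token may insert into
    the shared TLB, limiting how many entries one application can evict."""

    def __init__(self, tokens_per_app):
        self.tokens = dict(tokens_per_app)  # app_id -> token count (assumed fixed)

    def may_fill(self, app_id, warp_id):
        # Assumption: warps 0 .. tokens-1 of each application hold its tokens.
        return warp_id < self.tokens.get(app_id, 0)


class L2BypassPolicy:
    """Hypothetical translation-aware bypass: a page-table level whose L2 hit rate
    is below the hit rate of ordinary data requests skips the L2 cache."""

    def __init__(self):
        self.hits = defaultdict(int)
        self.accesses = defaultdict(int)

    def record(self, kind, hit):
        # kind is "data" or a page-table level number (0 = root)
        self.accesses[kind] += 1
        self.hits[kind] += int(hit)

    def hit_rate(self, kind):
        # With no history, assume the requests cache well (rate 1.0).
        return self.hits[kind] / self.accesses[kind] if self.accesses[kind] else 1.0

    def should_bypass(self, level):
        return self.hit_rate(level) < self.hit_rate("data")


class TranslationFirstScheduler:
    """Hypothetical memory scheduler: translation (page-walk) requests are served
    before data requests; within a class, requests are served oldest first."""

    TRANSLATION, DATA = 0, 1

    def __init__(self):
        self.queue = []
        self.seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, request, is_translation):
        prio = self.TRANSLATION if is_translation else self.DATA
        heapq.heappush(self.queue, (prio, next(self.seq), request))

    def next_request(self):
        return heapq.heappop(self.queue)[2] if self.queue else None


if __name__ == "__main__":
    tokens = FillTokens({1: 2, 2: 8})
    print(tokens.may_fill(1, 1), tokens.may_fill(1, 5))

    bypass = L2BypassPolicy()
    for hit in (True, True, False):
        bypass.record("data", hit)
    for hit in (False, False, True):
        bypass.record(3, hit)  # level-3 translation requests rarely hit in L2
    print(bypass.should_bypass(3))

    sched = TranslationFirstScheduler()
    sched.enqueue("data A", False)
    sched.enqueue("walk B", True)
    sched.enqueue("data C", False)
    print([sched.next_request() for _ in range(3)])
```

In this sketch the three pieces act at different levels of the hierarchy, matching the slide's placement: tokens gate fills into the shared TLB, the bypass policy decides per page-table level whether a walk request uses the L2 cache, and the scheduler orders requests at main memory.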