MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency


SLIDE 1

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency

Rachata Ausavarungnirun

Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu

GPU 2 (Virginia EF) Tuesday 2PM-3PM

SLIDE 2

Enabling GPU Sharing with Address Translation

[Diagram: four GPU cores shared by App 1 and App 2; virtual addresses are sent to the page table walkers, which access the page table in main memory.]

SLIDE 3

Enabling GPU Sharing with Address Translation

[Diagram: four GPU cores shared by App 1 and App 2; virtual addresses are sent to the page table walkers, which access the page table in main memory. Annotation: high-latency page walks.]

SLIDE 4

State-of-the-Art Translation Support in GPUs

[Diagram: four GPU cores shared by App 1 and App 2, each core with a private TLB, all backed by a shared TLB; behind the shared TLB, the page table walkers access the page table in main memory. Annotation: high-latency page walks. Legend: private, shared.]
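The lookup path drawn on this slide (private TLB, then shared TLB, then a page walk) can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the hardware design from the talk: the 4 KiB page size, the LRU replacement policy, the flat per-application page table standing in for a multi-level walk, and every name (`LruTlb`, `translate`, `stats`) are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; the slides do not state one


class LruTlb:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (asid, vpn) -> pfn

    def lookup(self, asid, vpn):
        key = (asid, vpn)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def fill(self, asid, vpn, pfn):
        key = (asid, vpn)
        self.entries[key] = pfn
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def translate(asid, vaddr, private_tlb, shared_tlb, page_tables, stats):
    """Translate vaddr for address space `asid`, counting where it was resolved."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = private_tlb.lookup(asid, vpn)
    if pfn is not None:
        stats["private_hit"] += 1
    else:
        pfn = shared_tlb.lookup(asid, vpn)
        if pfn is not None:
            stats["shared_hit"] += 1
        else:
            # Assumption: a dictionary lookup stands in for the
            # high-latency multi-level page walk in main memory.
            stats["page_walk"] += 1
            pfn = page_tables[asid][vpn]
            shared_tlb.fill(asid, vpn, pfn)
        private_tlb.fill(asid, vpn, pfn)
    return pfn * PAGE_SIZE + offset
```

Tagging every entry with an address-space identifier is what lets two applications share the same TLB structures in this sketch; it also makes visible how one application's fills can evict the other's entries from the shared TLB.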

SLIDE 5

Three Sources of Inefficiency in Translation

High TLB contention
Inefficient caching (diagram label: Bypass)
Address translation is latency-sensitive

MASK: A Translation-aware Memory Hierarchy

SLIDE 6

Three Sources of Inefficiency in Translation

High TLB contention

SLIDE 7

Three Sources of Inefficiency in Translation

High TLB contention
Inefficient caching (diagram label: Bypass)

SLIDE 8

Three Sources of Inefficiency in Translation

High TLB contention
Inefficient caching (diagram label: Bypass)
Address translation is latency-sensitive

SLIDE 9

Our Solution

MASK: A Translation-aware Memory Hierarchy

SLIDE 10

Three Components of MASK

SLIDE 11

Three Components of MASK

TLB-fill Tokens: reduce TLB contention

[Diagram label: Shared TLB]
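The slide states only that TLB-fill tokens reduce contention in the shared TLB. One way such a mechanism could work is sketched below, purely as an assumption: only warps holding a token may insert entries into the shared TLB, while all other warps fill a small side buffer. The class name, the side buffer, and the fixed set of token holders are hypothetical and are not taken from the slides.

```python
from collections import OrderedDict


class TokenGatedTlb:
    """Shared TLB whose fills are gated by tokens (hypothetical sketch)."""

    def __init__(self, capacity, bypass_capacity, token_holders):
        self.capacity = capacity
        self.bypass_capacity = bypass_capacity
        self.token_holders = set(token_holders)  # warp ids allowed to fill
        self.shared = OrderedDict()   # main shared TLB, LRU order
        self.bypass = OrderedDict()   # assumed small side buffer, LRU order

    def lookup(self, key):
        # Every warp may probe both structures; tokens only gate fills.
        for store in (self.shared, self.bypass):
            if key in store:
                store.move_to_end(key)
                return store[key]
        return None

    def fill(self, warp_id, key, pfn):
        if warp_id in self.token_holders:
            store, cap = self.shared, self.capacity
        else:
            store, cap = self.bypass, self.bypass_capacity
        store[key] = pfn
        store.move_to_end(key)
        if len(store) > cap:
            store.popitem(last=False)  # evict least recently used entry
```

In this sketch a warp without a token that streams through many pages can only evict entries from the side buffer, so the entries installed by token holders survive. How tokens would be assigned or adjusted over time is left out, since the slides do not describe it.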

SLIDE 12

Three Components of MASK

TLB-fill Tokens: reduce TLB contention
Translation-aware L2 Bypass: improves L2 cache utilization

[Diagram labels: Shared TLB, L2 Data Cache; Translation Data]
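The slide says that translation-aware L2 bypass improves L2 cache utilization but gives no policy. A minimal sketch of one plausible policy follows, under assumptions that are not in the slides: page-walk requests are tracked per page-table level, and a level bypasses the L2 data cache when its observed hit rate falls below the hit rate of ordinary data requests. The four-level page table, the `min_samples` warm-up threshold, and all names are hypothetical.

```python
class TranslationBypassPolicy:
    """Decide per page-table level whether walk requests skip the L2 cache."""

    def __init__(self, levels=4, min_samples=4):
        self.hits = [0] * levels
        self.accesses = [0] * levels
        self.data_hits = 0
        self.data_accesses = 0
        self.min_samples = min_samples  # assumed warm-up before bypassing

    def record_walk(self, level, hit):
        self.accesses[level] += 1
        if hit:
            self.hits[level] += 1

    def record_data(self, hit):
        self.data_accesses += 1
        if hit:
            self.data_hits += 1

    def should_bypass(self, level):
        # Without enough observations, keep caching translation data.
        if self.accesses[level] < self.min_samples or self.data_accesses == 0:
            return False
        walk_rate = self.hits[level] / self.accesses[level]
        data_rate = self.data_hits / self.data_accesses
        # Bypass when this level benefits less from the cache than data does.
        return walk_rate < data_rate
```

The design intent in this sketch is that translation data which rarely hits in the L2 cache stops displacing data lines that would hit.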

SLIDE 13

Three Components of MASK

TLB-fill Tokens: reduce TLB contention
Translation-aware L2 Bypass: improves L2 cache utilization
Address-space-aware Memory Scheduler: lowers address translation latency

[Diagram labels: Shared TLB, L2 Data Cache, Main Memory; Translation Data (shown twice)]
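The slide states that the address-space-aware memory scheduler lowers address translation latency. A simple way to picture that goal is a request queue that serves translation requests before data requests, in arrival order within each class. This is an assumed simplification for illustration only; the two-class priority scheme and the names below are hypothetical, and the slides do not describe the actual scheduling policy.

```python
import heapq
from itertools import count


class TranslationFirstScheduler:
    """Serve translation requests ahead of data requests (illustrative)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival counter keeps FIFO order within a class

    def enqueue(self, request_id, is_translation):
        priority = 0 if is_translation else 1  # lower value is served first
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def next_request(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    sched = TranslationFirstScheduler()
    sched.enqueue("data-0", is_translation=False)
    sched.enqueue("walk-0", is_translation=True)
    sched.enqueue("data-1", is_translation=False)
    # The walk request is issued first even though it arrived second.
    print(sched.next_request(), sched.next_request(), sched.next_request())
```

A strict priority like this could starve data requests under a steady stream of page walks, so a practical scheduler would need some fairness bound; that part is omitted here.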

SLIDE 14

Three Components of MASK

TLB-fill Tokens: reduce TLB contention
Translation-aware L2 Bypass: improves L2 cache utilization
Address-space-aware Memory Scheduler: lowers address translation latency

MASK improves performance by 57.8%

[Diagram labels: Shared TLB, L2 Data Cache, Main Memory; Translation Data (shown twice)]

SLIDE 15

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency

Rachata Ausavarungnirun

Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu

GPU 2 (Virginia EF) Tuesday 2PM-3PM