MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency

Nov 29, 2022

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency
Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Onur Mutlu
GPU 2 (Virginia EF), Tuesday 2PM-3PM
Enabling GPU Sharing with Address Translation: GPU cores issue virtual addresses, and page table walkers translate them using the App 1 and App 2 page tables (in main memory).
Enabling GPU Sharing with Address Translation: page walks to the App 1 and App 2 page tables (in main memory) have high latency.
State-of-the-Art Translation Support in GPUs: each GPU core has a private TLB, backed by a shared TLB in front of the page table walkers; misses still incur high-latency page walks to the App 1 and App 2 page tables (in main memory).
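The lookup path on this slide (private TLB, then shared TLB, then page walk) can be illustrated with a small software model. This is only a sketch under stated assumptions: the `LruTlb` class, the `translate` function, the LRU replacement policy, the 4 KiB page size, and the dictionary-based page tables are all hypothetical and are not taken from the slides, which show only the block diagram.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; the slides do not specify one


class LruTlb:
    """Toy TLB with LRU replacement (the replacement policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (app_id, vpn) -> physical frame number

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def fill(self, key, frame):
        self.entries[key] = frame
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def translate(app_id, vaddr, private_tlb, shared_tlb, page_tables):
    """Return (physical address, where the translation was found)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    key = (app_id, vpn)  # entries are tagged per application

    frame = private_tlb.lookup(key)
    if frame is not None:
        return frame * PAGE_SIZE + offset, "private TLB hit"

    frame = shared_tlb.lookup(key)
    if frame is not None:
        private_tlb.fill(key, frame)
        return frame * PAGE_SIZE + offset, "shared TLB hit"

    # Stand-in for the high-latency page walk to main memory.
    frame = page_tables[app_id][vpn]
    shared_tlb.fill(key, frame)
    private_tlb.fill(key, frame)
    return frame * PAGE_SIZE + offset, "page walk"


# Hypothetical page tables for two applications sharing the GPU.
page_tables = {1: {0: 7}, 2: {0: 3}}
shared = LruTlb(capacity=4)
core0, core1 = LruTlb(capacity=2), LruTlb(capacity=2)

print(translate(1, 0x10, core0, shared, page_tables))  # first access walks the page table
print(translate(1, 0x20, core1, shared, page_tables))  # another core finds it in the shared TLB
```

Because both applications fill the same shared structure in this model, entries from one application can evict entries of the other, which is the kind of contention the following slides discuss.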
Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (bypass); address translation is latency-sensitive. MASK: A Translation-aware Memory Hierarchy.
Three Sources of Inefficiency in Translation: high TLB contention.
Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (bypass).
Three Sources of Inefficiency in Translation: high TLB contention; inefficient caching (bypass); address translation is latency-sensitive.
Our Solution: MASK, a Translation-aware Memory Hierarchy.
Three Components of MASK
Three Components of MASK: TLB-fill Tokens (shared TLB) reduce TLB contention.
Three Components of MASK: TLB-fill Tokens (shared TLB) reduce TLB contention; Translation-aware L2 Bypass (L2 data cache) improves L2 cache utilization.
Three Components of MASK: TLB-fill Tokens (shared TLB) reduce TLB contention; Translation-aware L2 Bypass (L2 data cache) improves L2 cache utilization; Address-space-aware Memory Scheduler (main memory) lowers address translation latency.
Three Components of MASK: TLB-fill Tokens (shared TLB) reduce TLB contention; Translation-aware L2 Bypass (L2 data cache) improves L2 cache utilization; Address-space-aware Memory Scheduler (main memory) lowers address translation latency. MASK improves performance by 57.8%.
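To make the first component more concrete, here is one possible reading of TLB-fill tokens as a toy model. The slides state only that tokens act on the shared TLB and reduce contention; the assumption made here is that a requester may insert a new entry into the shared TLB only while it holds a token. The class name `TokenGatedSharedTlb`, the requester labels, the fixed token set, and the LRU policy are hypothetical, and how tokens would be assigned or adjusted is not modeled.

```python
from collections import OrderedDict


class TokenGatedSharedTlb:
    """Toy shared TLB whose fills are gated by tokens (assumed mechanism)."""

    def __init__(self, capacity, token_holders):
        self.capacity = capacity
        self.token_holders = set(token_holders)  # requesters allowed to fill
        self.entries = OrderedDict()  # (app_id, vpn) -> physical frame number

    def lookup(self, app_id, vpn):
        key = (app_id, vpn)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def fill(self, requester, app_id, vpn, frame):
        """Insert the entry only if the requester holds a token.

        Returns True if the shared TLB was filled, False if the fill was skipped.
        """
        if requester not in self.token_holders:
            return False
        key = (app_id, vpn)
        self.entries[key] = frame
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return True


tlb = TokenGatedSharedTlb(capacity=2, token_holders={"warp0"})
tlb.fill("warp0", app_id=1, vpn=0, frame=7)
tlb.fill("warp0", app_id=1, vpn=1, frame=8)

# A requester without a token streams through many pages but cannot evict anything.
for vpn in range(100):
    tlb.fill("warp1", app_id=2, vpn=vpn, frame=vpn)

print(tlb.lookup(1, 0), tlb.lookup(1, 1), tlb.lookup(2, 5))
```

In this model, limiting who may fill keeps a streaming requester from thrashing entries that other requesters still reuse, which is one way a token scheme could reduce shared-TLB contention.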
