Module 6.1 – Memory Access Performance: DRAM Bandwidth

Mar 06, 2023


GPU Teaching Kit: Accelerated Computing
Module 6.1 – Memory Access Performance: DRAM Bandwidth

Objective
– To learn that memory bandwidth is a first-order performance factor in a massively parallel processor
  – DRAM bursts, banks, and channels
  – All concepts are also applicable to modern multicore processors

Global Memory (DRAM) Bandwidth
[Figure: two panels labeled "Ideal" and "Reality"]

DRAM Core Array Organization
– Each DRAM core array has about 16M bits
– Each bit is stored in a tiny capacitor, made of one transistor
[Figure labels: Row Addr, Row Decoder, Memory Cell Core Array, Sense Amps, Column Latches, Mux, Column Addr; a wide path leads from the column latches into the mux, and a narrow pin interface carries the off-chip data]

A very small (8x2-bit) DRAM Core Array
[Figure labels: decode, sense amps, mux]

DRAM Core Arrays are Slow
– Reading from a cell in the core array is a very slow process
  – DDR: core speed = ½ interface speed
  – DDR2/GDDR3: core speed = ¼ interface speed
  – DDR3/GDDR4: core speed = ⅛ interface speed
  – … likely to be worse in the future
[Figure annotations: about 1000 cells are connected to each vertical line, which leads to the sense amps; each cell is a very small capacitance that stores a data bit]

DRAM Bursting
– For DDR{2,3} SDRAM cores clocked at 1/N speed of the interface:
  – Load (N × interface width) of DRAM bits from the same row at once to an internal buffer, then transfer in N steps at interface speed
  – DDR3/GDDR4: buffer width = 8× interface width

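The buffer-width rule above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides; the function name `burst_bits` and the 64-bit interface width are assumptions chosen for illustration.

```python
def burst_bits(interface_width_bits: int, n: int) -> int:
    """Bits loaded from one row into the internal buffer per core access.

    n is the ratio of interface speed to core speed (core runs at 1/n).
    """
    return interface_width_bits * n


# Assumed example: a 64-bit interface with a DDR3/GDDR4-style core (n = 8).
width_bits = 64
n = 8
bits = burst_bits(width_bits, n)
print(f"buffer width: {bits} bits ({bits // 8} bytes), sent in {n} steps")
```

Under these assumptions one core access feeds eight interface transfers, which is how a slow core array can keep a fast interface supplied.
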
DRAM Bursting Timing Example
[Figure: timelines for non-burst timing and burst timing, showing address bits sent to the decoder, the core array access delay, and bits on the interface, plotted against time]
– Modern DRAM systems are designed to always be accessed in burst mode.
– Burst bytes are transferred to the processor but discarded when accesses are not to sequential locations.

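To make the cost of discarded burst bytes concrete, here is a rough sketch, not taken from the slides, that estimates what fraction of transferred bytes is actually used for a given access stride. The 64-byte burst size, the 4-byte element size, and the function name `burst_efficiency` are illustrative assumptions, and the model ignores caching.

```python
def burst_efficiency(num_accesses: int, stride_elems: int,
                     elem_bytes: int = 4, burst_bytes: int = 64) -> float:
    """Useful bytes divided by bytes transferred, assuming whole bursts."""
    # Byte address of each access, starting at an aligned address 0.
    addresses = [i * stride_elems * elem_bytes for i in range(num_accesses)]
    # Every distinct burst that is touched is transferred in full.
    bursts_touched = {addr // burst_bytes for addr in addresses}
    useful = num_accesses * elem_bytes
    transferred = len(bursts_touched) * burst_bytes
    return useful / transferred


for stride in (1, 2, 16):
    print(stride, burst_efficiency(1024, stride))
```

In this model sequential accesses (stride 1) use every transferred byte, while a stride of 16 four-byte elements lands each access in its own burst, so most of the transferred bytes are thrown away.
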
Multiple DRAM Banks
[Figure: Bank 0 and Bank 1, each with its own decoder, sense amps, and mux]

DRAM Bursting with Banking
– Single-bank burst timing: dead time on the interface
– Multi-bank burst timing: reduced dead time

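The effect of banking on interface dead time can be approximated with a simple steady-state model. This is a hedged sketch rather than the slides' own analysis: it assumes each bank alternates between a core access delay and a burst transfer, and that accesses are spread evenly across banks. The function names and cycle counts are hypothetical.

```python
import math


def interface_utilization(banks: int, core_delay: int, transfer_time: int) -> float:
    """Fraction of time the interface carries data in the simplified model."""
    # One bank supplies transfer_time cycles of data every
    # (core_delay + transfer_time) cycles; more banks overlap their
    # core accesses until the interface is fully busy.
    return min(1.0, banks * transfer_time / (core_delay + transfer_time))


def banks_to_hide_delay(core_delay: int, transfer_time: int) -> int:
    """Smallest bank count that keeps the interface busy in this model."""
    return math.ceil((core_delay + transfer_time) / transfer_time)


# Assumed timings: 6 cycles of core access delay, 2 cycles of burst transfer.
for b in (1, 2, 4):
    print(b, interface_utilization(b, core_delay=6, transfer_time=2))
print("banks needed:", banks_to_hide_delay(6, 2))
```

With these assumed timings a single bank leaves the interface idle most of the time, and the dead time shrinks as more banks overlap their core accesses.
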
GPU off-chip memory subsystem
– NVIDIA GTX280 GPU:
  – Peak global memory bandwidth = 141.7 GB/s
– Global memory (GDDR3) interface @ 1.1 GHz
  – (Core speed @ 276 MHz)
  – For a typical 64-bit interface, we can sustain only about 17.6 GB/s (recall DDR: 2 transfers per clock)
  – We need a lot more bandwidth (141.7 GB/s), thus 8 memory channels

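The figures on this slide follow from a short calculation, sketched below. The function name `channel_bandwidth_gbs` is an assumption for illustration; the clock, interface width, and channel count are the values quoted above.

```python
def channel_bandwidth_gbs(clock_ghz: float, interface_bits: int,
                          transfers_per_clock: int = 2) -> float:
    """Peak bandwidth of one channel in GB/s (DDR: 2 transfers per clock)."""
    return clock_ghz * transfers_per_clock * interface_bits / 8


per_channel = channel_bandwidth_gbs(1.1, 64)   # one 64-bit GDDR3 channel
total = 8 * per_channel                        # 8 memory channels
print(f"per channel: {per_channel:.1f} GB/s, 8 channels: {total:.1f} GB/s")
```

Eight channels at the rounded 1.1 GHz clock give about 140.8 GB/s. The small gap to the quoted 141.7 GB/s is most likely because 1.1 GHz is itself a rounded figure.
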
GPU Teaching Kit The GPU Teaching Kit is licensed by NVIDIA and the University of Illinois under the Creative Commons Attribution-NonCommercial 4.0 International License.
