

SLIDE 1

WHAT GRAPHICS PROGRAMMERS NEED TO KNOW ABOUT DRAM

ERIK BRUNVAND, NILADRISH CHATTERJEE, DANIEL KOPTA

AGENDA

  • Just a quick overview of what DRAM is, how it works, and what you should know about it as a programmer.
  • A look at the circuits so you get some insight about why DRAM is so weird
  • A look at the DIMMs so you can see how that weirdness manifests in real memory sticks
  • A peek behind the scenes at the memory controller

SLIDE 2

MEMORY SYSTEM POWER

  • Memory power has caught up to CPU power!
  • This is for general-purpose applications
  • Even worse for memory-bound applications like graphics…
  • P. Bose, IBM, from WETI keynote, 2012

MEMORY HIERARCHY GENERAL PURPOSE PROCESSOR

  • CPU: Register File
  • L1: Cache
  • L2/L3: Cache

PROCESSOR DIE; to off-chip Memory

SLIDE 3

MEMORY HIERARCHY GENERAL PURPOSE PROCESSOR

  • CPU: Register File
  • L1: Cache
  • L2/L3: Cache
  • DRAM: Memory, reached through the Memory Controller

PROCESSOR DIE

SLIDE 4

MEMORY SYSTEM STRUCTURE

  • 4 DDR3 channels
  • 64-bit data channels
  • 800 MHz channels
  • 1-2 DIMMs/channel
  • 1-4 ranks/channel

MEMORY SYSTEM STRUCTURE

  • The link into the processor is narrow and high frequency
  • The Scalable Memory Buffer chip is a “router” that connects to multiple DDR3 channels (wide and slow)
  • Boosts processor pin bandwidth and memory capacity
  • More expensive, high power
SLIDE 5

CPU DIE PHOTOS

Intel Haswell, Intel Sandy Bridge

MEMORY HIERARCHY GRAPHICS PROCESSOR

SLIDE 6

GRAPHICS PROCESSOR DIE PHOTOS INTEL XEON PHI

SLIDE 7

LOOKING AHEAD…

  • DRAM latency and power have a large impact on the system
  • Even when cache hit rates are high!
  • DRAMs are odd and complex beasts
  • Knowing something about their behavior can aid optimization
  • Sometimes you get better results even when the data bandwidth increases!

DRAM CHIP ORGANIZATION

SLIDE 8

DRAM: DYNAMIC RANDOM ACCESS MEMORY

  • Designed for density (memory size), not speed
  • The quest for smaller and smaller bits means huge complication for the circuits
  • And complicated read/write protocols

SEMICONDUCTOR MEMORY BASICS STATIC MEMORY

  • “Static” memory uses feedback to store 1/0 data
  • Data is retained as long as power is maintained

SLIDE 9

SEMICONDUCTOR MEMORY BASICS STATIC MEMORY

  • “Static” memory uses feedback to store 1/0 data
  • Data is retained as long as power is maintained
  • Access control: six transistors per bit

SLIDE 10

SRAM CHIP ORGANIZATION

  • Simple array of bit cells
  • This example is tiny - 64k (8k x 8)
  • Bigger examples might have multiple arrays

(Figure: 256 × 256 memory array with row decoder, column decoder/I/O, and timing/control logic; address pins A0–A12, data pins I/O1–I/O8, controls CS1, CS2, WE, OE.)

SRAM CHIP ORGANIZATION

  • Simple access strategy
  • Apply address, wait, data appears on data lines (or gets written)
  • CS is “chip select”, OE is “output enable”, WE is “write enable”
  • SRAM is what’s used in on-chip caches
  • Also for embedded systems
SLIDE 11

SRAM CHIP ORGANIZATION

  • Simple access strategy
  • Apply address, wait, data appears on data lines (or gets written)
  • CS is “chip select”, OE is “output enable”, WE is “write enable”
  • SRAM is what’s used in on-chip caches
  • Also for embedded systems

Function Table (×: H or L)

WE  CS1  CS2  OE   Mode                       VCC current   I/O pin   Ref. cycle
×   H    ×    ×    Not selected (power down)  ISB, ISB1     High-Z    —
×   ×    L    ×    Not selected (power down)  ISB, ISB1     High-Z    —
H   L    H    H    Output disable             ICC           High-Z    —
H   L    H    L    Read                       ICC           Dout      Read cycle (1)–(3)
L   L    H    H    Write                      ICC           Din       Write cycle (1)
L   L    H    L    Write                      ICC           Din       Write cycle (2)

(Figure: SRAM read-cycle timing waveform: tRC, tAA, tCO1/tCO2, tOE, tOH, etc.; valid address to valid data, High-Z transitions.)

SEMICONDUCTOR MEMORY BASICS DYNAMIC MEMORY

  • Data is stored as charge on a capacitor
  • Access transistor allows charge to be added or removed from the capacitor
  • One transistor per bit

SLIDE 12

DYNAMIC MEMORY PHOTOMICROGRAPHS

www.sdram-technology.info http://www.tf.uni-kiel.de/

SEMICONDUCTOR MEMORY BASICS DYNAMIC MEMORY

  • Writing to the bit
  • Data from the write driver circuit dumps charge on the capacitor, or removes charge from the capacitor

SLIDE 13

SEMICONDUCTOR MEMORY BASICS DYNAMIC MEMORY

  • Reading from the bit
  • Data from the capacitor is coupled to the bit line
  • Voltage change is sensed by the sense amplifier
  • Note - reading is destructive!
  • Charge is removed from the capacitor during read

DRAM ARRAY (MAT)

  • An entire row is first transferred to/from the Row Buffer
  • e.g. 16Mb array (4096 x 4096)
  • Row and Column = 12-bit addr
  • Row buffer = 4096b wide
  • One column is then selected from that buffer
  • Note that rows and columns are addressed separately

(Figure: DRAM array with row decoder, sense amplifiers / row buffer, and column decoder; row address and column address select the data.)

SLIDE 14

DRAM ARRAY (MAT)

  • DRAM arrays are very dense
  • But also very slow!
  • ~20ns to return data that is already in the Row Buffer
  • ~40ns to read new data into a Row Buffer (precharge…)
  • Another ~20ns if you have to write the Row Buffer back first (Row Buffer Conflict; see the sketch below)
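Not from the slides, but a minimal C++ sketch of the access-time model those bullets imply: per-bank row-buffer state decides whether a request is an open-row hit (~20ns), a miss into an idle bank (~40ns), or a conflict (~60ns). The latency numbers and the Access/trace types are illustrative assumptions, not a real controller model.

```cpp
#include <cstdio>
#include <unordered_map>
#include <vector>

// Toy per-bank row-buffer model. Latencies are the rough numbers from the
// slide (~20ns open-row hit, ~40ns miss into an idle bank, ~60ns conflict);
// real devices and controllers differ, so treat them as illustrative only.
struct Access { int bank; int row; };

double total_latency_ns(const std::vector<Access>& trace) {
    std::unordered_map<int, int> open_row;   // bank -> row currently in its row buffer
    double total = 0.0;
    for (const Access& a : trace) {
        auto it = open_row.find(a.bank);
        if (it == open_row.end())      total += 40.0;  // idle bank: activate the row
        else if (it->second == a.row)  total += 20.0;  // row-buffer hit
        else                           total += 60.0;  // conflict: write back, then activate
        open_row[a.bank] = a.row;
    }
    return total;
}

int main() {
    // The same six requests, grouped by row vs. ping-ponging between two rows.
    std::vector<Access> grouped     = {{0,1},{0,1},{0,1},{0,2},{0,2},{0,2}};
    std::vector<Access> interleaved = {{0,1},{0,2},{0,1},{0,2},{0,1},{0,2}};
    std::printf("grouped by row: %.0f ns\n", total_latency_ns(grouped));      // 180 ns
    std::printf("interleaved:    %.0f ns\n", total_latency_ns(interleaved));  // 340 ns
}
```

Reordering the same requests so accesses to one row are adjacent roughly halves the total time here, which is the effect the scheduling and treelet slides later exploit.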

DRAM ARRAY (MAT)

  • Another issue: refresh
  • The tiny little capacitors “leak” into the substrate
  • So, even if you don’t read a row, you have to refresh it every so often
  • Typically every 64ms

SLIDE 15

DRAM INTERNAL MAT ORGANIZATION

(Figure: multiple DRAM arrays (mats), each with its own row decoder, sense amplifiers / row buffer, and column decoder, ganged together: x2, x4, x8, x16, x32, etc.)

DRAM CHIP ORGANIZATION

  • This is an x4 2Gb DRAM (512M x 4)
  • 8 x 256Mb banks
  • Each bank is made of multiple mats
  • “8n prefetch”: fetches 8 x 4 = 32 bits from the row buffer on each access
  • 8kb row buffer
SLIDE 16

DRAM COMMAND STATE MACHINE

  • Access commands/protocols are a little more complex than for SRAM…
  • Activate, Precharge, RAS, CAS
  • If open row, then just CAS
  • If wrong open row then write-back, Act, Pre, RAS, CAS
  • Lots of timing relationships!
  • This is what the memory controller keeps track of…
  • Micron DRAM datasheet is 211 pages…
SLIDE 17

DRAM TIMING

  • Activate uses the row address (RAS) and bank address to activate and precharge a row
  • Read gives the column address (CAS) to select bits from the row buffer
  • Note burst of 8 words returned
  • Note data returned on both edges of clock (DDR)

DRAM PACKAGES

SLIDE 18

HIGHER LEVEL ORGANIZATION: DIMM, RANK, BANK, AND ROW BUFFER

(Figure: processor with memory controller connected over an address and data bus to banks, each with a row buffer.)

  • Bank - a set of arrays that are active on each request
  • Row Buffer: The last row read from the Bank
  • Typically on the order of 8kB (for each 64bit read request!)
  • Acts like a secret cache!!!
SLIDE 19

DRAM CHIP SUMMARY

  • DRAM is designed to be as dense as possible
  • Implications: slow and complex
  • Most interesting behavior: The Row Buffer
  • Significant over-fetch - 8kB fetched internally for a 64bit bus request
  • Data delivered from an “open row” is significantly faster, and lower energy, than truly random data
  • This “secret cache” is the key to tweaking better performance out of DRAM! (a back-of-the-envelope sketch follows)
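A quick C++ illustration of why the open-row hit rate matters so much: it weights the rough per-access latencies from the earlier array slides (~20ns open-row hit, ~60ns when the wrong row is open) by a hit rate. The exact numbers are assumptions; real latency also includes controller queuing, refresh, and bus transfer time.

```cpp
#include <cstdio>

// Expected DRAM read latency as a function of the row-buffer hit rate.
// Per-access numbers are the rough ones from the earlier slides; illustrative only.
double expected_latency_ns(double hit_rate, double hit_ns = 20.0, double conflict_ns = 60.0) {
    return hit_rate * hit_ns + (1.0 - hit_rate) * conflict_ns;
}

int main() {
    for (double hr : {0.25, 0.50, 0.75, 0.95})
        std::printf("row-buffer hit rate %2.0f%% -> ~%.0f ns per access\n",
                    100.0 * hr, expected_latency_ns(hr));
}
```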

DRAM DIMM AND MEMORY CONTROLLER ORGANIZATION

  • Niladrish Chatterjee, NVIDIA Corporation

SLIDE 20

DRAM DIMM AND ACCESS PIPELINE

(Figure: memory controller and memory bus/channel feeding a DIMM; the DIMM holds ranks of DRAM chips (devices), each chip holds banks of arrays; each chip supplies 1/8th of the row buffer and one word of the data output.)

  • DIMMs are small PCBs on which DRAM chips are assembled
  • Chips are separated into ranks
  • A rank is a collection of chips that work in unison to service a memory request
  • There are typically 2 or 4 ranks on a DIMM

DRAM DIMM AND ACCESS PIPELINE

  • The memory channel has data lines and a command/address bus
  • Data channel width is typically 64 (e.g. DDR3)
  • DRAM chips are typically x4, x8, x16 (bits/chip)
  • 64bit data channel == sixteen x4 chips
  • or eight x8 chips
  • or four x16 chips
  • or two x32 chips…
SLIDE 21

DRAM DIMM AND ACCESS PIPELINE

  • Each rank operates independently
  • Only one rank can be sending or receiving data to/from the memory controller at a time
  • But, each rank has multiple banks, and each bank has its own row buffer
  • Different banks can be in different states (reading/writing/precharging/idling)
  • Opportunity for concurrency

CACHE LINE REQUEST

  • CPU makes a memory request
  • Memory controller converts that request into DRAM commands
  • First choice is which rank is selected

SLIDE 22

CACHE LINE REQUEST

  • Access begins within the rank
  • Bank is selected, followed by row

CACHE LINE REQUEST

  • A few bits are selected from each row buffer
  • Those bits are sent out on the physical pins/bumps of the chip
  • Combined, they make up the 64b returned on that request
  • Typically the whole access transaction includes a burst of 64b data chunks
  • 8 x 64b = 64byte cache line (the arithmetic is sketched below)
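Not part of the slides, just the slide's arithmetic spelled out in C++: chip width times chips per rank gives the channel width per transfer, and the burst length turns that into a cache line. The x8, eight-chip values are one common DDR3 case, chosen only for illustration.

```cpp
#include <cstdio>

// Chip width x chips per rank -> channel width; burst length -> cache line.
int main() {
    const int bits_per_chip  = 8;   // x8 DRAM devices (could also be x4 or x16)
    const int chips_per_rank = 8;   // eight x8 chips -> 64-bit data channel
    const int burst_length   = 8;   // DDR3 burst of 8 transfers

    const int channel_bits = bits_per_chip * chips_per_rank;      // 64 bits per beat
    const int line_bytes   = (channel_bits / 8) * burst_length;   // 8 B x 8 = 64 B

    std::printf("channel width: %d bits per transfer\n", channel_bits);
    std::printf("burst of %d transfers -> %d-byte cache line\n", burst_length, line_bytes);
}
```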

SLIDE 23

MEMORY CONTROLLER

  • Translates cache refill requests from CPU into DRAM commands
  • Keeps track of things like open rows, states of each bank, refresh timing, etc. etc. etc.
  • Read and write queues for memory requests
  • Incurs more delays: 10’s of ns of queuing delay, and ~10ns of addr/cmd delay on channel

MEMORY SCHEDULING

  • Arguably the most important function
  • Reorder requests to figure out which request’s command should be issued each cycle
  • Issue row-buffer hits over row-misses
  • Interleave requests across different banks to maximize utilization
  • Prevent starvation of older requests which are not row-hits.
  • Switch between reads and writes to improve bus utilization

SLIDE 24

MEMORY SCHEDULING

  • Arguably the most important function
  • FCFS: Issue the first read or write in the queue that is ready for issue (not necessarily the oldest in program order)
  • First Ready - FCFS: First issue row buffer hits if you can (both policies are sketched below)
  • Lots of other possibilities…
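A compilable C++ sketch of those two policies over a toy read queue ordered by arrival time. The Request fields (can_issue, row_hit) stand in for the real DRAM-state checks a controller performs each cycle; they are assumptions for illustration, not USIMM's actual data structures.

```cpp
#include <cstdio>
#include <optional>
#include <vector>

// Toy read queue entry, ordered by arrival time elsewhere.
struct Request { int id; bool can_issue; bool row_hit; };

// FCFS: oldest request whose commands are legal this cycle.
std::optional<int> fcfs(const std::vector<Request>& q) {
    for (const Request& r : q)
        if (r.can_issue) return r.id;
    return std::nullopt;
}

// First-Ready FCFS: prefer row-buffer hits, then fall back to plain FCFS.
std::optional<int> fr_fcfs(const std::vector<Request>& q) {
    for (const Request& r : q)
        if (r.can_issue && r.row_hit) return r.id;   // oldest issuable row hit
    return fcfs(q);                                  // otherwise oldest issuable request
}

int main() {
    std::vector<Request> q = {{0, false, false}, {1, true, false}, {2, true, true}};
    std::printf("FCFS picks request %d, FR-FCFS picks request %d\n", *fcfs(q), *fr_fcfs(q));
}
```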

ADDRESS MAPPING POLICIES

  • Distribute physical addresses to different channels/banks/ranks/rows and columns.
  • Balancing between
  • Locality in a row
  • Parallelism across banks/ranks

SLIDE 25

ADDRESS MAPPING POLICIES

  • Open page address mapping
  • Put consecutive cache-lines in the same row to boost row-buffer hit rates
  • Page-interleaved address mapping
  • Put consecutive cache-lines (or groups of cache-lines) across different banks/ranks/channels to boost parallelism
  • Example address mapping policies (the first is sketched below):
  • row:rank:bank:channel:column:blkoffset
  • row:column:rank:bank:channel:blkoffset
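A C++ sketch of the first example policy, row:rank:bank:channel:column:blkoffset, read from the least-significant bits up. The field widths assume 64B lines, an 8KB row (128 lines per row), 4 channels, 8 banks, and 2 ranks; real controllers size these fields from the actual DIMM configuration, and often XOR bits together to spread conflicts, so treat this only as an illustration.

```cpp
#include <cstdint>
#include <cstdio>

struct DramCoord { uint64_t row, rank, bank, channel, column, blkoffset; };

// row:rank:bank:channel:column:blkoffset, extracted from the low bits upward.
DramCoord map_open_page(uint64_t paddr) {
    DramCoord c{};
    c.blkoffset = paddr & 0x3F;  paddr >>= 6;   // 6 bits: 64B cache-line offset
    c.column    = paddr & 0x7F;  paddr >>= 7;   // 7 bits: 128 lines per 8KB row
    c.channel   = paddr & 0x3;   paddr >>= 2;   // 2 bits: 4 channels
    c.bank      = paddr & 0x7;   paddr >>= 3;   // 3 bits: 8 banks
    c.rank      = paddr & 0x1;   paddr >>= 1;   // 1 bit : 2 ranks
    c.row       = paddr;                        // remaining high bits
    return c;
}

int main() {
    // Two consecutive cache lines land in the same row (only the column differs),
    // which is exactly what the open-page mapping is trying to achieve.
    for (uint64_t addr : {0x12345000ULL, 0x12345040ULL}) {
        DramCoord c = map_open_page(addr);
        std::printf("addr 0x%llx -> row %llu rank %llu bank %llu ch %llu col %llu\n",
                    (unsigned long long)addr,
                    (unsigned long long)c.row, (unsigned long long)c.rank,
                    (unsigned long long)c.bank, (unsigned long long)c.channel,
                    (unsigned long long)c.column);
    }
}
```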


DDR3 VS. GDDR5

Table 1: Main Features of DDR3, GDDR3 and GDDR5

Item                 | DDR3 DRAM             | GDDR3 SGRAM        | GDDR5 SGRAM
Main densities       | 1Gbit, 2Gbit          | 1Gbit              | 1Gbit, 2Gbit
VDD, VDDQ            | 1.5V ±5% (1.35V ±5%)  | 1.8V ±5%           | 1.5V ±3%, 1.35V ±3%
I/O Width            | (4,) 8, 16            | 32                 | 32 / 16
No. of banks         | 8                     | 16                 | 16
Prefetch             | 8                     | 4                  | 8
Burst length         | 4 (burst chop), 8     | 4 and 8            | 8
Access granularity   | (32,) 64 / 128 bit    | 128 bit            | 256 bit
CRC                  | N/A                   | N/A                | yes
Interface            | SSTL                  | POD18              | POD15, POD135
Termination          | mid-level (VDDQ/2)    | high-level (VDDQ)  | high-level (VDDQ)
Package              | BGA-78/96             | BGA-136            | BGA-170

SLIDE 26

USIMM MEMORY SIMULATOR

  • Detailed simulation of DRAM-based memory system
  • Memory Controller
  • Multiple memory channels
  • DIMMs with various organizations
  • Various types of DRAM chips
  • Trace-based, or “interactive”

Niladrish Chatterjee, Rajeev Balasubramonian,
 Manjunath Shevgoor, Seth H. Pugsley,
 Aniruddha N. Udipi, Ali Shafiee, Kshitij Sudan,
 Manu Awasthi, Zeshan Chishti

MEMORY COMMANDS

  • PRE: Precharge the bitlines of a bank so a new row can be read out.
  • ACT: Activate a new row into the bank’s row buffer.
  • COL-RD: Bring a cache line from the row buffer back to the processor.
  • COL-WR: Bring a cache line from the processor to the row buffer.

SLIDE 27

MEMORY COMMANDS

  • PWR-DN-FAST: Power-Down-Fast puts a rank in a low-power mode with quick exit times.
  • PWR-DN-SLOW: Power-Down-Slow puts a rank in the precharge power down (slow) mode and can only be applied if all the banks are precharged.
  • PWR-UP: Power-Up brings a rank out of low-power mode.
  • Refresh: Forces a refresh to multiple rows in all banks on the rank.
  • PRE: Forces a precharge to a bank on the rank.
  • PRE-ALL-BANKS: Forces a precharge to all banks in a rank.

DRAM DEFAULT TIMING PARAMETERS, IN CYCLES AT 800MHZ

  • tRCD: 11, tRP: 11, tCAS: 11, tRC: 39, tRAS: 28, tRRD: 5, tFAW: 32, tWR: 12, tWTR: 6, tRTP: 6, tCCD: 4, tRFC: 128, tREFI: 6240, tCWD: 5, tRTRS: 2, tPDMIN: 4, tXP: 5, tXPDLL: 20, tDATATRANS: 4
  • USIMM uses these timings, and the DRAM state machine, to determine which commands are possible on any given cycle (see the sketch below)
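A small C++ illustration of how a few of those numbers combine (not code from USIMM itself): at 800 MHz a cycle is 1.25 ns, so a row hit costs roughly tCAS, a miss into a precharged bank tRCD + tCAS, and a row-buffer conflict tRP + tRCD + tCAS. Real command scheduling has to honor many more of the constraints listed above.

```cpp
#include <cstdio>

// Three of the default timings above, and the simple sums they imply.
struct Timing {
    int tRCD = 11;  // ACT -> first column command
    int tRP  = 11;  // PRE -> next ACT
    int tCAS = 11;  // column read -> first data
};

int main() {
    const Timing t{};
    const double ns_per_cycle = 1.25;                 // 800 MHz command clock
    const int hit      = t.tCAS;                      // open row: just the column read
    const int miss     = t.tRCD + t.tCAS;             // precharged bank: ACT, then read
    const int conflict = t.tRP + t.tRCD + t.tCAS;     // wrong row open: PRE, ACT, read

    std::printf("row-buffer hit:       %2d cycles = %5.2f ns\n", hit,      hit * ns_per_cycle);
    std::printf("row miss (idle bank): %2d cycles = %5.2f ns\n", miss,     miss * ns_per_cycle);
    std::printf("row-buffer conflict:  %2d cycles = %5.2f ns\n", conflict, conflict * ns_per_cycle);
}
```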

SLIDE 28

EXAMPLE SCHEDULERS

  • FCFS:
  • Assuming that the read queue is ordered by request arrival time, our FCFS algorithm simply scans the read queue sequentially until it finds an instruction that can issue in the current cycle.
  • A separate write queue is maintained. When the write queue size exceeds a high water mark, writes are drained similarly until a low water mark is reached. Writes are also drained if there are no pending reads. (The drain policy is sketched below.)
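A minimal C++ sketch of that write-drain rule; the 40/20 watermark values and the queue representation are invented for illustration and are not USIMM's defaults.

```cpp
#include <cstdio>

// Serve reads until the write queue crosses a high water mark (or there are no
// reads), then drain writes down to a low water mark.
struct WriteDrain {
    static constexpr int HI_WM = 40;
    static constexpr int LO_WM = 20;
    bool draining = false;

    // Decide whether this cycle should issue from the write queue.
    bool issue_write(int read_q_size, int write_q_size) {
        if (write_q_size <= LO_WM)                      draining = false;
        if (write_q_size >= HI_WM || read_q_size == 0)  draining = true;
        return draining && write_q_size > 0;
    }
};

int main() {
    WriteDrain wd;
    std::printf("%d\n", wd.issue_write(/*reads=*/5, /*writes=*/41));  // 1: above high water mark
    std::printf("%d\n", wd.issue_write(/*reads=*/5, /*writes=*/30));  // 1: keep draining to low mark
    std::printf("%d\n", wd.issue_write(/*reads=*/5, /*writes=*/20));  // 0: back to serving reads
    std::printf("%d\n", wd.issue_write(/*reads=*/0, /*writes=*/3));   // 1: no pending reads
}
```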

EXAMPLE SCHEDULERS

  • Credit-Fair (sketched below)
  • For every channel, this algorithm maintains a set of counters for credits for each thread, which represent that thread’s priority for issuing a read on that channel. When scheduling reads, the thread with the most credits is chosen.
  • Reads that will be open row hits get a 50% bonus to their number of credits for that round of arbitration.
  • When a column read command is issued, that thread’s total number of credits for using that channel is cut in half.
  • Each cycle all threads gain one credit.
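The same rules written out as a C++ sketch for a single channel; the Candidate type and the example credit values are assumptions for illustration, not the simulator's implementation.

```cpp
#include <cstdio>
#include <vector>

// One arbitration round of Credit-Fair for one channel.
struct Candidate { int thread; bool row_hit; };

int pick_and_update(std::vector<long long>& credits, const std::vector<Candidate>& ready) {
    int best = -1;
    long long best_score = -1;
    for (const Candidate& c : ready) {
        long long score = credits[c.thread];
        if (c.row_hit) score += score / 2;           // 50% bonus for an open-row hit
        if (score > best_score) { best_score = score; best = c.thread; }
    }
    if (best >= 0) credits[best] /= 2;               // issuing a read halves that thread's credits
    for (long long& c : credits) c += 1;             // every cycle, each thread gains one credit
    return best;
}

int main() {
    std::vector<long long> credits = {100, 90};
    int winner = pick_and_update(credits, {{0, false}, {1, true}});
    std::printf("issue thread %d; credits now {%lld, %lld}\n",
                winner, credits[0], credits[1]);     // thread 1 wins: 90 + 45 > 100
}
```
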
SLIDE 29

EXAMPLE SCHEDULERS

  • Power-Down
  • This algorithm issues PWR-DN-FAST commands in every idle cycle.
  • Explicit power-up commands are not required as power-up happens implicitly when another command is issued.
  • Close-Page
  • In every idle cycle, the scheduler issues precharge operations to banks that last serviced a column read/write.
  • Unlike a true close-page policy, the precharge is not issued immediately after the column read/write and we don’t look for potential row buffer hits before closing the row.

EXAMPLE SCHEDULERS

  • First-Ready-Round-Robin
  • This scheduler tries to combine the benefits of open row hits with the fairness of a round-robin scheduler.
  • It first tries to issue any open row hits with the “correct” thread-id (as defined by the current round robin flag), then other row hits, then row misses with the “correct” thread-id, and then finally, a random request.

SLIDE 30

EXAMPLE SCHEDULERS

  • MLP-aware (sketched below)
  • The scheduler assumes that threads with many outstanding misses (high memory level parallelism, MLP) are not as limited by memory access time.
  • The scheduler therefore prioritizes requests from low-MLP threads over those from high-MLP threads.
  • To support fairness, a request’s wait time in the queue is also considered.
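A C++ sketch of one way to combine those two ingredients: requests from low-MLP threads get higher priority, while a request's waiting time grows its priority so older requests are not starved. The priority formula is an invented illustration, not the scheduler's actual heuristic.

```cpp
#include <cstdio>
#include <vector>

struct Req { int thread; int wait_cycles; };

// Pick the queue entry with the best (wait / MLP) trade-off.
int pick(const std::vector<Req>& queue, const std::vector<int>& outstanding_misses) {
    int best = -1;
    double best_prio = -1.0;
    for (size_t i = 0; i < queue.size(); ++i) {
        const Req& r = queue[i];
        double mlp  = outstanding_misses[r.thread];   // high MLP -> lower priority
        double prio = r.wait_cycles / (1.0 + mlp);    // aging keeps it fair
        if (prio > best_prio) { best_prio = prio; best = (int)i; }
    }
    return best;
}

int main() {
    std::vector<int> misses = {1, 8};                 // thread 0 is latency-bound, thread 1 has MLP
    std::vector<Req> queue  = {{1, 100}, {0, 40}};
    std::printf("issue queue entry %d\n", pick(queue, misses));  // entry 1 (thread 0): 40/2 > 100/9
}
```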

DEFAULT MEMORY CONFIGURATIONS USED FOR POWER MODELING

SLIDE 31

INFO ABOUT USIMM

  • The most up-to-date weblink for obtaining the latest version of the simulator is:
  • http://utaharch.blogspot.com/2012/02/usimm.html

MOTIVATING EXAMPLE: RAY TRACING HARDWARE

  • Daniel Kopta, University of Utah School of Computing

SLIDE 32

WHERE DOES ENERGY GO?

  • Energy estimates from USIMM, Cacti, and Synopsys

THE GOAL

  • Remap the ray tracing algorithm for efficient DRAM access
  • Reduce energy consumption
  • Don’t reduce performance
  • Increase performance?
SLIDE 33

TARGET: DRAM

  • First attempt: reduce bandwidth
  • Assume a simple DRAM model
  • Performance directly related to bandwidth
  • Higher cache hit rates == better DRAM performance
  • BUT - our initial results for reduced bandwidth don’t reduce energy much!
  • Clearly there are interesting issues here…

REMINDER: DIMM, RANK, BANK, AND ROW BUFFER

  • Bank - a set of arrays that are active on each request
  • Row Buffer: The last row read from the Bank
  • Typically on the order of 8kB (for each 64B read request!)
  • Acts like a secret cache!!!
SLIDE 34

BACKGROUND: RAY TRACING ACCELERATION STRUCTURES

  • Parallelize on rays
  • Incoherent threads roaming freely through memory

(Figure: Thread 1, Thread 2, Thread 3, and Thread 4 traversing the acceleration structure.)

FORCED RAY COHERENCE

  • Ray sorting / classification
  • StreamRay: Gribble, Ramani, 2008, 2009
  • Treelet decomposition: Aila & Karras, 2010
  • Packets
  • Bigler et al. 2006; Boulos et al. 2007; Günther et al. 2007; Overbeck et al. 2008

SLIDE 35

(Figure: memory traffic over time. Naïve traversal: consistent, distributed pressure. Treelets: a large burst of loads, followed by compute.)

OPPORTUNITIES IN ACCESS PATTERNS

  • Preprocess arranges treelet nodes into DRAM row-aligned blocks (a packing sketch follows)

(Figure: treelet-sized blocks mapped onto DRAM Row 0, Row 1, Row 2, Row 3, …)
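A C++ sketch of that preprocessing step: each treelet's nodes are copied into a contiguous block that starts on a DRAM-row boundary, so traversing one treelet streams out of a single open row. The node layout and the 8KB row size are assumptions for illustration; the actual layout pass in the work described here is more involved.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct BVHNode { float bounds[6]; uint32_t child_or_prim; uint32_t meta; };  // 32 bytes

constexpr size_t kRowBytes    = 8 * 1024;                      // one row buffer's worth
constexpr size_t kNodesPerRow = kRowBytes / sizeof(BVHNode);   // 256 nodes per row

// Pack each treelet contiguously, starting at the next DRAM-row boundary.
std::vector<BVHNode> pack_row_aligned(const std::vector<std::vector<BVHNode>>& treelets) {
    std::vector<BVHNode> out;
    for (const auto& t : treelets) {
        size_t start = ((out.size() + kNodesPerRow - 1) / kNodesPerRow) * kNodesPerRow;
        out.resize(start);                           // pad to the row boundary
        out.insert(out.end(), t.begin(), t.end());   // treelet nodes stay contiguous
    }
    out.resize(((out.size() + kNodesPerRow - 1) / kNodesPerRow) * kNodesPerRow);  // pad the tail
    return out;
}

int main() {
    std::vector<std::vector<BVHNode>> treelets(3, std::vector<BVHNode>(100));
    std::vector<BVHNode> packed = pack_row_aligned(treelets);
    std::printf("%zu nodes in %zu rows\n", packed.size(),
                packed.size() * sizeof(BVHNode) / kRowBytes);  // 768 nodes in 3 rows
}
```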

MAPPING BURSTS TO ROWS

SLIDE 36

Test Scenes (some of them)

Hairball, San Miguel, Vegetation

Results (averages of all scenes)

                            Baseline   Treelets
Row Buffer Hit-Rate         49%        77%
Avg. Read Latency (cycles)  217        64
Energy Consumed (J)         5.1        3.9

SLIDE 37

Results

  • DRAM energy reduced by up to 43%
  • Latency by up to 80%
  • Higher row buffer hit rate → closer to peak bandwidth
  • Performance scales better with more threads

(Figure: Performance scaling. Performance (FPS) vs. number of TMs (32–320) for Sibenik, Crytek, Vegetation, and Hairball; baseline vs. treelets.)

SLIDE 38

CONTACT

  • Erik Brunvand, University of Utah, elb@cs.utah.edu
  • Daniel Kopta, University of Utah, dkopta@cs.utah.edu
  • Niladrish Chatterjee, NVIDIA Corporation, nil@nvidia.com