Future Scaling of Processor-Memory Interfaces



SLIDE 1

Future Scaling of Processor-Memory Interfaces

Jung Ho Ahn†§, Norman P. Jouppi†, Christos Kozyrakis‡, Jacob Leverich‡, Robert S. Schreiber†

†HP Labs, ‡Stanford University, §Seoul National University

SLIDE 2

Nov 19, 2009 2

Executive summary

Main memory system

  • Challenges: performance, reliability, energy efficiency
  • Solutions: Multicore DIMM, rank subsetting
  • Holistic assessments: efficiency/latency/throughput tradeoffs; system-wide; multithreaded/consolidated; chipkill; capacity vs. efficiency

SC09 – Future Scaling of Processor-Memory Interfaces

SLIDE 3

Issues with DRAM-based main memory

  • Chip multiprocessors (CMPs) demand

High capacity
High bandwidth

  • Global wires improve slowly

Energy-efficiency challenges: DRAM power matters!
Performance/power variation by access pattern

  • Hard/soft errors

SLIDE 4

How DRAM works

[Figure: a 1T1C DRAM cell (labels: wl, bit) and a DRAM chip with banks 0 to 7; each bank has a row decoder, a memory array of 16,384 rows × 8,192 columns, sense amplifiers, and a column decoder. A request goes in and data comes out.]
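As a quick check of the array dimensions above (16,384 rows by 8,192 columns), the following sketch computes the capacity of one bank and of the whole chip. It assumes that each row/column intersection holds a single 1T1C cell (one bit) and that the chip has eight banks (bank 0 through bank 7); the constant and function names are illustrative and do not come from the slides.

```python
ROWS_PER_BANK = 16_384
COLUMNS_PER_ROW = 8_192
BANKS_PER_CHIP = 8  # assumption: banks are numbered 0 through 7


def bank_capacity_bits(rows=ROWS_PER_BANK, columns=COLUMNS_PER_ROW):
    """Bits stored in one bank, assuming one bit per row/column intersection."""
    return rows * columns


def chip_capacity_bits(banks=BANKS_PER_CHIP):
    """Bits stored in the whole chip, assuming identical banks."""
    return banks * bank_capacity_bits()


if __name__ == "__main__":
    mebibit = 2 ** 20
    print(f"per bank: {bank_capacity_bits() // mebibit} Mib")
    print(f"per chip: {chip_capacity_bits() // mebibit} Mib")
```

Under these assumptions the geometry corresponds to a 1 Gib device, with each row holding 8 kib.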

SLIDE 5

Performance/power variations

SLIDE 6

DIMM = Dual Inline Memory Module

Overfetching problem

DRAM row size = 8 kb, 8 or 16 DRAMs per DIMM
Cache line size = 512 b
Over 99% of bits are unused if row/col = 1
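The "over 99%" figure can be reproduced with a few lines of arithmetic. The sketch below assumes that every DRAM on the DIMM activates one 8 kb row per access and that each activation serves exactly one 512 b cache line (row/col = 1); the function name and its parameters are illustrative, not taken from the slides.

```python
ROW_BITS_PER_DRAM = 8 * 1024  # 8 kb row in each DRAM chip
CACHE_LINE_BITS = 512


def unused_fraction(drams_activated, row_bits=ROW_BITS_PER_DRAM,
                    line_bits=CACHE_LINE_BITS):
    """Fraction of activated bits that are never transferred when each
    row activation serves a single cache line (row/col = 1)."""
    activated_bits = drams_activated * row_bits
    return 1.0 - line_bits / activated_bits


if __name__ == "__main__":
    for drams in (8, 16):
        print(f"{drams} DRAMs: {unused_fraction(drams):.2%} of activated bits unused")
```

With 8 DRAMs the activated rows hold 65,536 bits for a 512-bit line, and with 16 DRAMs they hold 131,072 bits, so the unused share stays above 99% in both cases.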

SLIDE 7

Solution = Multicore DIMM

MCDIMM features

VMD = Virtual Memory Device: rank subsetting
Demux register
Over 99% of bits are unused if row/col = 1
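To make the effect of rank subsetting concrete, the sketch below splits one rank into S subsets and reports how many bits are activated per access and how many data beats are needed to move one cache line. It assumes a rank of eight ×8 DRAM chips (a 64-bit data path), an 8 kb row per chip, and a 512 b cache line; these parameters and all names in the code are assumptions for illustration, not values stated for the MCDIMM design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetCost:
    subsets: int            # S: subsets (VMDs) per rank
    chips_per_subset: int   # D: DRAM chips that serve one access
    activated_bits: int     # row bits opened per access
    data_beats: int         # bus transfers needed for one cache line


def subset_cost(subsets, chips_per_rank=8, chip_width_bits=8,
                row_bits=8192, line_bits=512):
    """Cost of one cache-line access when a rank is split into `subsets` parts."""
    if chips_per_rank % subsets != 0:
        raise ValueError("subsets must divide the number of chips per rank")
    chips = chips_per_rank // subsets
    beats = line_bits // (chips * chip_width_bits)
    return SubsetCost(subsets, chips, chips * row_bits, beats)


if __name__ == "__main__":
    for s in (1, 2, 4, 8):
        print(subset_cost(s))
```

Each doubling of S halves the bits activated per access and doubles the number of beats per line, which is the energy versus serialization-latency tradeoff that rank subsetting introduces.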

SLIDE 8

Demux register

[Figure: demux register. Labels: register, counter, demultiplexer (optional).]

SLIDE 9

Alternative solution = mini-rank

MCDIMM vs. mini-rank

Register for data path vs. control path
Timing constraint due to access interference
Load balancing between rank subsets

SLIDE 10

Governing equations

$D$: number of DRAM chips per subset
$S$: number of subsets per rank
$R$: number of ranks per channel
$SP$: static power of a DRAM chip
$E_{RW}$: energy needed to read/write a bit
$BW_{RW}$: read/write bandwidth per memory channel
$E_{AP}$: energy to activate/precharge a row
$f_{AP}$: frequency of activate/precharge per memory channel

$$\text{Total main memory power} = D \cdot S \cdot R \cdot SP + E_{RW} \cdot BW_{RW} + D \cdot E_{AP} \cdot f_{AP}$$
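The power equation above can be evaluated directly. The sketch below returns each of the three terms separately so that the effect of changing D and S is visible. The parameter names, the units (watts, joules per bit, bits per second, joules per activate/precharge per chip, hertz), and the numbers in the usage example are assumptions chosen only for illustration; they are not measurements from the talk.

```python
def total_memory_power(chips_per_subset, subsets_per_rank, ranks_per_channel,
                       static_power_w, rw_energy_j_per_bit,
                       rw_bandwidth_bit_per_s, ap_energy_j, ap_frequency_hz):
    """Evaluate D*S*R*SP + E_RW*BW_RW + D*E_AP*f_AP, term by term, in watts."""
    static = (chips_per_subset * subsets_per_rank
              * ranks_per_channel * static_power_w)
    read_write = rw_energy_j_per_bit * rw_bandwidth_bit_per_s
    activate_precharge = chips_per_subset * ap_energy_j * ap_frequency_hz
    return {
        "static": static,
        "read_write": read_write,
        "activate_precharge": activate_precharge,
        "total": static + read_write + activate_precharge,
    }


if __name__ == "__main__":
    # Hypothetical device parameters, identical for both configurations.
    common = dict(ranks_per_channel=1, static_power_w=0.1,
                  rw_energy_j_per_bit=20e-12, rw_bandwidth_bit_per_s=50e9,
                  ap_energy_j=2e-9, ap_frequency_hz=50e6)
    print(total_memory_power(chips_per_subset=8, subsets_per_rank=1, **common))
    print(total_memory_power(chips_per_subset=2, subsets_per_rank=4, **common))
```

With the total chip count D · S held constant, the static and read/write terms do not change; only the activate/precharge term shrinks as D decreases, provided f_AP stays the same.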

SLIDE 11

Governing equations

$BW_{RW}$: read/write bandwidth per memory channel
$f_{AP}$: frequency of activate/precharge per memory channel
$f_{CM}$: frequency of cache misses
$CL$: line size of the last-level cache
$\beta$: row/col (bank conflict ratio)

$$f_{AP} = \frac{f_{AP}}{f_{CM}} \cdot f_{CM} = \frac{f_{AP}}{f_{CM}} \cdot \frac{BW_{RW}}{CL} = \beta \cdot \frac{BW_{RW}}{CL}$$
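The relation above says that the activate/precharge rate is the cache-miss rate (bandwidth divided by line size) scaled by the row/col ratio. A minimal sketch follows; it refers to that ratio as `beta`, which is an assumed name, and the bandwidth and ratio in the usage example are hypothetical.

```python
def cache_miss_frequency(rw_bandwidth_bit_per_s, line_bits):
    """f_CM = BW_RW / CL: cache lines transferred per second."""
    return rw_bandwidth_bit_per_s / line_bits


def activate_precharge_frequency(rw_bandwidth_bit_per_s, line_bits, beta):
    """f_AP = beta * BW_RW / CL, where beta is the row/col ratio
    (row activations per column access); beta = 1 means every access
    opens a new row."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is expected to lie between 0 and 1")
    return beta * cache_miss_frequency(rw_bandwidth_bit_per_s, line_bits)


if __name__ == "__main__":
    # Hypothetical channel: 5.12 Gb/s of traffic, 512 b lines, beta = 0.5.
    print(activate_precharge_frequency(5.12e9, 512, beta=0.5))
```

The result can be fed into the activate/precharge term of the total power equation as f_AP.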

SLIDE 12
on multicore applications

SLIDE 13

MCDIMM reliability issues

  • SECDED

Single error correction, double error detection
Typically, a (64 + 8) ECC solution is enough.

  • SCCDCD

Single chip-error correction, double chip-error detection
Chipkill

Implementations

  • Interleaving SECDED over multiple ranks
  • Employing a stronger error-correcting code

2b + l additional bits to correct b bits of bursty error and to detect l bits of bursty error
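The (64 + 8) figure for SECDED matches the textbook extended Hamming construction: choose the smallest r with 2^r ≥ m + r + 1 for m data bits, then add one overall parity bit for double-error detection. The slides do not say which code is used, so the sketch below is offered under that assumption, and the function names are illustrative.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming (SECDED) code over `data_bits`."""
    if data_bits < 1:
        raise ValueError("data_bits must be positive")
    r = 0
    # Smallest r such that r check bits can point at any single-bit error
    # position in the (data_bits + r)-bit codeword, or signal "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit enables double-error detection


def storage_overhead(data_bits):
    """Check bits as a fraction of data bits."""
    return secded_check_bits(data_bits) / data_bits


if __name__ == "__main__":
    for m in (32, 64, 128):
        print(m, secded_check_bits(m), f"{storage_overhead(m):.1%}")
```

For 64 data bits this gives 8 check bits, that is, a 12.5% storage overhead.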

SLIDE 14

Multicore configuration

[Figure: multicore configuration. Labels: multithreaded (MT) cores with L1 caches (L1$), L2 caches (L2$) with directories (Dir), memory controllers (MC), and DIMMs.]

SLIDE 15

Experimental setup

  • System architecture

32 nm, 2 GHz in-order CMT, max IPC = 16, 64 threads
64 B cache line, four 1 MB L2 caches
Hierarchical MESI, reverse directory

  • Simulator/modeling

Intel Pin-based in-house simulator
CACTI

  • Applications

SPLASH-2/PARSEC/SPEC2006

SLIDE 16

Energy-delay product & system power with 1 rank per memory channel
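For reference, the energy-delay product (EDP) is energy multiplied by execution time, so a configuration that saves power but runs longer is penalized quadratically for the slowdown. The sketch below uses hypothetical numbers and illustrative function names; it does not reproduce any result from the talk.

```python
def energy_delay_product(average_power_w, execution_time_s):
    """EDP = energy * delay = (power * time) * time, in joule-seconds."""
    energy_j = average_power_w * execution_time_s
    return energy_j * execution_time_s


def relative_edp(power_ratio, time_ratio):
    """EDP of a configuration relative to a baseline, given its power and
    execution time as ratios to that baseline."""
    return power_ratio * time_ratio ** 2


if __name__ == "__main__":
    # Hypothetical: 20% less power at the cost of 5% longer execution time.
    print(f"relative EDP: {relative_edp(0.8, 1.05):.3f}")
```

A relative EDP below 1 means the configuration improves on the baseline despite any slowdown.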

SLIDE 17

Energy-delay product & system power with 1 rank per memory channel

[Figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

SLIDE 18

Energy-delay product & system power with 4 ranks per memory channel

SLIDE 19

Energy-delay product & system power with 4 ranks per memory channel

[Figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

SLIDE 20

Energy-delay product & system power with Chipkill enabled

SLIDE 21

Energy-delay product & system power with Chipkill enabled

[Figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

SLIDE 22

Conclusion

  • Multicore DIMM

Instantiation of rank subsetting
Gain energy efficiency & concurrency
Sacrifice serialization latency
Advantage in EDP (energy-delay product) with proper subsetting
Energy-efficient, capacity-inefficient reliability solution

  • Challenges on main memory systems

Performance/capacity demands
Energy-efficiency goals
Reliability constraints
