Future Scaling of Processor-Memory Interfaces


  1. Future Scaling of Processor-Memory Interfaces
     Jung Ho Ahn†§, Norman P. Jouppi†, Christos Kozyrakis‡, Jacob Leverich‡, Robert S. Schreiber†
     †HP Labs, ‡Stanford University, §Seoul National University

  2. Executive summary
     Main memory system
     • Challenges: performance, reliability, energy efficiency
     • Holistic assessments: system-wide, multithreaded/consolidated, chipkill
     • Solutions: Multicore DIMM, rank subsetting; capacity vs. efficiency and efficiency/latency/throughput tradeoffs
     Nov 19, 2009 SC09 – Future Scaling of Processor-Memory Interfaces

  3. Issues on DRAM-based main memory
     • Chip multiprocessors (CMPs) demand
       - High capacity
       - High bandwidth
     • Global wires improve slowly
       - Energy efficiency challenges = DRAM power matters!
       - Performance/power variation by access patterns
     • Hard/soft errors

  4. How DRAM works
     [Figure: DRAM chip with banks 0 through 7. Labels: request, row decoder, memory array (16,384 rows × 8,192 columns), sense amplifier, column decoder, data. Inset: DRAM 1T1C cell with labels wl and bit.]
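
As a rough illustration of the geometry in the figure, the sketch below computes the capacity of one chip and splits a flat cell index into bank, row, and column. It assumes one bit per (row, column) position, and the bank/row/column ordering is a hypothetical choice made here for illustration; the slides do not state an address mapping.

```python
BANKS, ROWS, COLUMNS = 8, 16_384, 8_192  # counts shown on the slide


def chip_capacity_bits() -> int:
    """Capacity of one chip, assuming one bit per (row, column) position."""
    return BANKS * ROWS * COLUMNS


def decode(cell_index: int) -> tuple:
    """Split a flat cell index into (bank, row, column).

    Assumed (hypothetical) ordering: bank is most significant, column least.
    """
    if not 0 <= cell_index < chip_capacity_bits():
        raise ValueError("cell index out of range")
    column = cell_index % COLUMNS
    row = (cell_index // COLUMNS) % ROWS
    bank = cell_index // (COLUMNS * ROWS)
    return bank, row, column


print(chip_capacity_bits() == 2**30)  # True: 2^30 bits under the assumption above
print(decode(COLUMNS * ROWS + 5))     # (1, 0, 5)
```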

  5. Performance/power variations

  6. DIMM = Dual Inline Memory Module
     Overfetching problem
     - DRAM row size = 8 kb, 8 or 16 DRAMs per DIMM
     - Cache line size = 512 b
     - Over 99% of bits are unused if row/col = 1
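
The "over 99%" figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that every DRAM chip on the DIMM activates one 8 kb (8,192-bit) row together and that only one 512-bit cache line is read per activation (row/col = 1); the function name is made up for this example.

```python
def unused_fraction(row_bits: int, chips: int, line_bits: int) -> float:
    """Fraction of activated bits never read when one cache line is
    fetched per row activation (row/col = 1)."""
    activated_bits = row_bits * chips
    return 1.0 - line_bits / activated_bits


ROW_BITS = 8 * 1024  # 8 kb row per DRAM chip (from the slide)
LINE_BITS = 512      # 512 b cache line (from the slide)

for chips in (8, 16):
    print(chips, f"{unused_fraction(ROW_BITS, chips, LINE_BITS):.2%}")
# 8 99.22%
# 16 99.61%
```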

  7. Solution = Multicore DIMM
     MCDIMM features
     - VMD = Virtual Memory Device: rank subsetting
     - Demux register
     - Over 99% of bits are unused if row/col = 1

  8. Demux register
     [Figure: block diagram with labels Demux Register, (optional), Counter, Demultiplexer, Register]

  9. Alternative solution = mini-rank
     MCDIMM vs. mini-rank
     - Register for data path vs. control path
     - Timing constraint due to access interference
     - Load balancing between rank subsets

  10. Governing equations
      $$\text{Total main memory power} = D \cdot S \cdot R \cdot SP + E_{RW} \cdot BW_{RW} + D \cdot E_{AP} \cdot f_{AP}$$
      - $D$: number of DRAM chips per subset
      - $S$: number of subsets per rank
      - $R$: number of ranks per channel
      - $SP$: static power of a DRAM chip
      - $E_{RW}$: energy needed to read/write a bit
      - $BW_{RW}$: read/write bandwidth per memory channel
      - $E_{AP}$: energy to activate/precharge a row
      - $f_{AP}$: frequency of activate/precharge per memory channel
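
The power equation on this slide can be evaluated directly. The sketch below is illustrative only: the class and function names are made up here, every numeric value is a hypothetical placeholder rather than a measurement from the talk, and the comparison assumes a 16-chip rank split into either 1 or 4 subsets.

```python
from dataclasses import dataclass


@dataclass
class ChannelParams:
    D: int        # DRAM chips per subset
    S: int        # subsets per rank
    R: int        # ranks per channel
    SP: float     # static power of one DRAM chip (W)
    E_RW: float   # energy to read/write one bit (J/bit)
    BW_RW: float  # read/write bandwidth per channel (bit/s)
    E_AP: float   # energy to activate/precharge a row in one chip (J)
    f_AP: float   # activate/precharge frequency per channel (1/s)


def total_power(p: ChannelParams) -> float:
    """Total main memory power = D*S*R*SP + E_RW*BW_RW + D*E_AP*f_AP."""
    static = p.D * p.S * p.R * p.SP
    read_write = p.E_RW * p.BW_RW
    activate_precharge = p.D * p.E_AP * p.f_AP
    return static + read_write + activate_precharge


# Hypothetical placeholder values, chosen only to show the trend.
common = dict(R=1, SP=0.05, E_RW=2e-11, BW_RW=5e10, E_AP=2e-9, f_AP=5e7)
one_subset = ChannelParams(D=16, S=1, **common)
four_subsets = ChannelParams(D=4, S=4, **common)

print(total_power(one_subset), total_power(four_subsets))
```

With D · S held constant, the static and read/write terms are unchanged, and only the activate/precharge term shrinks as D drops.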

  11. Governing equations
      $$f_{AP} = \frac{f_{AP}}{f_{CM}} \cdot f_{CM} = \frac{f_{AP}}{f_{CM}} \cdot \frac{BW_{RW}}{CL} = \beta \cdot \frac{BW_{RW}}{CL}$$
      - $BW_{RW}$: read/write bandwidth per memory channel
      - $f_{AP}$: frequency of activate/precharge per memory channel
      - $f_{CM}$: frequency of cache miss
      - $CL$: line size of last-level cache
      - $\beta$: row/col (bank conflict ratio)
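
A small numeric sketch of this relation, assuming the cache-miss frequency equals the read/write bandwidth divided by the cache line size and that the row/col ratio (called `beta` in the code) scales it to the activate/precharge frequency. The bandwidth value is a hypothetical placeholder.

```python
def activate_frequency(beta: float, bw_rw_bits_per_s: float, line_bits: int) -> float:
    """f_AP = beta * f_CM, with f_CM = BW_RW / CL."""
    f_cm = bw_rw_bits_per_s / line_bits  # cache misses per second
    return beta * f_cm


BW_RW = 51.2e9  # bit/s, hypothetical channel bandwidth
CL = 512        # bits per last-level cache line

print(activate_frequency(1.0, BW_RW, CL))   # every column access opens a new row
print(activate_frequency(0.25, BW_RW, CL))  # one activation per four column accesses
```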

  12. β on multicore applications [figure]

  13. MCDIMM reliability issues
      • SECDED
        - Single error correction, double error detection
        - Typically, a (64 + 8) ECC solution is enough
      • SCCDCD
        - Single chip-error correction, double chip-error detection
        - Chipkill
        - Implementations
          • Interleaving SECDED over multiple ranks
          • Employing a stronger error-correcting code
            - 2b + l additional bits to correct b bits of bursty error and to detect l bits of bursty error
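
The storage cost of the two schemes on this slide can be worked out with simple arithmetic. The sketch below takes the (64 + 8) SECDED layout and the 2b + l expression as stated on the slide; the function names and the sample values of b and l are hypothetical and chosen only for illustration.

```python
def secded_overhead(data_bits: int = 64, check_bits: int = 8) -> float:
    """Check bits as a fraction of data bits for a (64 + 8) SECDED word."""
    return check_bits / data_bits


def burst_code_check_bits(b: int, l: int) -> int:
    """Additional bits to correct b bits and detect l bits of bursty
    error, using the 2b + l expression from the slide."""
    return 2 * b + l


print(secded_overhead())            # 0.125
print(burst_code_check_bits(4, 8))  # 16
```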

  14. Multicore configuration
      [Figure: multicore configuration with repeated blocks labeled MT Core, L1$, L2$, Dir, and MC, plus DIMM]

  15. Experimental setup
      • System architecture
        - 32 nm, 2 GHz in-order CMT, max IPC = 16, 64 threads
        - 64 B $ line, 4 × 1 MB L2 $
        - Hierarchical MESI, reverse directory
      • Simulator/modeling
        - Intel Pin-based in-house simulator
        - CACTI
      • Applications
        - SPLASH-2, PARSEC, SPEC CPU 2006

  16. Energy-delay product & system power with 1 rank per memory channel

  17. Energy-delay product & system power with 1 rank per memory channel [figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

  18. Energy-delay product & system power with 4 ranks per memory channel

  19. Energy-delay product & system power with 4 ranks per memory channel [figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

  20. Energy-delay product & system power with Chipkill enabled

  21. Energy-delay product & system power with Chipkill enabled [figure panels: SPLASH-2, SPEC CPU 2006, PARSEC]

  22. Conclusion
      Challenges on main memory systems
      - Performance/capacity demands
      - Energy-efficiency goals
      - Reliability constraints
      Multicore DIMM
      - Instantiation of rank subsetting
      - Gain energy efficiency & concurrency
      - Sacrifice serialization latency
      - Advantage in EDP (energy-delay product) with proper subsetting
      - Energy-efficient, capacity-inefficient reliability solution
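
To make the EDP tradeoff concrete, the sketch below computes energy-delay product as energy times runtime. The two configurations and their numbers are hypothetical placeholders, not results from the talk; they only show how a configuration that runs slightly longer can still have a lower EDP if it draws enough less power.

```python
def energy_delay_product(avg_power_w: float, runtime_s: float) -> float:
    """EDP = energy * delay, where energy = average power * runtime."""
    energy_j = avg_power_w * runtime_s
    return energy_j * runtime_s


# Hypothetical placeholder numbers for illustration only.
baseline = energy_delay_product(avg_power_w=100.0, runtime_s=1.00)
subsetted = energy_delay_product(avg_power_w=80.0, runtime_s=1.05)

print(baseline, subsetted, subsetted < baseline)
```

Because delay enters the product twice, a latency penalty is weighted more heavily than an equal relative power saving.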
