
2010 Blue Waters Performance Modeling Workshop: Opening and Introduction



  1. 2010 Blue Waters Performance Modeling Workshop – Opening and Introduction • Torsten Hoefler • With slides from William Kramer, Marc Snir, William Gropp, IBM, and the Blue Waters team

  2. Introduction and Overview • My slides contain only public information and will be available online after the workshop • No need to take pictures or notes! • Parts of tomorrow's sessions will contain IBM confidential information • You may only attend the NDA session if your institution has signed and cleared all NDAs for you! • You are responsible for maintaining the confidentiality of the information!

  3. Blue Waters in a Nutshell • >300,000 compute cores • Based on Power7 • 10 PF/s peak • 1 PF/s sustained • >1 PiB RAM • >10 PiB disk storage • >0.5 EiB archival storage
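
The headline numbers are mutually consistent, which a few lines of arithmetic can confirm. This sketch assumes exactly 300,000 cores (the slide gives only a lower bound), 8 flops/cycle/core (4 FMAs × 2 flops, from the Power7 chip slide), and a 4 GHz clock (the top of the stated 3.5-4 GHz range).

```python
# Back-of-the-envelope check of the "Blue Waters in a Nutshell" numbers.
# Assumptions (not stated on this slide): exactly 300,000 cores,
# 8 flops/cycle/core (4 FMAs x 2 flops, from the Power7 slide), 4 GHz clock.
CORES = 300_000
FLOPS_PER_CYCLE = 4 * 2
CLOCK_HZ = 4.0e9
SUSTAINED = 1.0e15  # 1 PF/s sustained, from the slide


def peak_flops(cores, flops_per_cycle, clock_hz):
    """Theoretical peak: every core completes every FMA on every cycle."""
    return cores * flops_per_cycle * clock_hz


peak = peak_flops(CORES, FLOPS_PER_CYCLE, CLOCK_HZ)
print(f"peak      = {peak / 1e15:.1f} PF/s")          # 9.6 PF/s
print(f"sustained = {SUSTAINED / peak:.0%} of peak")  # 10% of peak
```

So roughly 300,000 cores at the top clock rate account for the ~10 PF/s peak, and the 1 PF/s sustained target corresponds to about 10% of that peak.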

  4. Performance Modeling for Blue Waters • Most users only have experience at comparatively “small” scale (<8000 cores) • Applications should be ready to run on the full system • This requires a clear understanding before the system is deployed (a run, tweak, rerun loop is not possible) ⇒ Programmers need to develop a deep understanding of application scaling and bottlenecks at scale through performance modeling!
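
To make the point concrete, here is a minimal sketch of one common modeling workflow: fit an analytic model to small-scale runs, then extrapolate to full scale. The model form T(p) = a + b/p + c·log2(p), its parameter values, and the "measurements" are all hypothetical (generated from known parameters, not measured on any machine); real application models are derived from the code's structure. The sketch requires NumPy.

```python
import math

import numpy as np

# Hypothetical runtime model (illustration only, not from the workshop):
#   T(p) = a + b / p + c * log2(p)
#   a: non-scaling (serial) time, b: perfectly divisible work,
#   c: cost per stage of a tree-based collective such as a reduction.


def features(p):
    """Model terms evaluated at p processes."""
    return [1.0, 1.0 / p, math.log2(p)]


def fit_model(procs, times):
    """Least-squares fit of (a, b, c) to measured runtimes."""
    A = np.array([features(p) for p in procs])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(times), rcond=None)
    return coeffs


def predict(coeffs, p):
    return float(np.dot(coeffs, features(p)))


# Made-up "measurements" at small scale, generated from known parameters.
TRUE_PARAMS = np.array([0.5, 2000.0, 0.01])
small_scale = [256, 512, 1024, 2048, 4096]
measured = [predict(TRUE_PARAMS, p) for p in small_scale]

coeffs = fit_model(small_scale, measured)
for p in (4_096, 65_536, 300_000):
    t = predict(coeffs, p)
    log_share = coeffs[2] * math.log2(p) / t
    print(f"p={p:>7}: T={t:.3f} s, log2(p) term = {log_share:.0%} of T")
```

In this made-up case, going from 65,536 to 300,000 processes barely changes the predicted runtime: b/p has already shrunk while the log term keeps growing. That is the kind of behavior that small-scale runs alone do not make obvious.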

  5. From Chip to Entire Integrated System • P7 chip (8 cores) • SMP node (32 cores) • Drawer (256 cores) • SuperNode (32 nodes / 4 CECs, 1024 cores) • Building block • Blue Waters system in the NPCF [Figure: packaging hierarchy from chip to integrated system, also showing L-Link cables, on-line storage, and near-line storage]
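
Each packaging level multiplies the core count of the level below, and the cumulative counts reproduce the numbers on the slide. The sketch below also computes the smallest whole number of supernodes that reaches 300,000 cores; this is only arithmetic on the slide's figures, not the actual Blue Waters configuration, which is not given here.

```python
import math

# Fan-out at each packaging level, read off the "From Chip to Entire
# Integrated System" slide (cores per chip, then units per parent level).
LEVELS = [
    ("P7 chip", 8),      # 8 cores per chip
    ("SMP node", 4),     # 4 chips (one QCM) per node -> 32 cores
    ("Drawer", 8),       # 8 nodes per drawer -> 256 cores
    ("SuperNode", 4),    # 4 drawers (CECs) per supernode -> 1024 cores
]


def cores_per_level(levels):
    """Cumulative core count at each level of the hierarchy."""
    counts, total = {}, 1
    for name, fanout in levels:
        total *= fanout
        counts[name] = total
    return counts


counts = cores_per_level(LEVELS)
for name, cores in counts.items():
    print(f"{name:<10} {cores:>5} cores")

# Smallest whole number of supernodes that reaches 300,000 cores.
print(math.ceil(300_000 / counts["SuperNode"]))  # 293
```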


  7. Power7 Chip (8 cores) • Base technology: 45 nm, 576 mm², 1.2 B transistors • Chip: 8 cores; 4 FMAs/cycle/core; 32 MB L3 (private/shared); dual DDR3 memory; 128 GiB/s peak bandwidth (1/2 byte/flop); clock range of 3.5–4 GHz • Quad-chip MCM
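
The 1/2 byte/flop balance follows from the other numbers on the slide. The sketch adds a simple roofline-style bound (a standard modeling technique, not something this slide presents) to show what that balance means for a memory-bound loop. Assumptions: a 4 GHz clock, 128e9 B/s of bandwidth (the slide's 1/2 byte/flop treats 128 GiB/s as 128 GB/s), and a triad-like kernel that moves 24 bytes per 2 flops, ignoring write-allocate traffic.

```python
# Power7 chip figures from the slide; 4 GHz clock and decimal GB assumed.
CORES = 8
FMAS_PER_CYCLE = 4
CLOCK_HZ = 4.0e9
MEM_BW = 128e9  # bytes/s


def peak_flops():
    """Chip peak: each FMA counts as 2 flops."""
    return CORES * FMAS_PER_CYCLE * 2 * CLOCK_HZ


def attainable(intensity):
    """Roofline bound: min(compute peak, intensity [flop/byte] * bandwidth)."""
    return min(peak_flops(), intensity * MEM_BW)


peak = peak_flops()
print(peak / 1e9)     # 256.0 GF/s per chip
print(MEM_BW / peak)  # 0.5 byte/flop

# Triad-like loop a[i] = b[i] + s * c[i] in double precision:
# 2 flops per 3 * 8 = 24 bytes moved.
triad = attainable(2 / 24)
print(f"{triad / 1e9:.1f} GF/s = {triad / peak:.1%} of peak")
```

With only half a byte per flop, a loop like this is limited by memory bandwidth to a few percent of the chip's peak, which is why the byte/flop ratio matters for performance models.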

  8. L3 Cache/On-Chip Communication • L1: 32 KB instruction / core • L1: 32 KB data / core • L2: 256 KB / core • L3: 4 MB eDRAM / core • Fast private and shared region
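
Per-core capacities translate into per-chip totals (8 × 4 MB gives the 32 MB L3 on the Power7 slide) and into limits on working-set size. The sketch below uses a common rule of thumb for blocked algorithms, an assumption not taken from the slides: three b × b tiles of doubles (as in a blocked matrix multiply C += A·B) should fit in the target cache. It ignores associativity, prefetching, and other data competing for the cache, and reads KB/MB as KiB/MiB.

```python
import math

KIB = 1024
CORES_PER_CHIP = 8
# Per-core capacities from the slide (KB/MB read as KiB/MiB; an assumption).
CACHE_PER_CORE = {
    "L1D": 32 * KIB,
    "L2": 256 * KIB,
    "L3": 4 * KIB * KIB,
}


def max_tile(cache_bytes, tiles=3, elem_bytes=8):
    """Largest b such that `tiles` b-by-b tiles of doubles fit in cache_bytes."""
    return math.isqrt(cache_bytes // (tiles * elem_bytes))


for level, size in CACHE_PER_CORE.items():
    per_chip_kib = size * CORES_PER_CHIP // KIB
    print(f"{level}: {per_chip_kib} KiB per chip, max tile b = {max_tile(size)}")
```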

  9. Quad Chip Module (4 chips) • 32 cores! • 32 cores × 8 F/cycle/core × 4 GHz ≈ 1 TF/s • 4 threads per core (max) • 4 × 32 MiB L3 cache • 512 GB/s RAM BW (0.5 B/F) • 800 W (≈0.8 W per GF/s) • Flat shared memory! [Figure: the four P7 chips of the module (P7-0 to P7-3), each with memory controllers MC0 and MC1, and the on-module buses connecting them]
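
The module-level figures can be checked the same way. At a nominal 1 TF/s, 800 W is 0.8 W per GF/s, which is the same as 0.8 nJ per floating-point operation; with the exact peak of 1.024 TF/s the value is about 0.78. The sketch assumes the 4 GHz clock.

```python
# Quad-chip module figures from the slide; 4 GHz clock assumed.
CHIPS = 4
CORES_PER_CHIP = 8
FLOPS_PER_CYCLE = 8       # per core: 4 FMAs x 2 flops
CLOCK_HZ = 4.0e9
MEM_BW = 512e9            # bytes/s for the whole module (4 x 128 GB/s)
POWER_W = 800.0

peak = CHIPS * CORES_PER_CHIP * FLOPS_PER_CYCLE * CLOCK_HZ
watts_per_gflops = POWER_W / (peak / 1e9)
print(f"peak    = {peak / 1e12:.3f} TF/s")   # 1.024 TF/s
print(f"balance = {MEM_BW / peak:.2f} B/F")  # 0.50 B/F
# 1 W per GF/s = 1 J/s per 1e9 flop/s = 1 nJ per flop
print(f"power   = {watts_per_gflops:.2f} W per GF/s "
      f"(= {watts_per_gflops:.2f} nJ per flop)")
```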

  10. Adding a Network Interface (Torrent) • Connects the QCM to PCI-e (two 16x and one 8x PCI-e slot) • Connects 8 QCMs via a low-latency, high-bandwidth copper fabric • Provides a message-passing mechanism with very high bandwidth • Provides the lowest possible latency between 8 QCMs [Figure: QCM with its 16 DIMMs attached to the Hub Chip Module (28x XMIT/RCV pairs); labels include 7 inter-hub board-level L-buses LL0-LL6 (22+22 GB/s; 3.0 Gb/s @ 8B+8B, 90% sustained of peak), links LR0-LR23 (5+5 GB/s, 6x = 5+1), links D0-D15 (10+10 GB/s, 12x = 10+2), PCIe 16x/16x/8x, and aggregate figures of 320 GB/s and 240 GB/s]
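
One reading of the L-bus label "3.0 Gb/s @ 8B+8B, 90% sus. peak" (an interpretation; the slide does not spell it out): each bus is 8 bytes (64 bit lanes) wide in each direction, with every lane signaling at 3.0 Gb/s. That gives 24 GB/s peak per direction and 21.6 ≈ 22 GB/s sustained, matching the "22+22 GB/s" label, and seven such buses at peak give the 336 GB/s "to 7 other local nodes" quoted on the 1.1 TB/s HUB slide.

```python
LL_BUSES = 7        # inter-hub board-level L-buses LL0-LL6
RATE_GBIT = 3.0     # Gb/s per bit lane (interpretation of the label)
WIDTH_BYTES = 8     # "8B+8B": 8 bytes wide in each direction
SUSTAINED = 0.90    # "90% sus. peak"


def bus_gbytes_per_s(rate_gbit, width_bytes, efficiency=1.0):
    """Per-direction bandwidth in GB/s of a parallel bus."""
    lanes = width_bytes * 8                      # bit lanes per direction
    return lanes * rate_gbit * efficiency / 8    # Gb/s -> GB/s


peak_dir = bus_gbytes_per_s(RATE_GBIT, WIDTH_BYTES)
sust_dir = bus_gbytes_per_s(RATE_GBIT, WIDTH_BYTES, SUSTAINED)
print(peak_dir, round(sust_dir, 1))   # 24.0 21.6
print(LL_BUSES * 2 * peak_dir)        # 336.0 GB/s, both directions
```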

  11. 1.1 TB/s HUB • 192 GB/s host connection • 336 GB/s to 7 other local nodes • 240 GB/s to local-remote nodes • 320 GB/s to remote nodes • 40 GB/s to general-purpose I/O [Figure: Torrent hub block diagram: 8B W/X/Y/Z buses to the QCM (address/data); L-local buses LL0-LL6 over copper hub-to-hub board wiring; 24 L-remote optical buses LR0-LR23 (the 4-drawer interconnect that creates a supernode); 16 optical D buses D0-D15 (interconnect of supernodes); three PCIe ports (16x, 16x, 8x); service interfaces (FSI, I2C, TOD sync)]
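
The 1.1 TB/s headline is the sum of the per-class figures, and most of those follow from link counts times per-link rates on the Torrent slide. Assumptions: the L-local rate is taken as 24 GB/s peak per direction (the slide labels 22 GB/s sustained), the 192 GB/s host figure is used as given, and the 40 GB/s of I/O is reproduced by assuming PCIe Gen2 at 0.5 GB/s per lane per direction.

```python
# (link count, GB/s per direction), read off the Torrent slides.
LINK_CLASSES = {
    "L-local LL0-LL6": (7, 24.0),     # peak; 22 GB/s sustained
    "L-remote LR0-LR23": (24, 5.0),   # 6x links, 5+5 GB/s
    "D links D0-D15": (16, 10.0),     # 12x links, 10+10 GB/s
}
HOST_GBS = 192.0  # host (QCM) connection, taken directly from the slide


def pcie_gbs(lanes, gbs_per_lane_per_dir=0.5):
    """Bidirectional PCIe bandwidth; 0.5 GB/s/lane/direction assumes PCIe Gen2."""
    return lanes * gbs_per_lane_per_dir * 2


def bidirectional(link_classes):
    return {name: n * 2 * bw for name, (n, bw) in link_classes.items()}


per_class = bidirectional(LINK_CLASSES)
io_gbs = pcie_gbs(16 + 16 + 8)
total = sum(per_class.values()) + HOST_GBS + io_gbs
for name, gbs in per_class.items():
    print(f"{name:<20} {gbs:6.0f} GB/s")
print(f"{'PCIe 16x+16x+8x':<20} {io_gbs:6.0f} GB/s")
print(f"{'total':<20} {total:6.0f} GB/s")  # 1128 GB/s, about 1.1 TB/s
```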

  12. 256 Cores • HUB-to-HUB copper wiring • L-Local first-level interconnect • Drawer: 8 nodes, 32 chips, 256 cores [Figure: drawer layout with 8 nodes (QCM 0-7), each with four P7 chips (P7-0 to P7-3), 16 DIMMs, and a hub module (U-P1-M1 to U-P1-M8); PCIe slots P1-C1 to P1-C17; top and bottom DCA connectors (DCA-0, DCA-1); optical fan-out from the hub modules: 64/40 optical 'D-Link' and 2,304 fiber 'L-Link']
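
The per-hub link counts on the Torrent slide (7 L-local, 24 L-remote) are exactly enough for every node in a 32-node supernode to have a direct link to every other node: 7 = 8 − 1 neighbors in the same drawer and 24 = 3 × 8 nodes in the other three drawers. The sketch builds such a wiring under that assumption (one link per node pair; the slides do not spell out the wiring) and checks the degrees. It also yields 28 L-local links per drawer, the same count that appears next to the hub-to-hub copper board wiring on the 1.1 TB/s HUB slide.

```python
from itertools import combinations

DRAWERS = 4           # drawers per supernode
NODES_PER_DRAWER = 8  # nodes (QCM + hub) per drawer
LL_PER_HUB = 7        # L-local links per hub
LR_PER_HUB = 24       # L-remote links per hub


def build_supernode():
    """Hypothetical wiring: one direct link per node pair; pairs in the same
    drawer use an L-local link, pairs in different drawers an L-remote link."""
    nodes = [(d, n) for d in range(DRAWERS) for n in range(NODES_PER_DRAWER)]
    ll, lr = [], []
    for a, b in combinations(nodes, 2):
        (ll if a[0] == b[0] else lr).append((a, b))
    return nodes, ll, lr


def degree(node, links):
    """Number of links that touch `node`."""
    return sum(node in pair for pair in links)


nodes, ll, lr = build_supernode()
print(len(nodes), len(ll), len(lr))                      # 32 112 384
print(all(degree(v, ll) == LL_PER_HUB for v in nodes),
      all(degree(v, lr) == LR_PER_HUB for v in nodes))   # True True
print(len(ll) // DRAWERS)  # 28 L-local links per drawer
```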

