Power Use of Disk Subsystems in Supercomputers

Power Use of Disk Subsystems in Supercomputers, Matthew L. Curry



  1. Power Use of Disk Subsystems in Supercomputers
     Matthew L. Curry, Sandia National Laboratories
     Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. Supercomputer Power Use and Exascale
     • DOE plan: the first exascale machine will consume up to 20 MW, or 50 GF/W
     • The June 2011 Green500 list has a BG/Q prototype as the most efficient machine
       – 2 GF/W
       – In the next decade, machines need to be 25x more power efficient
     • Where can we find more power efficiency?
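The efficiency target on this slide follows from simple arithmetic, sketched below. The function name is illustrative and not from the presentation; the only inputs are the slide's own figures (one exaflop, a 20 MW budget, and 2 GF/W for the BG/Q prototype).

```python
# Minimal sketch of the slide's efficiency arithmetic.
# The function name is hypothetical; the numbers come from the slide.
EXAFLOP = 1e18  # floating-point operations per second


def required_efficiency_gf_per_w(flops: float, power_watts: float) -> float:
    """Return the efficiency in GF/W needed to deliver `flops` within `power_watts`."""
    return flops / power_watts / 1e9


target = required_efficiency_gf_per_w(EXAFLOP, 20e6)  # 20 MW power budget
best_2011 = 2.0  # GF/W, BG/Q prototype on the June 2011 Green500 list
improvement = target / best_2011

print(f"target: {target:.0f} GF/W, improvement needed: {improvement:.0f}x")
```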

  3. Memory Hierarchy and Power
     • The first reaction is often to look at which operations require the most power
     • Disks are far away and (most) have moving parts
     • How much power does storage really use for real application behavior?
     [Figure: memory hierarchy with energy per byte: Reg; On-Chip Cache (0.1 nJ/byte); Off-Chip Cache (1 nJ/byte); DRAM (0.1 nJ/byte); Disk (1000 nJ/byte)]

  4. A Study of Power in Supercomputing
     • Survey three sites with large machines
       – Los Alamos: Roadrunner, #10, and others
       – Los Alamos/Sandia ACES: Cielo, #6
       – Sandia: Red Sky, #16
       – Clemson University’s Palmetto, #96
     • Asked for power data from compute and I/O infrastructure separately
       – No cooling, external infrastructure, etc. Just compute, I/O servers, disks.
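Because compute and I/O power were requested separately, the quantity the results slides plot is a simple ratio. The sketch below shows that ratio; the function name and the wattages in the example call are hypothetical placeholders, not measurements from any of the surveyed sites.

```python
# Minimal sketch: share of machine power drawn by the I/O subsystem.
# Cooling and external infrastructure are excluded, as in the survey.
def io_power_share(compute_watts: float, io_watts: float) -> float:
    """Return I/O power as a fraction of compute plus I/O power."""
    total = compute_watts + io_watts
    if total <= 0:
        raise ValueError("total power must be positive")
    return io_watts / total


# Hypothetical readings, for illustration only.
example_share = io_power_share(compute_watts=950e3, io_watts=50e3)
print(f"I/O share: {example_share:.1%}")
```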

  5. Los Alamos Description
     • Two separate methods of sampling
       – Cielo individually
         • 4.7-6.7 MW
         • 1.1 PF (~143k cores)
         • 10 PB of dedicated Panasas storage
       – Secure Computing Environment, which includes Cielo, Roadrunner, capacity clusters, etc.
         • 16.5 MW typical
         • 3.5 PF
         • 20 PB of Panasas storage, with 10 PB served to all machines except Cielo via a 10GigE fabric

  6. Los Alamos Results
     [Figure: stacked bars of power share, y-axis 93% to 100%; bars: Cielo, LANL Secure Computing Environment; series: Disks + Storage Servers + SAN, Compute + I/O Forwarding Nodes]

  7. Sandia Description
     • Red Sky/Red Mesa is the premier capacity platform for Sandia and NREL
       – 3 PB
       – 433.5 TF (~42k cores)
     • One rack of storage and compute measured throughout a single day
     • Extrapolated to the unclassified section of Red Sky, which is approximately 56% of the Red Sky/Red Mesa machine

  8. Sandia Results
     [Figure: stacked bars of power share, y-axis 92% to 100%; bars: Hour 0, Hour 1, Hour 7, Hour 8, Hour 11, Hour 13, Hour 14; series: Disks, Compute + Storage Servers]

  9. Clemson Description
     • Capacity, condominium cluster at Clemson University
       – 92 TF, ~14k cores
       – 616 TB
     • Data collection at two-hour intervals over two weeks
       – Storage infrastructure used mostly constant power throughout

  10. Clemson Results
      [Figure: stacked power share over time, y-axis 90% to 100%; x-axis: Hour 0 to Hour 160 in 20-hour steps; series: Disks + Storage Servers, Compute]

  11. Extrapolating to Exascale
      • Exascale storage systems will require 320 PB-1 EB of storage at 106.7 TB/s
        – 32 PB main memory
        – Checkpoint every hour
        – 95% (57/60 minutes) must be spent computing
      • Predictions for future disks (~30 TB capacity, ~380 MB/s bandwidth) dictate 277k disks
        – 66% of power budget if power per disk remains constant
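The disk count on slide 11 is driven by bandwidth rather than capacity, which a short calculation makes visible. This is a rough sketch using only the slide's figures; the helper names are illustrative, and the per-disk wattage is merely implied by the slide's 66% figure rather than stated in the presentation. With exactly 380 MB/s per disk the count comes out near 281k, slightly above the slide's 277k, which is consistent with the bandwidth being quoted as approximate.

```python
import math

# Figures from slide 11 (decimal units).
TARGET_BW = 106.7e12     # bytes/s required by the storage system
DISK_BW = 380e6          # bytes/s, predicted per-disk bandwidth (approximate)
DISK_CAPACITY = 30e12    # bytes, predicted per-disk capacity (approximate)
POWER_BUDGET = 20e6      # watts, exascale machine budget
SLIDE_DISKS = 277_000    # disk count stated on the slide
SLIDE_SHARE = 0.66       # share of the power budget stated on the slide


def disks_for_bandwidth(target_bw: float, per_disk_bw: float) -> int:
    """Return the number of disks needed to sustain `target_bw` in aggregate."""
    return math.ceil(target_bw / per_disk_bw)


disks = disks_for_bandwidth(TARGET_BW, DISK_BW)

# Capacity that many disks would provide, in exabytes: far more than the
# 320 PB-1 EB actually required, so bandwidth sets the disk count.
capacity_eb = SLIDE_DISKS * DISK_CAPACITY / 1e18

# Per-disk power implied by the slide's 66% figure (a derived value).
implied_watts_per_disk = SLIDE_SHARE * POWER_BUDGET / SLIDE_DISKS

print(disks, round(capacity_eb, 2), round(implied_watts_per_disk, 1))
```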

  12. Burst Buffer
      • Grider has detailed in many presentations a “burst buffer” idea for checkpointing
        – Quickly accept a checkpoint in a smaller flash store
        – Bleed flash to slower disk-based storage between checkpoints
      • It has been shown that this will work from a purchase price standpoint
        – Power?

  13. Flash Characteristics
      • Current flash (e.g., Intel 320 series) can accept 1 MB/s per gigabyte of capacity
        – Even today, 90 PB of flash (to hold three checkpoints) is sufficient to sustain 90 TB/s of bandwidth
      • Use a 10 TB/s disk-based store
        – Requires 25k disks, which may hold 738 PB
        – Extrapolating from today’s disk power, this is 6% of the power budget
        – Flash uses a comparable amount of power, yielding 6.6% of 20 MW for disk and flash
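The burst-buffer numbers can be cross-checked against slide 11 with a few lines of arithmetic. The sketch below is an illustration under the slides' stated figures, with hypothetical function and variable names. Scaling slide 11's 66% share by the ratio of disk counts assumes constant power per disk, as that slide does. The computed disk count and capacity land close to, but not exactly on, the slide's rounded 25k disks and 738 PB.

```python
import math

FLASH_MB_S_PER_GB = 1.0   # slide 13: about 1 MB/s accepted per GB of flash
DISK_BW = 380e6           # bytes/s, predicted per-disk bandwidth (slide 11)
DISK_CAPACITY = 30e12     # bytes, predicted per-disk capacity (slide 11)


def flash_bandwidth_tb_s(capacity_pb: float) -> float:
    """Return the aggregate write bandwidth in TB/s of `capacity_pb` of flash."""
    capacity_gb = capacity_pb * 1e6                # 1 PB = 1e6 GB (decimal)
    return capacity_gb * FLASH_MB_S_PER_GB / 1e6   # MB/s -> TB/s


flash_bw = flash_bandwidth_tb_s(90)             # 90 PB holds three checkpoints

disks_for_10_tb_s = math.ceil(10e12 / DISK_BW)  # slide rounds this to 25k
capacity_pb = 25_000 * DISK_CAPACITY / 1e15     # capacity of 25k disks, in PB

# Scale slide 11's share (66% for 277k disks) down to 25k disks.
disk_power_share = 0.66 * 25_000 / 277_000

print(flash_bw, disks_for_10_tb_s, capacity_pb, round(disk_power_share, 3))
```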

  14. Conclusion
      • I/O consumes a low proportion of power within the machine
        – 4.4-5.5%
      • One exascale storage model, the burst-buffer scheme, can be done with 6.6% of the power budget
      • Inefficiencies in the power feed systems of the data center can be a larger consumer of power
      • We should always be on the lookout for ways to be more efficient
        – Especially for workloads that aren’t checkpointing

  15. Acknowledgements
      • Authors
        – Lee Ward, Sandia
        – Gary Grider, Los Alamos
        – Jill Gemmill, Clemson
        – Jay Harris, Clemson
        – Dave Martinez, Sandia
