
Accelerating Science with the NERSC Burst Buffer
Debbie Bard, Big Data Architect, Data and Analytics Services, NERSC, LBL
July 22, 2016

Outline
• Future computing architecture
 – The New Storage Hierarchy
• What is a Burst Buffer?
 – Architecture and software
• Users are excited about new architectures!
 – Early User Program
• Science applications ≠ benchmarks
 – Real-world performance
• New tech teething problems
 – Challenges and Lessons Learned

Our users are demanding…

… and not just for more compute time!
• Users' biggest "ask" (after wanting more compute cycles) is for better IO performance
 – E.g. scale up a simulation from 100k cores to 1M cores: 10x more compute producing 10x more data per timestep. Need 10x more IO BW!
 – Memory can be the largest dollar and power cost in an HPC system
• New chip architectures (e.g. Knights Landing) are very energy efficient: they provide the required compute for less power
 – But to use them well, you have to be able to corral your data appropriately

HPC memory hierarchy is changing
[Figure: memory and storage tiers, past versus present]
• Past (Edison)
 – On chip: CPU
 – Off chip: Memory (DRAM), Storage (HDD)
• Present (Cori)
 – On chip: CPU, Near Memory (HBM)
 – Off chip: Far Memory (DRAM), Near Storage (SSD), Far Storage (HDD)

HPC memory hierarchy is changing
• Silicon and system integration
• Bring everything (storage, memory, interconnect) closer to the cores
• Raise center of gravity of memory pyramid, and make it fatter
 – Enable faster and more efficient data movement
[Figure: Cori hierarchy from top to bottom: CPU, Near Memory (HBM), Far Memory (DRAM), Near Storage (SSD), Far Storage (HDD)]

• HDD capacity/$ is increasing over time, but SSD is catching up fast!
• BW and IOPs are flat for HDD

            6TB HDD ($300)           4TB NVMe SSD ($8000)
Capacity    6TB, ~20GB/$             4TB, ~0.5GB/$
BW          150MB/s, ~0.5MB/s/$      3GB/s, ~0.4MB/s/$
IOPs        150/s, ~0.5/$            200,000/s, ~25/$
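The per-dollar figures in the comparison above follow directly from the quoted device prices. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB, 1 GB/s = 1000 MB/s); the `Device` class and `per_dollar` helper are illustrative names, not part of any NERSC tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    price_usd: float
    capacity_gb: float
    bandwidth_mb_s: float
    iops: float


def per_dollar(dev: Device) -> dict:
    """Return capacity, bandwidth and IOPs normalised by purchase price."""
    return {
        "GB/$": dev.capacity_gb / dev.price_usd,
        "MB/s/$": dev.bandwidth_mb_s / dev.price_usd,
        "IOPs/$": dev.iops / dev.price_usd,
    }


# Figures taken from the slide; decimal unit conversion is an assumption.
hdd = Device("6TB HDD", 300, 6000, 150, 150)
ssd = Device("4TB NVMe SSD", 8000, 4000, 3000, 200_000)

for dev in (hdd, ssd):
    print(dev.name, per_dollar(dev))
```

On these prices the two devices deliver similar bandwidth per dollar, while the SSD is far ahead on IOPs per dollar and the HDD far ahead on capacity per dollar.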

• Spinning disk has a mechanical limitation on how fast data can be read from disk
 – SSDs do not have the physical drive components, so they read faster
 – Problem exacerbated for small/random reads
 – But for large files striped over many disks on e.g. Lustre, HDD still performs well.
• SSDs have a limited number of read/write cycles: the memory cells will wear out over time
 – This is a real concern for a data-intensive computing center like NERSC.

Why a Burst Buffer?
• Motivation: Handle spikes in I/O bandwidth requirements
 – Reduce overall application run time
 – Compute resources are idle during I/O bursts
• Some user applications have challenging I/O patterns
 – High IOPs, random reads, different concurrency… fits well on SSD
• Cost rationale: Disk-based PFS bandwidth is expensive
 – Disk capacity is relatively cheap
 – SSD bandwidth is relatively cheap
 => Separate bandwidth and spinning disk
   • Provide high BW without wasting PFS capacity
   • Leverage Cray Aries network speed
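The motivation above can be illustrated with a back-of-envelope model in which an application alternates compute phases with blocking I/O bursts (for example checkpoints). This is only a sketch under stated assumptions: I/O does not overlap with compute, the quoted bandwidth is fully achieved, and every number in the example is hypothetical rather than measured on Cori.

```python
def run_time_s(n_steps: int, compute_s: float, burst_gb: float, bw_gb_s: float) -> float:
    """Total wall time when every step computes, then blocks on an I/O burst."""
    return n_steps * (compute_s + burst_gb / bw_gb_s)


def idle_fraction(compute_s: float, burst_gb: float, bw_gb_s: float) -> float:
    """Fraction of wall time the compute nodes spend waiting on I/O."""
    io_s = burst_gb / bw_gb_s
    return io_s / (compute_s + io_s)


# Hypothetical workload: 100 steps, 60 s of compute and a 1200 GB burst per step.
slow = run_time_s(100, 60.0, 1200.0, bw_gb_s=20.0)   # 60 s of I/O per step
fast = run_time_s(100, 60.0, 1200.0, bw_gb_s=200.0)  # 6 s of I/O per step

print(f"slow tier: {slow:.0f} s, idle {idle_fraction(60.0, 1200.0, 20.0):.0%}")
print(f"fast tier: {fast:.0f} s, idle {idle_fraction(60.0, 1200.0, 200.0):.0%}")
```

In this toy model, absorbing the bursts on a tier with ten times the bandwidth cuts the time the compute nodes sit idle from half of the run to under a tenth.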


Cori, a Cray XC40 system
• Cori Phase 1: partition to support data-intensive applications
 – 1630 Intel Haswell nodes
 – Two Haswell processors/node
   • 16 cores/processor, 128 GB DDR4/node
• Cori Phase 2: >9,300 Intel Knights Landing compute nodes
 – 68 cores/node, 16 GB HBM on-package, 96 GB DDR4
• Lustre filesystem: 27 PB of storage served by 248 OSTs, providing over 700 GB/s peak performance
• Cray Aries high-speed “dragonfly” topology interconnect
• 1.5 PB Burst Buffer…
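A few aggregate numbers follow from the figures quoted above. The sketch below derives them; the variable names are illustrative, decimal units are assumed, and the per-OST value is only a lower bound because the slide says "over 700 GB/s".

```python
# Figures quoted on the slide for Cori Phase 1 and its Lustre filesystem.
HASWELL_NODES = 1630
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 16
DDR4_GB_PER_NODE = 128

LUSTRE_PEAK_GB_S = 700  # stated as "over 700 GB/s", so treat as a lower bound
LUSTRE_OSTS = 248

haswell_cores = HASWELL_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
haswell_memory_tb = HASWELL_NODES * DDR4_GB_PER_NODE / 1000  # decimal TB (assumption)
gb_s_per_ost = LUSTRE_PEAK_GB_S / LUSTRE_OSTS

print(f"Phase 1 cores: {haswell_cores}")
print(f"Phase 1 DDR4: {haswell_memory_tb:.1f} TB")
print(f"Peak bandwidth per OST: at least {gb_s_per_ost:.2f} GB/s")
```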


Burst Buffer Architecture
[Figure: compute nodes (CN) connect over the Aries high-speed network to Burst Buffer (BB) nodes and I/O nodes (ION); the I/O nodes connect over the InfiniBand storage fabric to the Lustre OSSs/OSTs on the storage servers]
• Blade = 2x Burst Buffer node (2x SSD each)
• I/O node: 2x InfiniBand HCA
• Cori Phase 1 configuration: 920 TB on 144 BB nodes (288 x 3.2 TB SSDs)
• >1.5 PB total in full Cori system
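The configuration figures above can be cross-checked with a few lines of arithmetic. This sketch assumes each SSD holds 3.2 TB, which is the size the stated 920 TB total implies; the constant names are illustrative.

```python
BB_NODES = 144
SSDS_PER_NODE = 2
NODES_PER_BLADE = 2
SSD_TB = 3.2  # assumed to be TB: this is what the stated 920 TB total implies

ssd_count = BB_NODES * SSDS_PER_NODE
capacity_tb = ssd_count * SSD_TB
blades = BB_NODES // NODES_PER_BLADE

print(f"SSDs: {ssd_count}")
print(f"Raw capacity: {capacity_tb:.1f} TB")
print(f"Blades: {blades}")
```

The SSD count matches the 288 quoted on the slide, and the raw capacity comes out slightly above the rounded 920 TB figure.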

Burst Buffer Architecture Reality
• BB nodes scattered throughout HSN fabric
• 2 BB blades/chassis (12 nodes/cabinet) in Phase I
[Figure: cabinet layout showing compute nodes, BB nodes, LNET/DVS IO nodes, and service nodes]
