PDSW 2019 Panel: A house divided: Why don’t cloud storage and HPC storage share more technology?



SLIDE 1

PDSW 2019 Panel

A house divided: Why don’t cloud storage and HPC storage share more technology?

Brent Welch (Google), Raghu Raja (Amazon), Evan Burness (Microsoft)

2:00pm to 2:40pm, Monday, November 18

SLIDE 2

Brent Welch

  • Works at Google in GCP (public cloud platform)
  • My focus is resource management at scale, not core storage systems
  • Built the Sprite distributed file system in the 1980s for PhD at UCB
  • Helped build the PanFS distributed file system in the 2000s
  • Helped with the T10 Object Storage Device (OSD) standard
  • Helped with NFSv4.{1,2} Parallel NFS standard
  • Created exmh email user interface, tclhttpd web server
  • Enjoys gardening, footbag, hiking, juggling, etc.
SLIDE 3

Because POSIX

  • Serious Cloud HPC users re-write apps to use cloud native storage
  • Write-once
  • Non-POSIX namespace
  • Highly scalable
  • User-space services implement various semantics (key-value, (no)sql, files)
  • POSIX is useful for the long tail
  • Lots of “dusty deck” applications that (think they) need POSIX and can be served well by a single NFS server

  • There are various solutions that map POSIX (or NFS) to cloud buckets
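As a rough illustration of the last few bullets, here is a minimal Python sketch of how a POSIX-style directory view might be layered over a write-once, flat-namespace bucket. Everything in it is hypothetical: `WriteOnceBucket` is a toy in-memory stand-in rather than any real cloud API, and `listdir` is just one simple way to emulate directories by key prefix, not how any particular product does it.

```python
class WriteOnceBucket:
    """Toy in-memory stand-in for a cloud bucket (hypothetical, not a real API)."""

    def __init__(self):
        self._objects = {}  # flat namespace: key (str) -> immutable bytes

    def put(self, key, data):
        # Write-once: an existing object is never modified in place.
        if key in self._objects:
            raise FileExistsError(key)
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # "Directories" are only a naming convention over flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


def listdir(bucket, path):
    """Emulate a POSIX-style directory listing on top of flat keys."""
    stripped = path.strip("/")
    prefix = stripped + "/" if stripped else ""
    entries = set()
    for key in bucket.list(prefix):
        rest = key[len(prefix):]
        # Keep only the first path component below the requested directory.
        entries.add(rest.split("/", 1)[0])
    return sorted(entries)


bucket = WriteOnceBucket()
bucket.put("run1/ckpt/0001.bin", b"a")
bucket.put("run1/ckpt/0002.bin", b"b")
bucket.put("run1/log.txt", b"c")
print(listdir(bucket, "/run1"))  # ['ckpt', 'log.txt']
```

Operations such as in-place overwrite, rename, or append have no direct counterpart in this model, which is one reason applications that truly depend on POSIX semantics are harder to serve from buckets.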
SLIDE 4

Raghunath Rajachandrasekar

  • AWS: HPC instance software, EFA, FSx for Lustre…
  • Cray: Object storage for HPC, DataWarp, Lustre…
  • OSU: MVAPICH MPI, Checkpointing, Networking…
  • LLNL: SCR, in-memory checkpointing, filesystems…
SLIDE 5

Observations / Discussion starters

  • Title of the panel
  • Two communities coming together
  • Shoehorning in tech?
  • When to reuse, when to reinvent
  • Drawing from HPC+ML
  • Design patterns and principles
SLIDE 6

Evan Burness

Microsoft Azure (2017-2019)

Principal Program Manager for H-series VMs for High Performance Computing

Cycle Computing (2016-2017)

Director for High-Performance Computing

National Center for Supercomputing Applications (Univ. Illinois) (2009-2016)

Program Manager, Private Sector Program

Interests

  • Being a dad to this little guy
  • Duke basketball
  • All the HPC things!

SLIDE 7

What happens first: the first sustained 1-exaflop FP64 app, or 100-exaflop lower-precision apps? If you believe the latter, should the modeling and simulation community intentionally move to align its storage and I/O architectures with what the AI community will do?