Deep learning for HEP/NP at NERSC, Jlab Machine Learning Workshop

Deep learning for HEP/NP at NERSC
Jlab Machine Learning Workshop, November 6th 2018
Wahid Bhimji (feat. Steve Farrell, Thorsten Kurth, Ben Nachman, Mustafa Mustafa, Michela Paganini, Prabhat, Evan Racah and others)
NERSC, Berkeley Lab (LBNL)

Outline
• Introduction
 – HEP/NP and Deep Learning
 – NERSC machines
• Towards a Platform for Scientific Learning: Production DL stack at NERSC
• Examples of applications: HEP/NP Deep Learning projects at NERSC
 – Supervised Learning: Classification with CNNs
 – Unsupervised Learning: Generation with GANs
 – Alternative representations: GraphCNN
 – Bayesian Inference with Probabilistic Programming
• Productive DL at Scale

Introductions

HEP/NP/Cosmology in practice
Underlying physics parameters: Ωc, σ8, θ12, mH, …
• Theory into simulations:
 – Cosmology: high-resolution; produce mass densities; populate with galaxies
 – HEP/NP: detailed physics and detector simulation
• Exp/Obs reconstruction:
 – Derive position of galaxies/stars and properties for catalogs
 – Reconstruct particle properties
• Summary statistics:
 – E.g. 2pt/3pt correlation: spatial distribution
 – E.g. masses of reconstructed particles
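The summary-statistics step can be made concrete with a spatial two-point correlation function. The sketch below uses the simple "natural" estimator ξ(r) = DD/RR - 1 on toy 3D positions; the function names, the brute-force pair counting and the uniform toy catalogue are illustrative assumptions, not the pipeline used in the talk (production codes typically use tree-based pair counting and the Landy-Szalay estimator).

```python
import numpy as np


def pair_separation_counts(a, b, edges):
    """Histogram of |a_i - b_j| over all ordered cross pairs (brute force, O(N*M) memory)."""
    sep = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    counts, _ = np.histogram(sep.ravel(), bins=edges)
    return counts


def two_point_natural(data, randoms, edges):
    """Natural estimator xi(r) = DD_norm / RR_norm - 1 (illustrative sketch).

    Self-pairs (separation 0) are excluded by requiring edges[0] > 0, and
    pair counts are normalised by the number of ordered distinct pairs.
    """
    if edges[0] <= 0:
        raise ValueError("first bin edge must be > 0 to exclude self-pairs")
    n_d, n_r = len(data), len(randoms)
    dd = pair_separation_counts(data, data, edges) / (n_d * (n_d - 1))
    rr = pair_separation_counts(randoms, randoms, edges) / (n_r * (n_r - 1))
    return dd / rr - 1.0


rng = np.random.default_rng(0)
data = rng.uniform(size=(500, 3))      # toy "galaxy" positions in a unit box
randoms = rng.uniform(size=(1000, 3))  # unclustered reference catalogue, same volume
edges = np.array([0.05, 0.1, 0.2, 0.3])
xi = two_point_natural(data, randoms, edges)
print(xi)  # scatters around 0, since the toy data are unclustered
```

Because data and randoms share the same volume, edge effects largely cancel in the ratio; a clustered catalogue would instead show ξ > 0 at small separations.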

HEP/NP/Cosmology in practice (Ωc, σ8, θ12, mH, …)
Many areas where deep learning (etc.) can help, e.g.:
• Classification to find physics objects or new 'signal' events (on high-dimensional data)
• Regression to aid reconstruction or to estimate fundamental physics parameters
• Clustering features in high-dimensional raw data for new physics or instrument issues
• Generation of data to replace simulation
• Inference of underlying physics directly from instrument data

NERSC
Mission HPC center for the US Dept. of Energy Office of Science:
• >7000 users; 100s of projects; diverse sciences
Cori: 31.4 PF peak, #10 in Top500
• 2388 Haswell nodes: 32-core, 2.3 GHz; 128 GB
• 9668 KNL Xeon Phi nodes: 68-core, 1.4 GHz; 4 hardware threads; AVX-512; 16 GB MCDRAM, 96 GB DDR4
• Cray Aries high-speed "dragonfly" interconnect
• 28 PB Lustre FS: 700 GB/s peak
• 1.8 PB Flash Burst Buffer: 1.7 TB/s
[System diagram: Cori Cray XC-40; Edison Cray XC-30; Lustre file systems (7.6 PB, 30 PB); 1.8 PB Flash; Global Filesystems (/home, /project, /projecta); HPSS; Data Transfer Nodes (DTN); PDSF/clusters; Databases/other servers; Ethernet and IB/Storage Fabric; 2x100 Gb Software Defined Networking to the WAN]

Perlmutter: A System for Science
● Cray Shasta System: 3-4x capability of Cori
● GPU-accelerated and CPU-only nodes meet the needs of large-scale simulation and data analysis from experimental facilities
 ○ >4,000 node CPU-only partition = all of Cori
 ○ Optimized stack for analytics/ML at scale
 ○ GPU nodes: 4 NVIDIA GPUs: Tensor Cores; NVLink-3; 1 AMD "Milan" CPU
● Cray "Slingshot": High-performance Ethernet-compatible network
 ○ Capable of Terabit connections to outside
● All-Flash Lustre based HPC file system
 ○ 6x Cori's bandwidth
Delivery in late 2020

Deep Learning Production Stack at NERSC

Provide a platform for scientific learning
NERSC Data and Analytics Group:
• Provide training and tools for machine learning
• Optimize tools for hardware and for productivity and scale
• Encourage cutting-edge methods and new applications
• Collaborative Projects (with Scientists/ML Researchers/Industry)
[Stack diagram, top to bottom: Science Apps; Interactive Interfaces, Automation; Methods, Approaches and Architectures Tailored for Science; Frameworks and Libraries for Scale; Integrated ML/Simulation/Data HPC System Hardware]
http://www.nersc.gov/users/data-analytics/data-analytics-2/deep-learning/

Tools
DL Frameworks evolving rapidly:
Caffe/Theano popular 3 years ago; now TensorFlow (TF) dominates (and Keras is now in TF); recent rise of PyTorch
[Charts: NERSC ML Survey (now); percent of ML papers that mention each framework, 2015 vs 2018. Source: https://twitter.com/karpathy/status/972295865187512320?lang=en]

Tools and Training
• Python DL frameworks rely on optimized backends for performance
 – For CPUs like Cori KNL this is Intel MKL
 – Working with Intel to improve performance for common networks (and science problems)
• Training events, for example
 – Data Day: https://www.nersc.gov/users/training/data-day/data-day-2018/
 – Deep Learning At Scale at SC18 (next Monday) (with Cray Inc.)
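As a rough illustration of the kind of backend tuning involved on a CPU such as Cori KNL, the sketch below sets the Intel OpenMP environment variables commonly suggested for MKL-backed TensorFlow, before the framework is imported. The helper name `intel_openmp_env` and the specific values are assumptions (typical starting points from Intel's guidance, not NERSC's tuned settings); frameworks also expose their own intra-op and inter-op thread-pool settings, which are usually tuned alongside these.

```python
import os


def intel_openmp_env(physical_cores: int) -> dict:
    """Hypothetical helper: typical Intel OpenMP settings for MKL-backed frameworks.

    Values are common starting points, not tuned recommendations; benchmark per workload.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be >= 1")
    return {
        "OMP_NUM_THREADS": str(physical_cores),          # one OpenMP thread per physical core
        "KMP_BLOCKTIME": "1",                            # ms a thread spins before sleeping
        "KMP_AFFINITY": "granularity=fine,compact,1,0",  # pin threads, one per core first
    }


# Must happen before the framework (and its OpenMP runtime) is imported.
os.environ.update(intel_openmp_env(68))  # 68 cores per Cori KNL node
# import tensorflow as tf  # import only after the environment is configured
```

Setting these in the environment rather than in code after import matters because the OpenMP runtime typically reads them once, when it initializes.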

Methods and Applications

ML Applications in Science
[Figures: NERSC ML Survey; example projects at LBL/NERSC]

Classification with Convolutional Neural Networks
• CNN: shared non-linear filters; reduce weights; exploit locality and symmetries; now popular in many science studies
• E.g. LHC-CNN: unroll cylindrical detector data into an image[1]; classify known (QCD) vs new physics (RPV supersymmetry) from ATLAS-CONF-2016-057:
 – Use 3 channels (EM and HCal calorimeters, and number of tracks[2]) and the whole detector as an image of 64x64 bins (~0.1 η/ɸ towers) or 224x224
 – Use our own large (Pythia+Delphes) simulated data samples
 – (3 or 4) alternating convolutional and pooling layers with batch norm.
[Figure: example event image, axes η and ɸ]
Bhimji, Farrell, Kurth, Paganini, Prabhat, Racah https://arxiv.org/abs/1711.03573
[1] As also in de Oliveira et al. (arXiv:1511.05190) and others
[2] Similar to Komiske, Metodiev, and Schwartz arXiv:1612.01551
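The "unroll into an image" step amounts to a weighted 2D histogram over (η, ɸ). A minimal NumPy sketch, assuming each event arrives as per-cell ECal/HCal deposits and track directions; the input layout, the hypothetical |η| < 2.5 range and the channel-first (3, 64, 64) output are illustrative choices, not the exact preprocessing of the paper. With 64 bins this range gives cells of about 0.08 in η and 0.1 in ɸ, close to the ~0.1 towers quoted on the slide.

```python
import numpy as np


def event_image(ecal, hcal, tracks, n_bins=64, eta_max=2.5):
    """Bin one event into a (3, n_bins, n_bins) image over (eta, phi).

    ecal, hcal: arrays of shape (n, 3) with columns (eta, phi, energy).
    tracks:     array of shape (m, 2) with columns (eta, phi).
    Channels: 0 = summed ECal energy, 1 = summed HCal energy, 2 = track count.
    phi is assumed to lie in [-pi, pi]; entries outside the eta range are dropped.
    eta_max is a hypothetical acceptance, not taken from the talk.
    """
    edges = [np.linspace(-eta_max, eta_max, n_bins + 1),
             np.linspace(-np.pi, np.pi, n_bins + 1)]
    img = np.zeros((3, n_bins, n_bins))
    img[0] = np.histogram2d(ecal[:, 0], ecal[:, 1], bins=edges, weights=ecal[:, 2])[0]
    img[1] = np.histogram2d(hcal[:, 0], hcal[:, 1], bins=edges, weights=hcal[:, 2])[0]
    img[2] = np.histogram2d(tracks[:, 0], tracks[:, 1], bins=edges)[0]
    return img


# Toy event: two ECal deposits in the same cell, one HCal deposit, two tracks.
ecal = np.array([[0.05, 0.05, 10.0], [0.06, 0.06, 5.0]])
hcal = np.array([[-1.0, 2.0, 3.0]])
tracks = np.array([[0.05, 0.05], [1.2, -0.4]])
img = event_image(ecal, hcal, tracks)
print(img.shape, img[0].sum(), img[2].sum())  # (3, 64, 64) 15.0 2.0
```

Keras convolution layers default to channels-last input, so an image built this way would typically be transposed to (64, 64, 3) before training.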

CNN performance
WB, Steve Farrell, Thorsten Kurth, Michela Paganini, Prabhat, Evan Racah https://arxiv.org/abs/1711.03573
• Use a re-implementation of existing physics selections on jet variables from ATLAS-CONF-2016-057 as a benchmark
• Also compare to a boosted decision tree (GBDT) and a 1-layer NN (MLP)
 – Inputs to these: the jet variables used in the physics analysis (Sum of Jet Mass, Number of Jets, Δη between leading 2 jets) and the four-momenta of the first 5 jets
Potential to increase signal efficiency (from 0.41 to 0.77) at the same background rejection as the selections, without using jet variables (approximate significance increase of 1.8x)
Further improvement from using 3 channels: energy in E-Cal, H-Cal and no. of tracks
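Comparing classifiers "at the same background rejection" means fixing a working point on the background and reading off the signal efficiency there. A minimal sketch, assuming each classifier outputs a per-event score where higher means more signal-like; the function name and the toy scores are hypothetical. In the simple S/√B approximation, holding the background fixed means the significance scales linearly with the signal efficiency.

```python
import numpy as np


def signal_eff_at_fixed_bkg(sig_scores, bkg_scores, bkg_eff):
    """Signal efficiency of the cut `score > t`, with t chosen so that roughly a
    fraction `bkg_eff` of background events pass (i.e. fixed background rejection).

    Returns (signal_efficiency, threshold). Ties and finite sample size make the
    achieved background efficiency only approximately equal to `bkg_eff`.
    """
    sig_scores = np.asarray(sig_scores, dtype=float)
    bkg_scores = np.asarray(bkg_scores, dtype=float)
    threshold = np.quantile(bkg_scores, 1.0 - bkg_eff)
    return float(np.mean(sig_scores > threshold)), float(threshold)


# Toy check with hand-made scores.
bkg = np.arange(100) / 100.0            # 0.00, 0.01, ..., 0.99
sig = np.array([0.95, 0.5, 0.99, 0.2])
eff, t = signal_eff_at_fixed_bkg(sig, bkg, bkg_eff=0.10)
print(eff, round(t, 3))
```

Applying the same function to each method's scores (cut-based selection, GBDT, MLP, CNN) at one shared `bkg_eff` gives a like-for-like comparison of signal efficiencies.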
