High-multiplicity multi-jet merging with HPC technology (PowerPoint presentation)



SLIDE 1

High-multiplicity multi-jet merging with HPC technology

Stefan Hoeche, Xiangyang Ju, Jim Kowalkowski, Stephen Mrenna, Tom Peterka, Stefan Prestel, Holger Schulz Physics Event Generator Computing Workshop, CERN, November 28, 2018

1/14

SLIDE 2

Motivation

arXiv:1409.8639 [hep-ex]

[Figure: ATLAS W(→ eν) + jets measurement, √s = 7 TeV, 4.6 fb⁻¹. Events (10² to 10¹⁰) vs Njets = 1…8: data compared to W → eν predictions from ALPGEN and SHERPA plus tt̄, Z → ee, multijet and other backgrounds, with a Pred/Data ratio panel (0.5 to 1.5) and systematic/statistical uncertainty bands.]

◮ LHC experiments can see 8 jets.
◮ High-precision predictions, e.g. for searches, should reflect that.
◮ Can we do this on HPC?

2/14

SLIDE 3

Trials in (LO) ME-level events

[Figure: normalized trial distributions 1/N dN/dNtrials (10⁻⁷ to 10⁰) vs Ntrials (up to 300 000) for W+0j through W+8j.]

◮ The distribution of trials gets flatter with the number of jets.
◮ Huge variation in matrix-element (ME)-level compute time.
◮ The traditional Sherpa approach of doing everything in one go simply does not scale.

(See also T. Childers et al. doi:10.1088/1742-6596/898/7/072044)

3/14

SLIDE 4

Our approach to event generation on HPC

◮ Use Sherpa to generate ME-level events (Les Houches-like format)
◮ XML output is not a good solution for HPC machines
◮ Use HDF5 instead:
  Parallel write and read
  Binary storage of data, built-in compression
◮ Particle-level event generation and merging with Pythia8; we use ASCR's DIY technology for MPI parallelisation here

[Workflow: Sherpa (MPI) → ME-level events (LHE) → HDF5 → DIY (LHAPDF, Pythia8, Rivet) → particle level]

4/14

SLIDE 5

HDF5 storage

Table: Event properties
  Dataset      data type
  nparticles   int
  scale        double
  aqcd         double
  …
  npLO         double
  npNLO        double
  weight       double
  trials       double

Table: Particle properties
  Dataset    data type
  id         int
  status     int
  mother1    int
  color1     int
  px         double
  …
  lifetime   double
  spin       double

Table: Lookup table
  Dataset  data type
  start    size_t
  end      size_t

◮ Trivial (parallel) storage of properties in 1D datasets of basic types
◮ Trivial (parallel) access by index; connection between event and particle properties by lookup table

5/14
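The layout above maps directly onto h5py (the Python library mentioned on the next slide). A minimal sketch with toy numbers; the file name, dataset paths, and values are illustrative, not the authors' actual format:

```python
import h5py
import numpy as np

# Event and particle properties as flat 1D datasets of basic types.
events = {
    "nparticles": np.array([2, 3], dtype=np.int32),
    "weight":     np.array([1.0, 0.7]),
    "trials":     np.array([12.0, 80.0]),
}
particles = {
    "id":     np.array([24, 1, 24, 2, -2], dtype=np.int32),
    "status": np.ones(5, dtype=np.int32),
    "px":     np.array([0.1, -0.1, 5.0, 2.0, -7.0]),
}
# Lookup table: [start, end) index into the particle datasets per event.
end = np.cumsum(events["nparticles"])
start = end - events["nparticles"]

with h5py.File("me_events.h5", "w") as f:
    for name, arr in events.items():
        f.create_dataset(f"event/{name}", data=arr, compression="gzip")
    for name, arr in particles.items():
        f.create_dataset(f"particle/{name}", data=arr, compression="gzip")
    f.create_dataset("index/start", data=start)
    f.create_dataset("index/end", data=end)

# Trivial access by index: slice the particle datasets via the lookup table.
with h5py.File("me_events.h5", "r") as f:
    s, e = f["index/start"][1], f["index/end"][1]
    px = f["particle/px"][s:e]   # px of the particles of event 1
```

Since every property is a contiguous 1D dataset, any event range is one slice per property, which is what makes the parallel access pattern trivial.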

SLIDE 6

Technicalities

◮ Requirement: libhdf5 (apt-get / dnf install, standard on HPC)
◮ Header-only library HighFive: github.com/BlueBrain/HighFive
◮ N.b. the very nice Python library h5py works beautifully with numpy (we used this initially to convert LHE XML files to HDF5, but that was quite cumbersome)
◮ Header-only library DIY used in the particle-level simulation: http://diatomic.github.io/diy
  Computing model based on "blocks"
  Does all the low-level MPI communication for you

6/14
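DIY itself is a C++ header-only library, but its "blocks" computing model can be illustrated in a few lines of Python: each rank owns a contiguous block of the event index and reads only its own slice of the 1D datasets. The function below is a hypothetical sketch of the decomposition idea, not DIY's actual API:

```python
def block_range(nevents: int, nblocks: int, rank: int) -> tuple[int, int]:
    """Contiguous [start, end) slice of the event index owned by one block/rank."""
    base, extra = divmod(nevents, nblocks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

# Each rank would open the HDF5 file and read only its slice, e.g.
# f["event/weight"][start:end]; no communication is needed for reading.
ranges = [block_range(10, 3, r) for r in range(3)]
print(ranges)  # [(0, 4), (4, 7), (7, 10)]
```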

SLIDE 7

W+jets example

◮ W+jets simulation at √s = 14 TeV.
◮ Merging scale at 20 GeV.
◮ The simulation is at leading order; the merging scheme is CKKW-L.
◮ ME-level event generation done on the SLAC cluster of Xeon E5 CPUs.
◮ Particle-level event generation on NERSC Cori using Haswell nodes.

Njets            0     1     2     3     4     5     6     7     8
Nevents         65M   32M   16M    8M    4M    2M    1M  500k  250k
HDF5 (9) [GB]   7.1   4.9   3.0   1.8   1.0   0.6   0.3   0.2   0.1
HDF5 (0) [GB]   26    16    9.1   5.2   2.9   1.9   1.2   0.62  0.25

◮ Number of quarks limited to ≤ 6 for Njets = 6, 7
◮ Number of quarks limited to ≤ 4 for Njets = 8

7/14
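As a sanity check on the table above, the per-event storage cost and the compression gain follow from simple arithmetic. This assumes the "(9)" and "(0)" columns denote gzip compression levels 9 and 0, which the slide does not state explicitly:

```python
# W + 0 jets column of the table above: 65M events.
nevents = 65e6
gzip9_gb, gzip0_gb = 7.1, 26.0   # assumed gzip level 9 / level 0 file sizes

bytes_per_event = gzip9_gb * 1e9 / nevents
compression_ratio = gzip0_gb / gzip9_gb
print(f"{bytes_per_event:.0f} B/event, compression ratio {compression_ratio:.1f}x")
# prints "109 B/event, compression ratio 3.7x"
```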

SLIDE 8

CPU cost analysis

◮ Process ME samples with different jet multiplicities separately.
◮ Compare ME-level and particle-level event simulation.
◮ Note that the measure is CPUh per 1M events.

[Figure: event generation cost for W+jets at 14 TeV, CPUh/Mevt (10⁻¹ to 10⁶) vs number of additional jets (1 to 8) for ME level and particle level, with an ME/particle ratio panel (1 to 10 000).]

8/14

SLIDE 9

Benefits

◮ The CPU-expensive part of the simulation is stored in a parton-shower independent format.
◮ Running the particle-level simulation is now cheap in comparison, which allows e.g.:
  PDF reweighting
  All sorts of variation studies
  Tuning and similar parameter-space exploration
◮ Can think of a hybrid strategy for event generation:
  Do low multiplicity as per usual
  Generate higher multiplicities with this approach

9/14
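The PDF reweighting mentioned above works because each stored event can be rescaled by the ratio of a new PDF to the generation-time PDF at its Bjorken-x values. A toy sketch, with stand-in PDF shapes instead of a real LHAPDF set (everything here is illustrative, and the Q² dependence is omitted for brevity):

```python
# Toy PDF reweighting: rescale a stored event weight by the ratio of a
# new PDF to the generation-time PDF. pdf_old/pdf_new are illustrative
# stand-ins; a real study would evaluate an LHAPDF set.
def pdf_old(x: float) -> float:
    return (1.0 - x) ** 3 / x     # illustrative shape only

def pdf_new(x: float) -> float:
    return (1.0 - x) ** 4 / x     # illustrative shape only

def reweight(weight: float, x1: float, x2: float) -> float:
    return weight * (pdf_new(x1) * pdf_new(x2)) / (pdf_old(x1) * pdf_old(x2))

w = reweight(1.0, 0.1, 0.2)   # ratio reduces to (1 - x1)(1 - x2) = 0.72 here
print(round(w, 4))            # prints 0.72
```

Because only the weights change, no ME-level events need to be regenerated; this is what makes variation studies cheap.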

SLIDE 10

Jet rates

[Figure: differential 0 → 1 and 2 → 3 jet resolutions, dσ/d log10(d01/GeV) and dσ/d log10(d23/GeV) [pb], showing the merged sum Σ and the individual 0j–8j contributions, each with a ratio panel.]

10/14

SLIDE 11

Jet rates

[Figure: differential 4 → 5 and 7 → 8 jet resolutions, dσ/d log10(d56/GeV) and dσ/d log10(d78/GeV) [pb], showing the merged sum Σ and the individual 0j–8j contributions, each with a ratio panel.]

11/14

SLIDE 12

Scaling

◮ Scaling of pure particle-level event generation for total samples
◮ Software stack compiled on NERSC Cori (gcc 7.3), measurements done on Haswell nodes
◮ N.b. with 16 nodes (512 ranks): 15 minutes; with HepMC+Rivet as in the plots above: 25 minutes

[Figure: run time t [s] (100 to 700) vs Nnodes = 4, 8, 16, 32 (32 ranks per node) for W+0j through W+8j.]

12/14

SLIDE 13

Summary and outlook

◮ Prototype for a relatively efficient merged LO W+8j event simulation workflow
◮ For pragmatic reasons: Sherpa for ME-level event generation and Pythia8 for particle-level simulation
  Store the CPU-expensive part (ME level) on disk
  Particle-level run time up to 4 orders of magnitude faster than ME
  Main technologies used for parallelisation: DIY and HDF5
◮ Although we use technology aimed at HPC architectures, the code runs well on laptops, clusters etc.
◮ Want to understand scaling better, investigate with VTune
◮ Look at Z+jets, Higgs, ttbar next.
◮ Would a hybrid strategy for event generation be a good idea?

13/14

SLIDE 14

Acknowledgement

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of HEP, Scientific Discovery through Advanced Computing (SciDAC) program.

14/14

SLIDE 15

Timing and memory usage (Sherpa 3.x.y + HDF5)

LO ME level event generation only (Comix; γ, Z, h, µ, νµ, τ, ντ off)

Process             W+ + 1j     W+ + 2j      W+ + 3j      W+ + 4j
RAM usage           21 MB       43 MB        48 MB        85 MB
Init/startup time   <1s / <1s   <1s / <1s    2s / <1s     32s / <1s
Integration time    8 × 4m26s   16 × 16m42s  32 × 20m26s  64 × 1h32m
MC uncertainty      0.22%       0.46%        0.89%        0.97%
Unweighting eff     6.59 · 10⁻³  7.50 · 10⁻⁴   2.71 · 10⁻⁴   1.47 · 10⁻⁴
10k evts            1m 2s       15m 5s       1h 3m        5h 56m

Numbers generated on a dual 8-core Intel® Xeon® E5-2660 @ 2.20 GHz

Process             W+ + 5j     W+ + 6j*     W+ + 7j*     W+ + 8j†
RAM usage           189 MB      484 MB       1.32 GB      1.32 GB
Init/startup time   3m5s / 1s   24m52s / 5s  3h6m / 18s   5h55m / 29s
Integration time    128 × 4h38m  256 × 13h53m  512 × 19h0m   1024 × 23h8m
MC uncertainty      1.0%        0.99%        2.38%        4.68%
Unweighting eff     9.56 · 10⁻⁵  7.66 · 10⁻⁵   7.20 · 10⁻⁵   7.51 · 10⁻⁵
10k evts            24h 40m     2d 11h       10d 15h      78d 1h

Numbers generated on a dual 8-core Intel® Xeon® E5-2660 @ 2.20 GHz. *,†: number of quarks limited to ≤ 6 / ≤ 4.

SLIDE 16

Plans for NLO event generation

◮ For a large class of processes, NLO fixed-order and MC@NLO agree well with each other and with MEPS@NLO (see e.g. the plots below)
◮ Indicates the best technical option: store MC@NLO simulated events
  Pro: parton-shower independent results
  Con: restricted possibility for variations

◮ H+jet @ LHC 13 TeV
[Figure: H+jet, √s = 13 TeV, anti-kT R = 0.7, Sherpa+BlackHat; dσ/dpT,j [pb/GeV] for MEPS@NLO (1)2j, MC@NLO and NLO, with ratio to MEPS@NLO up to pT,j = 2000 GeV.]

◮ Z+jet @ LHC 13 TeV
[Figure: √s = 13 TeV, anti-kT R = 0.8, Sherpa+BlackHat; dσ/dpT,j [pb/GeV] for MEPS@NLO (1)2j, MC@NLO and NLO, with ratio to MEPS@NLO up to pT,j = 2000 GeV.]

SLIDE 17

[Figure: distributions of event weights, frequency vs log10 w (0 to 4), for W+0j through W+8j.]