OASIS: Better simulated events to allow for fewer simulated events

SLIDE 1

OASIS: “Better” simulated events to allow for fewer simulated events

Prasanth Shyamsundar

University of Florida

Based on [arXiv:2006.16972]: “OASIS: Optimal Analysis-Specific Importance Sampling for event generation”, Konstantin T. Matchev, Prasanth Shyamsundar

LPC Physics Forum, Fermilab

July 30, 2020

SLIDE 2

Motivation

▶ Simulations in HEP are computationally expensive.

  • Detector simulation is the most resource intensive part of the pipeline.
  • Projected HL-LHC computational requirements may not be met.

“Billion dollar problem”

  • Need to speed up the simulation pipeline. Require fewer simulated events?

[Figure panels: CMS, ATLAS]

Reference: J. Albrecht et al. [HEP Software Foundation], “A Roadmap for HEP Software and Computing R&D for the 2020s,” Comput. Softw. Big Sci. 3, no.1, 7 (2019) [arXiv:1712.06982 [physics.comp-ph]].

Konstantin T. Matchev, Prasanth Shyamsundar [arXiv:2006.16972] 1/27 [Go to the end]

SLIDE 4

Importance Sampling

▶ The simulation pipeline starts with the parton-level hard scattering.

▶ At the parton level, we can compute the probability density of a given event (under a given theory/set of parameter values).

▶ Ingredients:

  • Matrix element
  • Parton distribution functions

▶ Given an oracle for a distribution, how do we sample events as per the distribution? Answer: Importance Sampling

[Figure: image from the Sherpa Team]

SLIDE 5

Importance Sampling

[Figure: unnormalized distributions f and g plotted against x]

▶ f = distribution to sample from; g = distribution we can sample from (both unnormalized).

▶ Throw darts uniformly at random into the “box”, or sample events according to g.

▶ Option 1: Unweighting

  • Accept the events that fall under f, or equivalently accept event i with probability f(x_i)/g(x_i). This requires g ≥ f everywhere.

▶ Option 2: Weighted events

  • Accept all events, but weight them: w_i = f(x_i)/g(x_i)

▶ The “box” g doesn’t have to be a rectangle. It just needs to be something we can sample from.
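
The two options can be tried out in a few lines. This is only a minimal sketch: the Gaussian-shaped f, the flat box on [0, 10], and every name in the code are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical unnormalized target: a Gaussian bump with peak height 1.
    return np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2)

# Proposal g: the flat "box" of height 1 on [0, 10], so g(x) >= f(x) everywhere.
x = rng.uniform(0.0, 10.0, size=200_000)
g = np.ones_like(x)

# Option 2: keep every event and attach the weight w_i = f(x_i) / g(x_i).
w = f(x) / g

# Option 1: unweighting, i.e. accept event i with probability f(x_i) / g(x_i).
accepted = x[rng.uniform(size=x.size) < w]

efficiency = accepted.size / x.size        # fraction of darts that land under f
weighted_mean = np.average(x, weights=w)   # mean of x estimated from weighted events
unweighted_mean = accepted.mean()          # the same mean from unweighted events
```

Both estimates of the mean agree within statistical fluctuations, and the unweighting efficiency is the average weight.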

SLIDE 7

Importance Sampling

Current philosophy: Try to make g close to f

[Figure: unnormalized distributions f and g against x, with selected and rejected events]

Rationale 1: Unweighting efficiency... circular argument

  • We want unweighted events.
  • g → f/F reduces wastage (a smaller fraction of events is thrown out).
  • g → f/F is ideal.
  • We should unweight events at the parton level before moving on to the rest of the (computationally expensive) simulation pipeline.

SLIDE 9

Importance Sampling

Current philosophy: Try to make g close to f

Rationale 2: Cross-section estimation

$$F \equiv \int dx\, f(x) = \int dx\, g(x)\, \frac{f(x)}{g(x)} = E_g[w] \qquad (g \text{ is normalized})$$

$$\Rightarrow\quad \hat{F} = \frac{1}{N_s} \sum_{i=1}^{N_s} w_i, \qquad \mathrm{var}\big[\hat{F}\big] = \frac{\mathrm{var}[w]}{N_s} \qquad (g \to f/F \text{ reduces variance})$$

Estimation of F is related to counting experiments. But... HEP analyses have come a long way from counting experiments!
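
As a rough numerical illustration of Rationale 2, the sketch below estimates F with two normalized proposals, one flat and one closer in shape to f. The toy f, both proposals, and all names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
N_s = 100_000

def f(x):
    # Hypothetical unnormalized f: Gaussian bump, set to zero outside [0, 10].
    inside = (x >= 0.0) & (x <= 10.0)
    return np.where(inside, np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2), 0.0)

def estimate(x, g_density):
    # F_hat is the mean weight; its error is std(w) / sqrt(N_s).
    w = f(x) / g_density
    return w.mean(), w.std(ddof=1) / np.sqrt(N_s)

# Proposal far from f/F: uniform on [0, 10], normalized density 1/10.
x_flat = rng.uniform(0.0, 10.0, N_s)
F_flat, err_flat = estimate(x_flat, np.full(N_s, 0.1))

# Proposal closer to f/F: a normal distribution centred on the peak of f.
sigma_g = 2.5
x_close = rng.normal(5.0, sigma_g, N_s)
g_close = np.exp(-0.5 * ((x_close - 5.0) / sigma_g) ** 2) / (sigma_g * np.sqrt(2.0 * np.pi))
F_close, err_close = estimate(x_close, g_close)
```

Both estimates converge to the same F, but the proposal that is closer to f/F has the smaller var[w] and therefore the smaller error on F.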

SLIDE 10

Weighted events = Yet unexplored degree of freedom

OASIS abandons the notion that g → f /F is the best strategy

▶ Nature:

  • Produces unweighted events
  • Constrained to be distributed as per f /F

▶ Weighted simulations:

  • Not constrained... Sampling distribution g can be whatever we want!
  • OASIS exploits this freedom to an unprecedented degree

▶ Current usage examples of weighted events:

  • Oversampling tails: Extract the sensitivity from the tails without wasting resources on the bulk
  • (Also reweighting events, combining different processes)

▶ Why would we want to deviate from f /F on purpose?

  • Focus on the regions of phase space important to the analysis.

SLIDE 11

An example: Top mass measurement

  • A. M. Sirunyan et al. [CMS], “Measurement of the top quark mass in the dileptonic $t\bar{t}$ decay channel using the mass observables $M_{b\ell}$, $M_{T2}$, and $M_{b\ell\nu}$ in pp collisions at $\sqrt{s} = 8$ TeV,” Phys. Rev. D 96, no.3, 032002 (2017) [arXiv:1704.06142 [hep-ex]].

▶ Different regions of the phase space are sensitive to the value of a parameter (or the presence of a signal) to different extents.

▶ More simulated events → smaller theory error bars.

▶ Reducing the theory error bars everywhere (maintaining the same ratios between error bars) is not the optimal strategy!

SLIDE 12

OASIS elevator pitch

Optimal Analysis-Specific Importance Sampling

▶ Choose the sampling distribution optimally to maximize the sensitivity of the analysis at hand, for a given computational budget.

▶ Reach the target sensitivity with fewer simulated events.

▶ Piggyback on existing importance sampling techniques (FOAM, VEGAS, machine-learning-based, etc.).

▶ Save, in computational budget,

Hundreds of

SLIDE 13

OASIS for parton level analysis

▶ To pick a good sampling distribution g, we need to understand the relationship between the sampling distribution and the sensitivity of the analysis.

▶ Let θ be a parameter we want to measure by analyzing the parton-level events {x_i}. Let L be the integrated luminosity.

▶ Fisher Information:

$$I(\theta) = L \int dx\, \frac{1}{f(x\,;\theta)} \left[\frac{\partial f(x\,;\theta)}{\partial\theta}\right]^2, \qquad \mathrm{var}\big[\hat{\theta}(\text{Data})\,;\theta_0\big] \ge \frac{1}{I(\theta_0)}$$

▶ The lower bound is achievable in the asymptotic limit by the maximum likelihood fit or minimum-χ² fit (fine binning).
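
The Fisher information integral can be evaluated numerically for a simple case. The sketch below assumes f is a unit-area normal density with mean θ and width 2, restricted to 0 ≤ x ≤ 10. These settings are an assumption of this example (the slides do not state them), picked because they give F(4.9) ≈ 0.9875, the value quoted for the toy study later in the talk.

```python
import numpy as np

SIGMA, X_MIN, X_MAX = 2.0, 0.0, 10.0   # assumed toy settings, not stated in the slides

def f(x, theta):
    # Differential rate per unit luminosity: a normal density with mean theta.
    return np.exp(-0.5 * ((x - theta) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

def fisher_information(theta, L, n_cells=20_000, eps=1e-5):
    # Midpoint-rule integral of L * (df/dtheta)^2 / f over [X_MIN, X_MAX].
    edges = np.linspace(X_MIN, X_MAX, n_cells + 1)
    x = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]
    df_dtheta = (f(x, theta + eps) - f(x, theta - eps)) / (2.0 * eps)  # central difference
    return L * np.sum(df_dtheta ** 2 / f(x, theta)) * dx

L = 10_000
info = fisher_information(4.9, L)
cramer_rao_bound = 1.0 / np.sqrt(info)   # lower bound on the standard deviation of theta-hat
```

Under these assumed settings I/L comes out near 0.225, so the bound on the standard deviation is about 0.021 for L = 10,000.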

SLIDE 14

Fisher Information for simulation based analyses

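$$I(\theta) = L \int dx\, \frac{1}{f(x\,;\theta)} \left[\frac{\partial f(x\,;\theta)}{\partial\theta}\right]^2$$

▶ Note that there’s no g in the expression. This is for analyses based on the functional form of f(x ; θ).

▶ What about analyses based on simulations (N_s events distributed as per g)?

$$I(\theta) = \int dx\, \frac{\left[L\, \dfrac{\partial f(x\,;\theta)}{\partial\theta}\right]^2}{L\, f(x\,;\theta)} \qquad \text{compare to} \qquad \sum_{i \in x\text{ bins}} \frac{s_i^2}{n_i} \quad \text{or} \quad \sum_{i \in x\text{ bins}} \frac{s_i^2}{\sigma^2_{i,\,\text{real stat}}}$$

“s” ∼ difference between expected counts for θ and θ + δθ

Including the simulation statistical uncertainty, $\sigma^2_{i,\,\text{real stat}} \to \sigma^2_{i,\,\text{real stat}} + \sigma^2_{i,\,\text{sim stat}}$:

$$I_{MC}(\theta) = \int dx\, \frac{\left[L\, \dfrac{\partial f(x\,;\theta)}{\partial\theta}\right]^2}{L\, f(x\,;\theta) + N_s\, g(x) \left[\dfrac{L}{N_s}\, w(x\,;\theta)\right]^2}$$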

SLIDE 15

Fisher Information for simulation based analyses

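$$I_{MC}(\theta) = \int dx\, \frac{\left[L\, \dfrac{\partial f(x\,;\theta)}{\partial\theta}\right]^2}{L\, f(x\,;\theta) + N_s\, g(x) \left[\dfrac{L}{N_s}\, w(x)\right]^2}$$

Using g(x) w(x) = f(x), the denominator becomes L f (1 + (L/N_s) w), so

$$\Rightarrow\quad \frac{I_{MC}(\theta)}{L} = \int dx\, \frac{f(x\,;\theta) \left[\partial_\theta \ln f(x\,;\theta)\right]^2}{1 + \frac{L}{N_s}\, w(x\,;\theta)} \equiv \int dx\, \frac{f(x)\, u^2(x)}{1 + \frac{L}{N_s}\, w(x)}$$

where

$$u(x) \equiv \partial_\theta\big[\ln f(x\,;\theta)\big] = \frac{1}{f} \frac{\partial f}{\partial\theta}$$

u(x) is a per-event score that captures the sensitivity of the event to θ. It can be computed using the matrix element oracle.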
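
A small grid-based sketch of this expression may help build intuition. It assumes a toy f (normal density with width 2 on 0 ≤ x ≤ 10, θ = 5), for which u = (x − θ)/σ². The toy model and the function name `imc_over_L` are assumptions of this example, not the authors' code.

```python
import numpy as np

SIGMA, THETA = 2.0, 5.0                 # assumed toy settings
edges = np.linspace(0.0, 10.0, 4001)
x = 0.5 * (edges[:-1] + edges[1:])      # cell midpoints
dx = edges[1] - edges[0]

f = np.exp(-0.5 * ((x - THETA) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
u = (x - THETA) / SIGMA ** 2            # d ln f / d theta for the mean of a normal
F = np.sum(f) * dx                      # normalization of f

def imc_over_L(g, L_over_Ns):
    # I_MC / L = integral of f u^2 / (1 + (L/N_s) w), with w = f / g.
    w = f / g
    return np.sum(f * u ** 2 / (1.0 + L_over_Ns * w)) * dx

g_is = f / F                             # ideal-case importance sampling, constant w = F
upper_limit = imc_over_L(g_is, 0.0)      # infinite simulation statistics
values = {r: imc_over_L(g_is, r) for r in (10.0, 1.0, 0.1)}
```

For g = f/F the weight is constant, so I_MC/L reduces to the upper limit divided by (1 + F L/N_s). Any other normalized g can be passed to the same function.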

SLIDE 16

Some intuition + toy example

Measuring the mean of a normal distribution

[Figure: f and |u| plotted against x for θ_0 = 5, where u = (1/f) ∂f/∂θ]

$$\frac{I_{MC}}{L} = \int dx\, \frac{f(x)\, u^2(x)}{1 + \frac{L}{N_s}\, w(x)}$$

▶ LHS: the quantity to maximize by picking a good sampling distribution g.

▶ L/N_s is a heuristic parameter specifying our computational budget: $\frac{L}{N_s} = F^{-1} \frac{N_r}{N_s}$

▶ g enters through w. Low w is good, but... $E_g[w] = \int dx\, g(x)\, \frac{f(x)}{g(x)} = F$ (fixed)

▶ Assign low weights w where u is high (makes sense).

▶ (L/N_s) w(x) captures the improvement from increasing the simulation statistics.

▶ The 1 captures the diminishing returns (real data is finite).

SLIDE 17

Training the sampling distribution

[Figure: normalized sampling distributions against x, comparing ideal-case Importance Sampling (IS) with trained OASIS]

▶ Parameterize g using $\vec{\phi}$ as a piecewise-constant distribution given by

$$g(x) = \frac{p_{\text{cell}(x)}}{\text{Volume}_{\text{cell}(x)}}, \qquad p_{\text{cell } i} = \frac{e^{\phi_i}}{\sum_j e^{\phi_j}} \quad \text{(softmax)}$$

▶ Set L/N_s = 1 (N_s ≈ N_r).

▶ Use gradient ascent to maximize I_MC (using preliminary/preexisting simulations as training data).
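
The training step can be sketched on a grid. This is a simplified stand-in, not the authors' procedure: it evaluates f and u at cell centres instead of using simulated events as training data, and the toy f (normal density of width 2 on 0 ≤ x ≤ 10), the cell count, the step size, and the iteration count are all assumptions of this example.

```python
import numpy as np

SIGMA, THETA0, L_OVER_NS = 2.0, 5.0, 1.0   # assumed toy settings
N_CELLS = 50

edges = np.linspace(0.0, 10.0, N_CELLS + 1)
x = 0.5 * (edges[:-1] + edges[1:])
volume = np.diff(edges)
f = np.exp(-0.5 * ((x - THETA0) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
u = (x - THETA0) / SIGMA ** 2

a = f * u ** 2 * volume       # per-cell numerator f u^2 dx
b = L_OVER_NS * f * volume    # (L/N_s) w_i = b_i / p_i, because g_i = p_i / volume_i

def softmax(phi):
    z = np.exp(phi - phi.max())
    return z / z.sum()

def objective(p):
    # Discretized I_MC / L: sum of a_i / (1 + b_i / p_i) = sum of a_i p_i / (p_i + b_i).
    return np.sum(a * p / (p + b))

phi = np.log(f * volume)      # start from ideal-case IS, p proportional to f
p_is = softmax(phi)

for _ in range(3000):
    p = softmax(phi)
    dJ_dp = a * b / (p + b) ** 2
    # Chain rule through the softmax: dJ/dphi_j = p_j (dJ/dp_j - sum_i p_i dJ/dp_i).
    phi += 50.0 * p * (dJ_dp - np.sum(p * dJ_dp))

p_oasis = softmax(phi)
g_oasis = p_oasis / volume    # trained piecewise-constant sampling density
```

The softmax keeps the cell probabilities positive and normalized, so plain gradient ascent on ϕ needs no explicit constraint handling.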

SLIDE 18

Weights

The weights compensate for the difference between g and f/F:

$$w(x) = \frac{f(x)}{g(x)}$$

[Figure: normalized sampling distributions for IS and OASIS against x]

[Figure: the ratios $\sigma^2_{\text{OASIS}}/\sigma^2_{\text{IS}}$ and $w_{\text{OASIS}}/w_{\text{IS}}$ against x, on a log scale]

SLIDE 19

Effect on histograms

[Figure: normalized histograms for IS and OASIS (3σ error bars), each with an MC/true ratio panel (1σ error bars), against x]

▶ Appropriately weighted histograms under OASIS and IS (100,000 events).

▶ Plotted on a log scale (with a shift).

▶ Both are consistent with the true distribution: importance sampling is a robust technique.

▶ IS has smaller error bars near the center.

▶ OASIS has smaller error bars away from the center.

▶ OASIS prioritizes based on utility to the θ measurement. (Error bar ratios are shown in the previous slide.)
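
How the sampling distribution moves the error bars around can be checked with weighted histograms. In this sketch a flat proposal stands in for a distribution that deliberately deviates from f/F. It is not a trained OASIS distribution, and the toy f (normal density of width 2 on 0 ≤ x ≤ 10) is an assumption of this example.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
N, SIGMA, THETA = 100_000, 2.0, 5.0      # assumed toy settings
edges = np.linspace(0.0, 10.0, 41)       # 40 bins

def f(x):
    return np.exp(-0.5 * ((x - THETA) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

F = erf(5.0 / (SIGMA * sqrt(2.0)))       # integral of f over [0, 10]

def weighted_hist(x, w):
    # Bin estimate is sum(w)/N, with error sqrt(sum(w^2))/N.
    sum_w = np.histogram(x, bins=edges, weights=w)[0]
    sum_w2 = np.histogram(x, bins=edges, weights=w ** 2)[0]
    return sum_w / len(x), np.sqrt(sum_w2) / len(x)

# Ideal-case IS: sample from g = f/F, so every weight equals F.
draws = rng.normal(THETA, SIGMA, size=2 * N)
x_is = draws[(draws >= 0.0) & (draws <= 10.0)][:N]
est_is, err_is = weighted_hist(x_is, np.full(N, F))

# Stand-in alternative: flat proposal with density 1/10, weights f/g.
x_flat = rng.uniform(0.0, 10.0, N)
est_flat, err_flat = weighted_hist(x_flat, f(x_flat) / 0.1)

rel_is = err_is / est_is
rel_flat = err_flat / est_flat
```

Both histograms estimate the same distribution, but the flat proposal has smaller relative errors in the tails and larger ones near the center, which is the kind of trade-off OASIS tunes on purpose.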

SLIDE 20

Effect on the measurement of θ

[Figure: χ² − χ²_min against θ_trial for L = N_sim = 10000, with curves for likelihood based, OASIS sim. based, and IS sim. based estimation. More concave ∼ smaller error bar.]

▶ Set θ_true = 4.9.

  • Simulate “real events”, setting L = 10,000. F(θ_true) ≈ 0.9875.
  • 9887 events produced in this pseudo-experiment.

▶ Set simulation θ_0 = 5.0 (the value at which OASIS is optimized).

  • Simulate 10,000 “simulated events” each under IS and OASIS.
  • Reweight them for different values of θ_trial.

▶ Perform simulation-based minimum-χ² estimation (40 bins).

▶ The gray dotted line is the likelihood-based estimation (infinite simulation limit).
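
One pseudo-experiment of this kind, for the IS case only, might look as follows. This is a sketch under assumptions: the toy f (normal density of width 2 on 0 ≤ x ≤ 10), the exact form of the χ² (real plus simulation variance in the denominator), the scan grid, and all names are choices of this example, not details taken from the talk.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
SIGMA, LO, HI = 2.0, 0.0, 10.0            # assumed toy settings
L, N_S, THETA_TRUE, THETA0 = 10_000, 10_000, 4.9, 5.0
edges = np.linspace(LO, HI, 41)           # 40 bins

def f(x, theta):
    return np.exp(-0.5 * ((x - theta) / SIGMA) ** 2) / (SIGMA * sqrt(2.0 * np.pi))

def F(theta):
    s = SIGMA * sqrt(2.0)
    return 0.5 * (erf((HI - theta) / s) - erf((LO - theta) / s))

def sample(theta, size):
    # Draw from f(.; theta)/F(theta) by discarding draws outside [LO, HI].
    draws = rng.normal(theta, SIGMA, size=2 * size)
    return draws[(draws >= LO) & (draws <= HI)][:size]

# "Real" events at theta_true; their number fluctuates around L * F(theta_true).
real = sample(THETA_TRUE, rng.poisson(L * F(THETA_TRUE)))
n_real = np.histogram(real, bins=edges)[0]

# Simulated events generated once at theta_0 under ideal-case IS, g = f(.; theta_0)/F(theta_0).
x_sim = sample(THETA0, N_S)
g_sim = f(x_sim, THETA0) / F(THETA0)

def chi2(theta_trial):
    w = f(x_sim, theta_trial) / g_sim                      # reweight to theta_trial
    mu = (L / N_S) * np.histogram(x_sim, bins=edges, weights=w)[0]
    var_sim = (L / N_S) ** 2 * np.histogram(x_sim, bins=edges, weights=w ** 2)[0]
    return np.sum((n_real - mu) ** 2 / (mu + var_sim))     # real stat + sim stat variance

trials = np.linspace(4.7, 5.1, 81)
scan = np.array([chi2(t) for t in trials])
theta_hat = trials[np.argmin(scan)]
```

The same simulated sample is reused for every θ_trial; only the weights change.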

SLIDE 21

Effect on the measurement of θ

[Figure: normalized histogram of the min. χ² estimate θ̂ over 2000 pseudo-experiments with L = N_sim = 10000, for likelihood based, OASIS sim. based, and IS sim. based estimation]

SLIDE 22

Effect on the measurement of θ

Simulation parameters:

  L                   10,000      100,000
  N_s                 10,000      100,000
  θ_true              4.9         4.9
  Training L/N_s      1           1
  Simulation θ_0      5           5
  Pseudo-expts.       2000        500

Results for L = N_s = 10,000:

                      ave. θ̂       stdev θ̂        [I_MC(θ_true)]^(−1/2)
  Likelihood-based    4.8997(5)    2.15(3)E−2     2.108(1)E−2
  OASIS-based         4.9000(6)    2.64(4)E−2     2.611(2)E−2
  IS-based            4.8999(7)    3.03(5)E−2     2.957(19)E−2

Results for L = N_s = 100,000:

                      ave. θ̂       stdev θ̂        [I_MC(θ_true)]^(−1/2)
  Likelihood-based    4.9001(3)    6.9(2)E−3      6.667(3)E−3
  OASIS-based         4.8998(4)    8.5(3)E−3      8.258(5)E−3
  IS-based            4.9004(4)    9.6(3)E−3      9.390(19)E−3

Table: Simulation parameters and summary statistics of the results from the simulated pseudo-experiments to measure θ_true. Note: I_MC is a good measure of sensitivity.

SLIDE 23

Resource conservation

[Figure: I_MC/L against N_s/L (log scale) for OASIS (training L/N_s = 1), IS, and the upper limit, annotated “84% more”, “23% more”, and “121% more”. The upper limit is achieved in the infinite statistics limit.]

▶ The L/N_s set at training is just a heuristic parameter.

▶ The sampling distribution can be used to produce any number of events.

▶ OASIS achieves target sensitivities with fewer events than the ideal-case IS.

▶ For a given number of simulated events, OASIS offers better sensitivity than IS.

▶ We’re on a log scale... These numbers are impressive!

▶ We can do better than 23% at N_s/L = 10 if we train our sampling distribution there... Let’s do that!
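
The “percent more” comparison can be reproduced in spirit with a few lines. The sketch below does not use a trained OASIS distribution: as a stand-in it takes g ∝ f|u|, and it assumes the toy f (normal density of width 2 on 0 ≤ x ≤ 10, θ = 5). The resulting number is therefore only illustrative and is not one of the percentages in the plot.

```python
import numpy as np

SIGMA, THETA = 2.0, 5.0                  # assumed toy settings
edges = np.linspace(0.0, 10.0, 2001)
x = 0.5 * (edges[:-1] + edges[1:])       # midpoints never hit x = THETA exactly
dx = edges[1] - edges[0]

f = np.exp(-0.5 * ((x - THETA) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
u = (x - THETA) / SIGMA ** 2
F = np.sum(f) * dx
upper = np.sum(f * u ** 2) * dx          # I/L, the infinite-statistics limit

g_alt = f * np.abs(u) / (np.sum(f * np.abs(u)) * dx)   # stand-in for a trained OASIS g

def imc_over_L(g, ns_over_L):
    return np.sum(f * u ** 2 / (1.0 + (f / g) / ns_over_L)) * dx

def ns_needed_by_is(target):
    # For g = f/F the weight is constant (w = F), so I_MC/L = upper / (1 + F L/N_s).
    # Solving for N_s/L at a given target sensitivity:
    return F / (upper / target - 1.0)

ns_over_L = 1.0
target = imc_over_L(g_alt, ns_over_L)    # sensitivity reached by the stand-in at N_s = L
percent_more = 100.0 * (ns_needed_by_is(target) / ns_over_L - 1.0)
```

`percent_more` is the extra number of simulated events that ideal-case IS would need to match the stand-in distribution at N_s/L = 1.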

SLIDE 24

Varying the training L/Ns and special cases

[Figure: normalized OASIS sampling distributions against x for training L/N_s = 10, 1, 0.1, and L/N_s → 0, shown together with |u|]

▶ All OASIS distributions prioritize regions of higher |u|.

▶ As the training L/N_s decreases, the sampling distribution is more lenient towards low-|u| regions.

▶ Rationale: In the small-N_s limit, focus on the regions of the highest |u| (like a delta function). The 1 in the denominator becomes negligible:

$$\frac{I_{MC}}{L} = \int dx\, \frac{f(x)\, u^2(x)}{1 + \frac{L}{N_s}\, w(x)} \;\approx\; \frac{N_s}{L} \int dx\, g(x)\, u^2(x)$$

▶ As N_s increases, the utility of high-|u| regions saturates, so move towards lower-|u| regions.

▶ In the N_s → ∞ limit, $g_{\text{optimal}} \propto f\,|u|$.
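
One way to see the N_s → ∞ statement is a first-order expansion in ε = L/N_s followed by the Cauchy–Schwarz inequality. The derivation below is a sketch added for illustration and is not taken from the slides.

```latex
% Expand to first order in \epsilon = L/N_s, using w = f/g:
\frac{I_{MC}}{L}
  = \int dx\, \frac{f u^2}{1 + \epsilon w}
  \approx \int dx\, f u^2 \;-\; \epsilon \int dx\, \frac{f^2 u^2}{g}.

% Maximizing I_MC therefore means minimizing the last integral
% subject to \int dx\, g = 1. By Cauchy--Schwarz,
\left( \int dx\, f |u| \right)^2
  = \left( \int dx\, \frac{f |u|}{\sqrt{g}} \sqrt{g} \right)^2
  \le \int dx\, \frac{f^2 u^2}{g} \int dx\, g
  = \int dx\, \frac{f^2 u^2}{g},

% with equality if and only if f|u|/\sqrt{g} \propto \sqrt{g}, that is
g_{\text{optimal}}(x) = \frac{f(x)\,|u(x)|}{\int dx'\, f(x')\,|u(x')|}.
```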

SLIDE 25

More money plots

[Figure: I_MC/L against N_s/L for OASIS (training L/N_s = 1), IS, and the upper limit, annotated “84% more”, “23% more”, and “121% more”]

[Figure: I_MC/L against N_s/L for OASIS (training L/N_s = 0.1), IS, and the upper limit, annotated “72% more”, “49% more”, and “88% more”]

[Figure: percent increase against N_s/L for training L/N_s = 0.1, 1, and 10]

Resource conservation offered by OASIS distributions trained for different values of L/N_s.

SLIDE 26

OASIS at the analysis level

▶ Parton-level events get mapped to analysis variables in a probabilistic many-to-many manner, via

  • Parton showers and initial state radiation
  • Hadronization
  • Detector simulation
  • Event reconstruction (+ some particles are invisible)
  • Event selection/categorization
  • High-level variable calculation

▶ Also, analysis-level datasets are composed of several subsamples.

▶ There are model uncertainties unrelated to simulation statistics.

Q1) How is the sampling distribution related to sensitivity at the analysis level? (How do our equations change?)

Q2) How do we implement OASIS at the parton level when the quantity we are optimizing lives in the analysis realm?

SLIDE 27

How do the equations change?

▶ Let v be the possibly multi-dimensional analysis-level variable (including categorization/event selection information).

▶ x is mapped to v via some transfer function.

▶ F(v ; θ) corresponds to f(x ; θ):

$$U(v\,;\theta) = \partial_\theta \big[ \ln F(v\,;\theta) \big]$$

▶ Events with the same v value can have different weights. I_MC becomes...

$$\frac{I_{MC}}{L} = \int_{\text{selected events}} dv\, \frac{F(v)\, U^2(v)}{1 + \dfrac{L}{N_s}\, \dfrac{E_g[w^2 \mid v]}{E_g[w \mid v]}}$$

▶ Multiple subsamples and systematics unrelated to simulation statistics...

$$\frac{I_{MC}}{L} = \int_{\text{selected events}} dv\, \frac{F(v)\, U^2(v)}{1 + \dfrac{\sigma^2_{\text{syst}}(v)}{\sigma^2_{\text{real stat}}(v)} + \displaystyle\sum_k \frac{F^{(k)}(v)}{F(v)}\, \frac{L}{N_s^{(k)}}\, \frac{E_{g^{(k)}}[w^2 \mid v]}{E_{g^{(k)}}[w \mid v]}}$$

SLIDE 28

Implementing OASIS at the analysis level

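$$\frac{I_{MC}}{L} = \int_{\text{selected events}} dv\, \frac{F(v)\, U^2(v)}{1 + \dfrac{\sigma^2_{\text{syst}}(v)}{\sigma^2_{\text{real stat}}(v)} + \displaystyle\sum_k \frac{F^{(k)}(v)}{F(v)}\, \frac{L}{N_s^{(k)}}\, \frac{E_{g^{(k)}}[w^2 \mid v]}{E_{g^{(k)}}[w \mid v]}}$$

▶ This expression lives at the analysis level. Importance sampling happens at the parton level...

▶ Simplifying observation: It is always better to minimize the variance of w in a given v bin, since $E_g[w^2] = \mathrm{var}_g[w] + (E_g[w])^2$.

▶ Limit attention to sampling distributions under which the weights (roughly) only depend on v.

$$\frac{I_{MC}}{L} = \int_{\text{selected events}} dv\, \frac{F(v)\, U^2(v)}{1 + \dfrac{\sigma^2_{\text{syst}}(v)}{\sigma^2_{\text{real stat}}(v)} + \displaystyle\sum_k \frac{F^{(k)}(v)}{F(v)}\, \frac{L}{N_s^{(k)}}\, w^{(k)}(v)}$$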
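
The ratio E[w² | v] / E[w | v] that appears in these expressions can be estimated per v bin from weighted simulated events. A minimal sketch, where the function name `effective_weight` and the tiny example arrays are made up for illustration:

```python
import numpy as np

def effective_weight(v, w, edges):
    """Per-bin estimate of E[w^2 | v] / E[w | v] from weighted simulated events."""
    n_bins = len(edges) - 1
    idx = np.digitize(v, edges) - 1
    keep = (idx >= 0) & (idx < n_bins)               # drop events outside the binning
    sum_w = np.bincount(idx[keep], weights=w[keep], minlength=n_bins)
    sum_w2 = np.bincount(idx[keep], weights=w[keep] ** 2, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sum_w > 0, sum_w2 / sum_w, np.nan)   # NaN for empty bins

# Two v bins with the same mean weight (2.0) but different spreads.
v = np.array([0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4])
w = np.array([2.0, 2.0, 2.0, 2.0, 1.0, 3.0, 1.0, 3.0])
ratio = effective_weight(v, w, edges=np.array([0.0, 1.0, 2.0]))
```

The first bin, where all weights are equal, gives 2.0. The second bin has the same mean weight but gives 2.5, which is why a spread of weights within a v bin always costs sensitivity.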

SLIDE 29

Stage 1: Taking stock at the analysis level

$$\frac{I_{MC}}{L} = \int_{\text{selected events}} dv\, \frac{F(v)\, U^2(v)}{1 + \dfrac{\sigma^2_{\text{syst}}(v)}{\sigma^2_{\text{real stat}}(v)} + \displaystyle\sum_k \frac{F^{(k)}(v)}{F(v)}\, \frac{L}{N_s^{(k)}}\, w^{(k)}(v)}$$

Learn the “target distribution” or “target weights” $w^{(k)}_{\text{target}}(v)$ (up to a multiplicative constant).

▶ In this stage, the analysis groups decide how they want their simulated data to be distributed in the phase space of the analysis variable.

▶ This expression can be maximized using the same technique we saw earlier.

▶ Trained OASIS distribution optimizing too aggressively? Make it less aggressive by hand.

▶ Signal search analysis? Replace U with s(v)/b(v).

▶ Want simulations in control regions that aren’t sensitive to θ? Fix U in those regions (or the $w^{(k)}_{\text{target}}$) by hand.

▶ Multiple analyses using the same dataset? Find a middle ground :^)

Try it out!

“How would the sensitivity change if we had more events here and fewer events there?”
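
That question can be explored with a binned version of the expression. The sketch below handles a single subsample with weights that depend only on the v bin. The three-bin yields, scores, and event allocations are invented numbers for illustration, and `sensitivity` is a hypothetical helper name.

```python
import numpy as np

F_bins = np.array([0.2, 0.6, 0.2])    # hypothetical expected yield per unit luminosity in each v bin
U_bins = np.array([1.0, 0.1, -1.0])   # hypothetical score U(v) in each bin

def sensitivity(p, L_over_Ns=1.0, syst_over_stat=0.0):
    """Binned I_MC / L when a fraction p[i] of the simulated events lands in bin i."""
    w = F_bins / np.asarray(p)         # weight per bin, w = F / g
    return np.sum(F_bins * U_bins ** 2 / (1.0 + syst_over_stat + L_over_Ns * w))

baseline = sensitivity(F_bins / F_bins.sum())   # simulated events distributed like the data
shifted = sensitivity([0.4, 0.2, 0.4])          # more events in the sensitive outer bins
```

Moving simulated events from the insensitive middle bin to the outer bins raises the binned I_MC/L from about 0.203 to about 0.268 in this made-up example.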

SLIDE 31

Stage 2: Translating the target weights to parton-level

▶ Importance sampling algorithms (FOAM, VEGAS, machine-learning-based) need an oracle which can be queried for f(x) (unnormalized).

▶ They can train a sampling distribution g that mimics the oracle.

▶ Replace the oracle for f with the oracle for f_target(x):

[Diagram: a parton-level event x is passed through showering, hadronization, detector simulation, event reconstruction, event selection/categorization, and high-level variable construction to obtain v. Query f(x) and query w_target(v), then return f_target(x) = f(x)/w_target(v).]

▶ Key idea: The map from x to v is approximately many-to-one. Non-determinism in f_target(x) is low.

▶ f_target will have the same singularity structure as f...

Fast sims are good enough for training...

If v is rejected, return an appropriately low f_target value...
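
The oracle replacement might be wrapped up as follows. Everything here is a hypothetical stand-in: `make_target_oracle`, the Gaussian f, the smearing used in place of the simulation chain, the shape of `w_target`, and the `w_rejected` value for rejected events are assumptions of this sketch, not an existing API.

```python
import numpy as np

def make_target_oracle(f, simulate_to_v, w_target, w_rejected=1e6):
    """Return an oracle for f_target(x) = f(x) / w_target(v(x))."""
    def f_target(x, rng):
        v = simulate_to_v(x, rng)          # fast simulation chain: x -> v
        if v is None:                      # event failed the selection
            return f(x) / w_rejected       # return an appropriately low value
        return f(x) / w_target(v)
    return f_target

# Toy stand-ins for the three ingredients.
def f(x):
    return float(np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2))

def simulate_to_v(x, rng):
    v = x + rng.normal(0.0, 0.1)           # small smearing in place of the full chain
    return v if 0.0 <= v <= 10.0 else None

def w_target(v):
    return 1.0 / (0.2 + abs(v - 5.0))      # low target weight away from the center

rng = np.random.default_rng(4)
f_target = make_target_oracle(f, simulate_to_v, w_target)
value = f_target(7.0, rng)
```

An importance sampling algorithm trained against `f_target` would then produce events whose weights f/g track `w_target` up to a constant.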

SLIDE 32

Outlook

Untapped and unexplored optimization

▶ The performance boost we see here is significant.

▶ This should not be surprising... We’re not tweaking an existing approach to eke out some more sensitivity.

▶ We’re opening an avenue of optimization that hasn’t been explored yet.

▶ When working on the paper, a bug in the code led to a sampling distribution far from optimal (not avoiding the middle of the histogram as aggressively). Even that led to significant improvements. (See bonus slide)

Complementary to approaches that seek to speed up the simulation pipeline

▶ Speed up using GPUs? GANs? OASIS can play along.

[Figure: percent increase in computational requirements, when using IS instead of OASIS, to reach the same sensitivity, against N_s/L, for training L/N_s = 0.1, 1, and 10]

SLIDE 33

Outlook

Is OASIS just introducing a compromise, because we cannot generate the amount of data we need?

▶ OASIS improves the compromise.

▶ By not simulating infinite statistics, we are already cutting corners.

▶ OASIS makes sure that what we are cutting are, in fact, corners.

▶ It makes sense to use OASIS even if we have “enough” computational resources.

[Figure: normalized IS and OASIS sampling distributions against x, with regions labeled “Corner” and “Center fold”]

SLIDE 34

Outlook

▶ Looking at these plots... (notes on next slide)

▶ We are probably looking at savings of the order of hundreds of millions of dollars for HL-LHC alone.

▶ Implementation will likely be “simple”.

▶ Will require an unprecedented level of cooperation between

  • MC theorists
  • MC groups within experiments
  • Physics analysis groups

Thank You! Questions?

[Figure: f and the local shape sensitivity $\frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2$ against x]

[Figure: percent increase against N_s/L for training L/N_s = 0.1, 1, and 10]

SLIDE 35

Notes for previous slide

Things to consider:

▶ The similarity of the “local shape sensitivity” plots in the top row...

▶ The improvements seen in the bottom-left panel...

▶ The improvements needed in the bottom-right panel...

▶ “Billion dollar problem”...

▶ On the one hand, OASIS may not be appropriate or possible for some analyses...

▶ On the other hand, for events that don’t make it past the selection cuts, OASIS will lead to much greater resource conservation, by aggressively undersampling them...

SLIDE 36

Bonus 1: Buggy code

[Figure, “Properly trained”: normalized IS and OASIS sampling distributions against x]

[Figure, “Buggy code”: normalized IS and OASIS sampling distributions against x]

[Figure, “Buggy code...”: percent increase against N_s/L for training L/N_s = 0.1, 1, and 10]

OASIS doesn’t have to be perfect to make a difference.

SLIDE 37

Bonus 2: Special use cases...

▶ OASIS might be particularly useful for targeted analysis-specific QCD background simulation.

▶ I mentioned that nature is constrained to produce unweighted events. But maybe not... We have binary (in/out) triggers and we have unbiased prescale triggers. If there’s a place for a hybrid, OASIS-like ideas can help optimize it.
