CWoLa Hunting: Extending the Bump Hunt with Machine Learning


SLIDE 1

CWoLa Hunting: Extending the Bump Hunt with Machine Learning Based on:

  • Phys. Rev. Lett. 121, 241803 (2018)

[1805.02664] Jack Collins, Kiel Howe, Ben Nachman

SLIDE 2

Outline

1) Machine Learning
2) Model-Unspecific Searches
3) CWoLa Hunting

SLIDE 3

Machine Learning

Classification | Regression | Generation → Classification

https://becominghuman.ai/building-an-image-classifier-using-deep-learning-in-python-totally-from-a-beginners-perspective-be8dbaf22dd8

SLIDE 4

Machine Learning

Classification | Regression | Generation → Regression

https://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf

SLIDE 5

Machine Learning

Classification | Regression | Generation → Generation

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


SLIDE 8

Machine Learning at the LHC

Classification | Regression | Generation → Classification: jet tagging

arXiv:1511.05190, L. de Oliveira, M. Kagan, L. Mackey, B. Nachman, A. Schwartzman

SLIDE 9

Machine Learning at the LHC

Classification | Regression | Generation → Regression: pileup removal

arXiv:1707.08600, P. T. Komiske, E. M. Metodiev, B. Nachman, M. D. Schwartz

SLIDE 10

Machine Learning at the LHC

Classification | Regression | Generation → Generation: fast simulation

arXiv:1712.10321, M. Paganini, L. de Oliveira, B. Nachman

SLIDE 11

Basic Machine Learning Primer

https://becominghuman.ai/building-an-image-classifier-using-deep-learning-in-python-totally-from-a-beginners-perspective-be8dbaf22dd8

0) Decide the objective: e.g. classify dog vs cat pictures
1) Choose a network architecture: e.g. a CNN
2) Choose a loss function (objective metric)
3) Train the network using training data
4) Apply the network to new test data

SLIDE 12

Basic Machine Learning Primer


A NN is just a function mapping N input numbers to M output numbers. Trainable internal numbers, the weights and biases, determine that function.
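The "a NN is just a function" point can be made concrete with a minimal sketch (NumPy only; the layer sizes and random weights below are arbitrary illustrations, not from the talk):

```python
import numpy as np

def mlp_forward(x, params):
    """A neural network is just a function R^N -> R^M whose exact shape
    is determined by its trainable weights and biases."""
    (W1, b1), (W2, b2) = params
    h = np.maximum(0.0, x @ W1 + b1)       # hidden layer with ReLU
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())      # softmax -> class probabilities
    return e / e.sum()

rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 8)), np.zeros(8)),   # 4 inputs -> 8 hidden
          (rng.normal(size=(8, 2)), np.zeros(2))]   # 8 hidden -> 2 outputs
p = mlp_forward(rng.normal(size=4), params)
# p is a length-2 probability vector (e.g. P(dog), P(cat))
```

Training changes only the numbers in `params`, never the functional form.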

SLIDE 13

Basic Machine Learning Primer


Step 2, the loss function:

– Naive example: fraction of correct predictions (not differentiable, so hard to optimize)
– Practical example: cross-entropy loss, L = − Σ_x y(x) log NN(x)
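A minimal sketch of the (binary) cross-entropy loss; the `eps` clipping is a standard numerical guard and an implementation choice here, not part of the slide:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_x y(x) log NN(x), averaged over examples.
    Unlike raw accuracy, this is a smooth function of the weights."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

# A confident correct prediction is cheap; a confident wrong one is expensive:
good = cross_entropy(np.array([1.0]), np.array([0.99]))
bad  = cross_entropy(np.array([1.0]), np.array([0.01]))
```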
SLIDE 14

Basic Machine Learning Primer


Step 3: use some iterative optimization algorithm to minimize the loss function on the training data. Be careful in selecting the training data!
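The simplest possible instance of step 3: logistic regression (a one-layer "network") trained by plain gradient descent. The toy data, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # toy, linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad_w = X.T @ (p - y) / len(y)         # gradient of the cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w                        # step against the gradient
    b -= lr * grad_b

accuracy = np.mean((p > 0.5) == (y > 0.5))
```

The same loop structure, with fancier optimizers and far more parameters, is what "training a network" means in practice.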

SLIDE 15

Basic Machine Learning Primer


Training: slow. Testing: fast. Performance may be limited by the quality and relevance of the training data.

SLIDE 16

BSM Searches: Nothing so far

SLIDE 17

BSM Searches: Nothing so far

Possibilities:
1) The LHC doesn't have the answers to our questions
2) Maybe new physics is rare: we have to wait for the high-luminosity LHC
3) Maybe it is there, but we are not doing the right search (theory bias has been wrong)

SLIDE 18

Are we missing something?

1) Ever-more sensitive dedicated searches for the standard culprits:
– Minimal Supersymmetry
– Top partners
– diboson / ttbar resonances

SLIDE 19

Are we missing something?

1) Ever-more sensitive dedicated searches for the standard culprits:
– Minimal Supersymmetry
– Top partners
– diboson / ttbar resonances
2) General-purpose ‘model-independent’ searches for unexpected new physics

SLIDE 20

Signatures vs Models

E.g. 2-body resonances (pp → X → SM SM):

[1610.09392] Craig, Draper, Kong, Ng, Whiteson


SLIDE 21

Signatures vs Models

E.g. 2-body resonances pp → X → BSM BSM: largely uncovered.

SLIDE 22

Basic Resonance Searches

E.g. Dijet Search

SLIDE 23

Basic Resonance Searches

Selection for signal-like events

E.g. dijet search; WW resonance, tt resonance, etc.

SLIDE 24

Basic Resonance Searches

Selection for signal-like events

E.g. dijet search; WW resonance (W-jets), tt resonance, etc.

‘Old fashioned’: simple selection in a few substructure variables.
‘Modern’: deep NN classifier using ~few hundred jet-constituent inputs (a ~300-D selection).

SLIDE 25

Basic Resonance Searches

Selection for signal-like events

E.g. dijet search; ?? resonance, ?-jets

SLIDE 26

A Traditional Dichotomy

Model-Inclusive Search
– Weak signal assumptions
– Basic selection criteria in a few variables
– Large backgrounds
– Risk: missing a signal under the background

Model-Specific Search
– Strong signal assumptions
– Sophisticated multivariate selection
– Small backgrounds
– Risk: not making the ‘correct’ signal selection

How to make a search with a sophisticated multivariate selection to beat backgrounds while using weak signal assumptions (unknown specific signal model)?

→ Learn selection from data

SLIDE 27

Why Train Machines on Data?

1) Maybe you have not simulated the correct signal model (either because you haven’t thought of it, or because it involves non-perturbative physics that prevents simulation)

SLIDE 28

Why Train Machines on Data?

Figure taken from Ben Nachman’s talk at BOOST 2018: https://indico.cern.ch/event/649482/contributions/2993322/attachments/1688082/2715256/WeakSupervision_BOOST2018.pdf

1) Maybe you have not simulated the correct signal model (either because you haven’t thought of it, or because it involves non-perturbative physics that prevents simulation)
2) Monte-Carlo simulation used for training may differ from real LHC data

SLIDE 29

Weak Supervision

Solution for ML:

Train directly on data using mixed samples

A) LLP (Learning from Label Proportions): train using class proportions

[1702.00414] L. Dery, B. Nachman, F. Rubbo, A. Schwartzman [1706.09451] T. Cohen, M. Freytsis, B. Ostdiek

B) CWoLa (Classification Without Labels): train to classify events as belonging to mixed sample 1 or 2.

[1708.02949] E. M. Metodiev, B. Nachman, J. Thaler. See also [1801.10158], P. T. Komiske, E. M. Metodiev, B. Nachman, M. D. Schwartz.
SLIDE 30

CWoLa

A classifier trained to optimally discriminate mixed sample 1 from mixed sample 2 is also optimal for discriminating S from B, so long as:

– Samples 1 and 2 contain different fractions of S and B
– S in sample 1 is drawn from the same distribution as S in sample 2
– B in sample 1 is drawn from the same distribution as B in sample 2
– Training statistics are sufficiently large

How can this be used for a search where S is new physics and B is the SM background?
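The CWoLa claim can be checked in a toy setup: train a classifier using only the mixed-sample labels and see that it also separates pure S from pure B. Everything here (Gaussian S and B, signal fractions 0.3 and 0.05, a logistic model fit by gradient descent) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def signal(n):     return rng.normal(2.0, 1.0, n)   # S: shifted Gaussian
def background(n): return rng.normal(0.0, 1.0, n)   # B: standard Gaussian

# Two mixed samples with different (a priori unknown) signal fractions
m1 = np.concatenate([signal(3000), background(7000)])   # f1 = 0.3
m2 = np.concatenate([signal(500),  background(9500)])   # f2 = 0.05
x = np.concatenate([m1, m2])
z = np.concatenate([np.ones(len(m1)), np.zeros(len(m2))])  # mixed labels only

# Train mixed-vs-mixed with plain logistic regression
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(x * w + b)))
    w -= 0.1 * np.mean((p - z) * x)
    b -= 0.1 * np.mean(p - z)

# The mixed-sample classifier is also a signal-vs-background classifier:
s_scores = 1.0 / (1.0 + np.exp(-(signal(2000) * w + b)))
b_scores = 1.0 / (1.0 + np.exp(-(background(2000) * w + b)))
```

The classifier never sees an S/B label, yet it ranks signal events above background events, as the CWoLa theorem predicts.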

SLIDE 31

CWoLa Hunting

1. Assume the signal is localized in some specific variable in which the background is smooth.
SLIDE 32

CWoLa Hunting

1. Assume the signal is localized in some specific variable in which the background is smooth.
2. Assume the signal has some distinguishing characteristics within some broad set of additional observables y.

SLIDE 33

CWoLa Hunting

1. Assume the signal is localized in some specific variable in which the background is smooth.
2. Assume the signal has some distinguishing characteristics within some broad set of additional observables y.
3. For some resonance mass hypothesis, split the data into signal-region and sideband-region mixed samples.

Mixed Sample 1 (signal region) | Mixed Sample 2 (sidebands)
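Step 3 might look like this in code; the toy spectrum, the mass hypothesis, and the window widths are all arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy smooth, falling dijet-mass spectrum in GeV (stand-in for real data)
mjj = rng.exponential(500.0, 100000) + 1000.0

# For one resonance mass hypothesis, define the signal region
# (mixed sample 1) and flanking sidebands (mixed sample 2)
m_hyp, half_width, sb_width = 3000.0, 200.0, 400.0
in_sr = np.abs(mjj - m_hyp) < half_width
in_sb = (~in_sr) & (np.abs(mjj - m_hyp) < half_width + sb_width)

sample1 = mjj[in_sr]   # signal-region events
sample2 = mjj[in_sb]   # sideband events
```

The classifier is then trained on the auxiliary variables y of `sample1` vs `sample2`, and the whole procedure is repeated for each mass hypothesis in the scan.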

SLIDE 34

CWoLa Hunting

Train a classifier to discriminate the two samples based on the variables y.
Note: the background y distribution should not vary strongly with the resonance variable.

Mixed Sample 1 Mixed Sample 2

Selection for signal-region-like events

SLIDE 35

CWoLa Hunting

Mixed Sample 1 Mixed Sample 2

Selection for signal-region-like events

SLIDE 36

Overfitting and the Look-Elsewhere Effect

Of course, there will be a large trials factor, especially if y is high-dimensional.
Easy solution: a train/test split (statistical fluctuations in the training and test sets are uncorrelated).
More sophisticated: nested cross-training.

SLIDE 37

Nested Cross-Training

1) Divide the entire dataset into k folds (legend: test set / training signal region / training sideband).

SLIDE 38

Nested Cross-Training

2) Train CWoLa Classifiers

Train signal region vs sideband k−1 times, rotating the validation set. Average the k−1 models to form an ensemble model. Background-fluctuation contributions interfere destructively; signal contributions interfere constructively.
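The cross-training bookkeeping can be sketched as follows. The fold assignment and the deliberately trivial stand-in "training" function are assumptions for illustration; in the paper the trained models are neural networks:

```python
import numpy as np

def kfold_ensemble_scores(X, train_fn, k=5):
    """Nested cross-training sketch: each fold is scored only by an
    ensemble of models that never saw it in training, so test-set
    fluctuations are uncorrelated with what the models learned."""
    rng = np.random.default_rng(4)
    fold = rng.integers(0, k, len(X))
    scores = np.zeros(len(X))
    for test_f in range(k):
        # k-1 trainings, rotating which fold is held out for validation
        models = [train_fn(X[(fold != test_f) & (fold != val_f)])
                  for val_f in range(k) if val_f != test_f]
        # average the k-1 models into an ensemble for this test fold
        scores[fold == test_f] = np.mean(
            [m(X[fold == test_f]) for m in models], axis=0)
    return scores

# Toy "training": each model just records its training mean as a cut
X = np.random.default_rng(5).normal(size=1000)
scores = kfold_ensemble_scores(
    X, lambda Xtr: (lambda Xte: (Xte > Xtr.mean()).astype(float)))
```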

SLIDE 39

Nested Cross-Training

3) Select events in each k-fold and then merge

SLIDE 40

Application to Bump Hunt

In signal region: S = 522, S/B = 0.64%. As the classifier selection is tightened, the fitted local significance grows: 1.5σ → 2σ → 3.5σ → 7σ.
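The quoted significances come from the paper's background fits; a naive S/√B estimate is only an illustration of why tightening the cut helps. The cut efficiencies below are invented examples, not numbers from the paper:

```python
import numpy as np

# If a classifier cut keeps a fraction eps_S of signal and eps_B < eps_S
# of background, the naive local significance S/sqrt(B) scales as
# eps_S / sqrt(eps_B): a signal-enriching cut boosts the bump.
S = 522.0
B = 522.0 / 0.0064          # from S/B = 0.64% in the signal region

def naive_sigma(eps_S, eps_B):
    return (eps_S * S) / np.sqrt(eps_B * B)

loose = naive_sigma(1.0, 1.0)    # no cut
tight = naive_sigma(0.5, 0.01)   # strong cut: lose half the signal,
                                 # keep 1% of background
```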

SLIDE 47

What has the machine learnt?

Jet 1: high mass, 4-prongy, low-ish particle multiplicity
Jet 2: moderate mass, 2-prongy, low-ish particle multiplicity

SLIDE 48

No Signal → No Bump!

SLIDE 49

What has the machine learnt?

Jet 1 Jet 2

Nothing, as desired!

SLIDE 50

Mass Scan


SLIDE 60

Performance Comparison

(Comparison plot legend: fully supervised ‘dedicated search’; fully supervised, wrong model. ‘Better’ marks the axis direction.)

SLIDE 61

General CWoLa Hunting

  • Need some variable X (e.g. m_JJ) in which the background is smooth and the signal is localized.
  • Need some other variables {Y} (e.g. jet substructure) which may provide discriminating power, and which may be a-priori unknown.
  • {Y} should not be strongly correlated with X over the X-width of the signal.
  • Alternatively, if correlated, there may be a way to decorrelate (e.g. if we can predict or measure the correlation, it can be subtracted away to create new, uncorrelated variables).
  • Can we use low-level inputs rather than expert variables?
    – Difficult to decorrelate auxiliary variables from the resonance variable, but there are ways.
    – Pessimist: only O(100) signal events → not enough to train with.
    – But we can't know until someone tries it!
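The subtraction-based decorrelation idea above can be sketched in a few lines. The toy variables and the choice of a linear fit for the measured correlation are illustrative assumptions; real analyses use more flexible parameterizations:

```python
import numpy as np

rng = np.random.default_rng(6)
m = rng.uniform(2000.0, 4000.0, 5000)        # resonance variable X
y = 0.01 * m + rng.normal(0.0, 5.0, 5000)    # auxiliary variable, correlated with m

# If the correlation can be measured, subtract it away: fit <y|m>
# and define y' = y - <y|m>, which is (linearly) decorrelated from m.
coef = np.polyfit(m, y, 1)
y_prime = y - np.polyval(coef, m)

corr_before = np.corrcoef(m, y)[0, 1]
corr_after = np.corrcoef(m, y_prime)[0, 1]
```

After the subtraction, cutting on `y_prime` no longer sculpts the m spectrum through the linear trend, which is what the smooth-background fit requires.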

SLIDE 62

Other work: Autoencoders

Train only on ‘background’ (no signal labels needed). The autoencoder reconstructs typical QCD background jets well but atypical jets poorly → classify jets with large reconstruction loss as ‘signal-like’.
Advantage: no signal events are needed for training.
Disadvantage: cannot exploit specific signal characteristics in the selection.

[1808.08992] M. Farina, Y. Nakai, D. Shih
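This is not the architecture of [1808.08992]; a linear autoencoder (equivalent to PCA) on toy feature vectors is enough to show the mechanism. Fit a low-dimensional bottleneck on background only, then flag events that reconstruct poorly. All data shapes below are invented:

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy "background jets": variance concentrated in the first two features
bg = rng.normal(size=(5000, 10)) @ np.diag(
    [3, 3, 1, 1, 1, 0.1, 0.1, 0.1, 0.1, 0.1])
# Toy "signal jets": populate directions that are rare in the background
sig = rng.normal(size=(200, 10))

# Linear autoencoder = PCA: a 2-D bottleneck fit on background only
mu = bg.mean(axis=0)
_, _, Vt = np.linalg.svd(bg - mu, full_matrices=False)
P = Vt[:2]                                    # encoder directions

def recon_error(X):
    Z = (X - mu) @ P.T                        # encode into the bottleneck
    Xhat = Z @ P + mu                         # decode back
    return np.mean((X - Xhat) ** 2, axis=1)   # per-event reconstruction loss

bg_err = recon_error(bg)
sig_err = recon_error(sig)
```

Atypical events cannot be expressed in the background's dominant directions, so their reconstruction error is systematically larger, and a cut on it acts as an anomaly tagger.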

SLIDE 63

Background-only training vs signal/sideband:

Background-only training:
– Tagger performance does not depend on signal statistics.
– Tagger can never learn the specific peculiar features of the signal, so it cannot improve with a greater signal rate.
– Stronger in the limit of very low signal statistics.

Signal / sideband training:
– Tagger relies on there being sufficient signal statistics for training.
– Tagger can learn the specific peculiar features of the signal, so it improves with a greater signal rate and allows for signal characterization.
– Stronger in the limit of very high signal statistics.

SLIDE 65

Toy Statistics