Weak Supervision in High Dimensions - Machine Learning for Jet Physics - PowerPoint PPT Presentation



SLIDE 1

Weak Supervision in High Dimensions

Machine Learning for Jet Physics Workshop, 2017

Eric M. Metodiev

Center for Theoretical Physics, Massachusetts Institute of Technology

Work with Patrick T. Komiske, Francesco Rubbo, Benjamin Nachman, and Matthew D. Schwartz

December 13, 2017

SLIDE 2

  • Weak Supervision in HEP
  • Lessons from High Dimensions
  • Why learn from data?

SLIDE 3

Simulation vs. Data: Quark/Gluon Discrimination

[ATLAS Collaboration, arXiv:1405.6583]

Using two features: width and ntrk. Signal (Q) vs. background (G) likelihood ratio.

SLIDE 4

π‘žπ‘π‘(𝑦) = 𝑔

𝑏 π‘žπ‘‡ 𝑦 + 1 βˆ’ 𝑔 𝑏 π‘žπΆ 𝑦

Mixed Samples Data does not have pure labels, but does have mixed samples!

Some caveats apply. See e.g. P. Gras, et al., arXiv: 1704.03878 Fractions of quark and gluon jets studied in detail in:

  • J. Gallicchio and M.D. Schwartz, arXiv: 1104.1175
SLIDE 5

Mixed Samples

q_b(y) = g_b q_T(y) + (1 − g_b) q_C(y)

Criteria to use weak supervision:
  • Sample independence: the same signal and background distributions in all the mixtures.
  • Different purities: g_b ≠ g_c for some b and c.
  • (Known fractions): the fractions g_b are known.

Data does not have pure labels, but does have mixed samples!

Some caveats apply; see e.g. P. Gras, et al., arXiv:1704.03878
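The mixed-sample construction on this slide can be sketched in a few lines of numpy. The 1D Gaussian "quark" and "gluon" feature distributions below are hypothetical stand-ins for illustration, not the Pythia jets used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the quark (signal) and gluon (background)
# feature distributions -- purely illustrative 1D Gaussians.
def draw_quark(n):
    return rng.normal(loc=1.0, scale=1.0, size=n)

def draw_gluon(n):
    return rng.normal(loc=-1.0, scale=1.0, size=n)

def make_mixture(g_b, n):
    """Draw n events from the mixture with signal fraction g_b."""
    n_sig = rng.binomial(n, g_b)     # number of signal events in the sample
    y = np.concatenate([draw_quark(n_sig), draw_gluon(n - n_sig)])
    rng.shuffle(y)                   # mixed sample carries no per-event labels
    return y

# Two mixed samples with different purities.
sample_1 = make_mixture(0.2, 100_000)
sample_2 = make_mixture(0.8, 100_000)
```

The shuffle step is the point: each sample is a bag of unlabeled events whose only known property is its overall purity.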

SLIDE 6

  • Weak Supervision in HEP
  • Lessons from High Dimensions
  • Why learn from data?

SLIDE 7

Learning from Label Proportions (LLP) ("LoLiProp"?)

[L. Dery, et al., arXiv:1702.00414]

[Figure: training on two mixed samples with fractions g_1 and g_2]

ℓ_LLP = Σ_b ℓ( g_b, (1/N_b) Σ_{j=1}^{N_b} h(y_j) )

The model output h(y) is averaged over each batch of N_b events drawn from mixed sample b, and the weak loss ℓ compares this average to the known fraction g_b.

Q/G weak supervision with 3 inputs works.

[L. Dery, et al., arXiv:1702.00414]

Various weak losses ℓ can be used (e.g. weak cross-entropy, weak MSE, …).
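A minimal numpy sketch of the LLP loss described on this slide, assuming a weak loss that compares each known fraction g_b to the batch-averaged model output. Both the `llp_loss` helper and the constant "predictions" are illustrative, not the talk's actual Keras implementation:

```python
import numpy as np

def llp_loss(fractions, batch_preds, kind="xent"):
    """Weak loss: l_LLP = sum_b l(g_b, mean_j h(y_bj)).

    fractions   -- known signal fractions g_b, one per mixed batch
    batch_preds -- per-event model outputs h(y) in (0, 1), one array per batch
    """
    total = 0.0
    for g_b, preds in zip(fractions, batch_preds):
        gbar = float(np.mean(preds))         # batch-averaged prediction
        if kind == "mse":                    # weak mean-squared-error loss
            total += (g_b - gbar) ** 2
        else:                                # weak cross-entropy loss
            eps = 1e-12                      # guard against log(0)
            total += -(g_b * np.log(gbar + eps)
                       + (1.0 - g_b) * np.log(1.0 - gbar + eps))
    return total

# A model whose batch-mean output matches the known fractions
# g_1 = 0.2 and g_2 = 0.8 has (near-)zero weak MSE loss.
preds = [np.full(4000, 0.2), np.full(4000, 0.8)]
loss = llp_loss([0.2, 0.8], preds, kind="mse")
```

Note that only the batch average enters the loss, which is why LLP needs large batches: the average over a batch must be a good estimate of the label proportion.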

SLIDE 8

Classification Without Labels (CWoLa, pronounced "koala")

[EMM, B. Nachman, and J. Thaler, arXiv:1708.02949] [T. Cohen, M. Freytsis, and B. Ostdiek, arXiv:1706.09451]

Train any classifier to distinguish the two mixed samples directly: no label proportions are needed during training!

Note: small test sets with known signal fractions are still needed to determine the ROC curve.

See also: [G. Blanchard, M. Flaska, G. Handy, S. Pozzi, and C. Scott, arXiv:1303.1208]

Q/G weak supervision with 5 inputs works.

[EMM, B. Nachman, and J. Thaler, arXiv:1708.02949]

Smoothly connected to the fully supervised case as (g_1, g_2) → (0, 1).
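A toy illustration of the CWoLa idea, under assumed 1D Gaussian signal/background distributions: a discriminant built to separate the two mixtures (here estimated by a simple histogram ratio rather than a trained network) comes out monotonically related to the underlying signal/background likelihood ratio:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1D stand-ins for the quark/gluon feature distributions.
def mixture(g, n):
    """Mixed sample of n events with signal fraction g."""
    n_sig = rng.binomial(n, g)
    return np.concatenate([rng.normal(1.0, 1.0, n_sig),
                           rng.normal(-1.0, 1.0, n - n_sig)])

m1 = mixture(0.8, 1_000_000)   # purer mixed sample
m2 = mixture(0.2, 1_000_000)   # less pure mixed sample

# CWoLa trains any classifier to separate m1 from m2 directly.
# A histogram ratio stands in for the optimal mixture-vs-mixture
# discriminant p(y|M1)/p(y|M2), which is a monotonic function of
# the signal/background likelihood ratio.
bins = np.linspace(-2.0, 2.0, 9)
h1, _ = np.histogram(m1, bins=bins, density=True)
h2, _ = np.histogram(m2, bins=bins, density=True)
disc = h1 / h2
```

Because the discriminant is monotonic in the likelihood ratio, thresholding it gives the same ROC curve as thresholding the true likelihood ratio, even though no per-event labels were used.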

SLIDE 9

  • Weak Supervision in HEP
  • Lessons from High Dimensions
  • Why learn from data?

SLIDE 10

Convolutional Net for Q/G

Only used pT-channel images. CNN as in:
  • P. Komiske, E. Metodiev, M.D. Schwartz, arXiv:1612.01551

33 × 33 = 1089 inputs; image size 2R = 0.8 in (η, φ).

SLIDE 11

Defaults

Jet generation:
  • Z + q/g with Pythia 8.226 at √s = 13 TeV
  • R = 0.4 anti-kT central jets, pT in [250 GeV, 275 GeV]
  • Artificial q/g mixtures

CNN training:
  • Keras and TensorFlow
  • 300k/50k/50k train/test/val data
  • Mixed sample fractions g_1 = 0.2 and g_2 = 0.8
  • Batch size 400 for CWoLa and 4k for LLP
  • ELU activation and cross-entropy loss functions
  • Training until validation accuracy failed to improve for 10 epochs
  • Each training repeated 10x for statistics

SLIDE 12

Q/G weak supervision with jet images works!

Training on mixed samples.

This lesson should hold for complex models more generally.

PRELIMINARY

SLIDE 13

What about naturally mixed samples?

We restrict to artificially mixed samples to have fine control over the fractions. Naturally mixed samples have quark fractions such as:

  • Z + jet: g = 0.88
  • dijets: g = 0.37

SLIDE 14

Purity and Amount of Data

Comparison: full supervision vs. two mixed samples with fractions g_1 and 1 − g_1.

A purity-vs-amount-of-data plot can characterize the tradeoffs in a weak learning method.

PRELIMINARY

SLIDE 15

Batch Size and Training Time

Batch size is the usual hyperparameter for CWoLa, but LLP needs a large batch size, since the batch-averaged output must estimate the label proportion. For batch sizes above ~1000, both the time per epoch and the number of epochs increase.

ℓ_LLP = Σ_b ℓ( g_b, (1/N_b) Σ_{j=1}^{N_b} h(y_j) )

PRELIMINARY

SLIDE 16

Loss and Activation Functions

LLP:
  • ELU activations help significantly over ReLU activations.
  • Weak cross-entropy loss helps over weak MSE loss.
  • Include the softmax in the loss (not the model) to avoid underflow.

PRELIMINARY
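The underflow point can be illustrated in numpy: applying softmax in the model and taking the log in the loss blows up for large logits, while folding the softmax into the loss via the log-sum-exp trick stays finite. This is a generic sketch of the numerical issue, not the talk's Keras code:

```python
import numpy as np

def naive_xent(logits, label):
    """Softmax applied first, log taken afterwards (the fragile way)."""
    p = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(p[label])

def stable_xent(logits, label):
    """Softmax folded into the loss via the log-sum-exp trick."""
    shifted = logits - np.max(logits)            # keeps all exponents <= 0
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

logits = np.array([1000.0, 0.0])                 # extreme but legal logits
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    naive = naive_xent(logits, 1)                # exp(1000) overflows -> inf
stable = stable_xent(logits, 1)                  # finite: 1000.0
```

For moderate logits the two agree; the stable form simply never materializes the intermediate probabilities.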

SLIDE 17

Conclusions

Weak supervision methods work for training complex classifiers. Several different methods exist that utilize different information; which to use depends on the specific application.

LLP:
  • Requires specialized loss functions and care
  • Utilizes fraction information
  • Can make use of multiple fractions

CWoLa:
  • Can be used with any fully supervised technique
  • Does not require fraction information
  • Only works with two mixed samples

SLIDE 18

  • Weak Supervision in HEP
  • Lessons from High Dimensions
  • Why learn from data?

The End

SLIDE 19

Multiple Mixture Fractions

LLP — PRELIMINARY