Guiding New Physics Searches with Unsupervised Learning

IML Working Group, CERN, 2018-10-12
Guiding New Physics Searches with Unsupervised Learning [DS, Jacques - 1807.06038]
Andrea De Simone, andrea.desimone@sissa.it

> New Physics?
Searches for New Physics beyond the Standard Model have been negative so far… MAYBE:
1. New Physics (NP) is not accessible by the LHC: new particles are too light/heavy, or interact too weakly.
2. We have not explored all the possibilities: new physics may be buried under a large background (bkg), or hiding behind unusual signatures.

> New Physics?
“Don’t want to miss a thing” (in data):
- take a closer look at current data
- get ready for the upcoming data from the next run
→ Model-independent search. Searches for specific models may be:
- time-consuming
- insensitive to unexpected/unknown processes

> New Statistical Test
We want a statistical test for NP which is:
1. model-independent: no assumption about the underlying physical model used to interpret the data → more general
2. non-parametric: compares the two samples as a whole (not just their means, etc.) → fewer assumptions, no maximum-likelihood estimation
3. un-binned: the high-dimensional feature space is partitioned without rectangular bins → retains the full multi-dimensional information of the data

> Outline
1. Statistical test of dataset compatibility
   • Nearest-Neighbors Two-Sample Test
   • Identify Discrepancies
   • Include Uncertainties
2. Applications to High-Energy Physics

> Two-sample Test [a.k.a. “homogeneity test”]
Two sets of points $x_i, x'_i \in \mathbb{R}^D$:
• Trial: $T = \{x_1, \dots, x_{N_T}\} \overset{iid}{\sim} p_T$ (e.g. real measured data)
• Benchmark: $B = \{x'_1, \dots, x'_{N_B}\} \overset{iid}{\sim} p_B$ (e.g. simulated SM background)
The probability distributions $p_B, p_T$ are unknown.
Are B and T drawn from the same probability distribution? It may look easy… but it is hard!

> Two-sample Test
RECIPE:
1. Density estimator: reconstruct the PDFs from the samples
2. Test statistic (TS): measure the “distance” between the PDFs
3. TS distribution: associate probabilities to the TS under the null hypothesis $H_0: p_B = p_T$
4. p-value: accept/reject $H_0$

> 1. Density Estimator
Divide the space into square bins?
✓ easy
✓ can use simple statistics (e.g. $\chi^2$)
✘ hard/slow/impossible in high D
Need an un-binned multivariate approach.
Find PDF estimators $\hat p_B(x), \hat p_T(x)$, e.g. based on the densities of points:
$\hat p_{B,T}(x) = \rho_{B,T}(x) / N_{B,T}$
→ Nearest Neighbors! [Schilling - 1986][Henze - 1988] [Wang et al. - 2005,2006] [Dasu et al. - 2006][Perez-Cruz - 2008] [Sugiyama et al. - 2011][Kremer et al, 2015]
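To make the ✘ concrete, the toy sketch below counts how many square bins a sample actually fills as D grows. It is an illustration only, not part of NN2ST: the helper `occupied_bin_fraction`, the uniform data and the choice of 10 bins per axis are all assumptions.

```python
import numpy as np


def occupied_bin_fraction(n_points, dim, bins_per_axis=10, seed=0):
    """Fraction of the bins_per_axis**dim square bins containing at least one
    of n_points drawn uniformly in the unit hypercube [0, 1)^dim."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, dim))
    # Bin index along each axis (defensive clip: rng.random never returns 1.0).
    idx = np.minimum((x * bins_per_axis).astype(np.int64), bins_per_axis - 1)
    # One flat bin label per point, then count the distinct labels.
    flat = np.ravel_multi_index(idx.T, (bins_per_axis,) * dim)
    return np.unique(flat).size / bins_per_axis**dim


for d in (2, 4, 8):
    print(f"D = {d}: occupied fraction = {occupied_bin_fraction(20_000, d):.2e}")
```

With 20k points all 100 bins are filled for D = 2, but for D = 8 at most 20k of the $10^8$ bins can be non-empty (a fraction of at most $2 \times 10^{-4}$), so bin-by-bin statistics such as $\chi^2$ have almost nothing to work with.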

> 1. Density Estimator
• Fix an integer K.
• Choose a query point $x_j$ in T and place it in B.
• Find the distance $r_{j,B}$ of the K-th NN of $x_j$ in B.
• Find the distance $r_{j,T}$ of the K-th NN of $x_j$ in T.
• Estimate the PDFs:
  $\hat p_B(x_j) = \frac{K}{N_B}\,\frac{1}{\omega_D\, r_{j,B}^D}, \qquad \hat p_T(x_j) = \frac{K}{N_T - 1}\,\frac{1}{\omega_D\, r_{j,T}^D}$
  where $\omega_D = \pi^{D/2}/\Gamma(D/2 + 1)$ is the volume of the unit ball in $\mathbb{R}^D$, so $\omega_D r^D$ is the volume of the ball enclosing the K nearest neighbors ($N_T - 1$ because $x_j$ itself is excluded from T).
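The estimator above can be sketched in NumPy as follows. This is a hypothetical brute-force implementation (the names `unit_ball_volume`, `kth_nn_distance` and `knn_densities` are illustrative), not the code from the NN2ST repository; it skips $x_j$ itself when searching in T, which is why the T estimate divides by $N_T - 1$.

```python
import numpy as np
from math import gamma, pi


def unit_ball_volume(dim):
    """omega_D: volume of the unit ball in R^dim."""
    return pi ** (dim / 2) / gamma(dim / 2 + 1)


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Euclidean distance from each query to its k-th nearest neighbour in `sample`.

    Brute force: builds the full (len(queries), len(sample)) distance matrix.
    With exclude_self=True the queries are assumed to be rows of `sample`,
    so the zero distance of each point to itself is skipped.
    """
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from round-off
    idx = k if exclude_self else k - 1  # position of the k-th NN after sorting
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def knn_densities(B, T, k=5):
    """NN estimates of p_B(x_j) and p_T(x_j) at every point x_j of T."""
    n_b, n_t, dim = len(B), len(T), T.shape[1]
    r_b = kth_nn_distance(T, B, k)                     # r_{j,B}
    r_t = kth_nn_distance(T, T, k, exclude_self=True)  # r_{j,T}
    omega = unit_ball_volume(dim)
    p_b = k / (n_b * omega * r_b**dim)
    p_t = k / ((n_t - 1) * omega * r_t**dim)
    return p_b, p_t


rng = np.random.default_rng(0)
B = rng.random((2000, 2))  # toy data: uniform on the unit square, true density 1
T = rng.random((2000, 2))
p_b, p_t = knn_densities(B, T, k=5)
print(np.median(p_b), np.median(p_t))
```

The full distance matrix costs $O(N_T N_B)$ memory; for samples of 20k points a tree-based neighbour search such as `scipy.spatial.cKDTree` would be the natural replacement.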

> 2. Test Statistic
• Measure of the “distance” between 2 PDFs.
• Define the test statistic (detects under-/over-densities):
  $TS(B,T) = \frac{1}{N_T}\sum_{j=1}^{N_T} \log\frac{\hat p_T(x_j)}{\hat p_B(x_j)}$
• Related to the Kullback-Leibler divergence as $TS(B,T) = \hat D_{KL}(\hat p_T \,\|\, \hat p_B)$, where
  $D_{KL}(p \,\|\, q) \equiv \int_{\mathbb{R}^D} p(x) \log\frac{p(x)}{q(x)}\, dx$
• From the NN-estimated PDFs:
  $TS_{obs} = \frac{D}{N_T}\sum_{j=1}^{N_T} \log\frac{r_{j,B}}{r_{j,T}} + \log\frac{N_B}{N_T - 1}$
• Theorem: this estimator converges to $D_{KL}(p_T \,\|\, p_B)$ in the large-sample limit [Wang et al. - 2005,2006]
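Since K and $\omega_D$ cancel in the density ratio, $TS_{obs}$ needs only the two K-th-NN distances per point. A minimal brute-force sketch, assuming NumPy; the function names are hypothetical:

```python
import numpy as np


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `sample` (brute force)."""
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)  # remove tiny negative values from round-off
    idx = k if exclude_self else k - 1  # skip the zero self-distance if needed
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def nn_test_statistic(B, T, k=5):
    """TS_obs = D/N_T * sum_j log(r_{j,B} / r_{j,T}) + log(N_B / (N_T - 1))."""
    n_b, n_t, dim = len(B), len(T), T.shape[1]
    r_b = kth_nn_distance(T, B, k)
    r_t = kth_nn_distance(T, T, k, exclude_self=True)
    return dim * np.mean(np.log(r_b / r_t)) + np.log(n_b / (n_t - 1))


rng = np.random.default_rng(1)
B = rng.normal(loc=[1.0, 1.0], size=(2000, 2))
T = rng.normal(loc=[1.5, 1.5], size=(2000, 2))
print(nn_test_statistic(B, T))
```

For two unit-covariance Gaussians whose means differ by (0.5, 0.5), the exact $D_{KL}(p_T \,\|\, p_B) = |\mu_T - \mu_B|^2/2 = 0.25$, and the estimate from a few thousand points should land close to it.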

> 3. Test Statistic Distribution
How is TS distributed? Assume $p_B = p_T$. → Permutation test!
• Union set: $U = T \cup B$
• Randomly reshuffle U and split it into $(\tilde B, \tilde T)$, of sizes $N_B$ and $N_T$
• Compute the test statistic $TS_n$ on $(\tilde B, \tilde T)$
• Repeat many times.
Distribution of TS under $H_0$: $f(TS \mid H_0) \leftarrow \{TS_n\}$ [asymptotically normal with zero mean]
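The permutation procedure can be sketched as below. This is an illustrative NumPy version under the same brute-force assumption (hypothetical names `nn_test_statistic`, `permutation_null`); each reshuffle keeps the original sample sizes, so under $H_0$ the recomputed $TS_n$ sample $f(TS \mid H_0)$.

```python
import numpy as np


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `sample` (brute force)."""
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)
    idx = k if exclude_self else k - 1
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def nn_test_statistic(B, T, k=5):
    """NN estimate of TS(B, T)."""
    dim = T.shape[1]
    r_b = kth_nn_distance(T, B, k)
    r_t = kth_nn_distance(T, T, k, exclude_self=True)
    return dim * np.mean(np.log(r_b / r_t)) + np.log(len(B) / (len(T) - 1))


def permutation_null(B, T, k=5, n_perm=1000, seed=0):
    """Sample f(TS | H0): reshuffle the union of T and B, split it back into
    (B~, T~) with the original sizes, and recompute the TS each time."""
    rng = np.random.default_rng(seed)
    union = np.concatenate([T, B])
    n_t = len(T)
    ts_null = np.empty(n_perm)
    for n in range(n_perm):
        perm = rng.permutation(len(union))
        t_tilde, b_tilde = union[perm[:n_t]], union[perm[n_t:]]
        ts_null[n] = nn_test_statistic(b_tilde, t_tilde, k)
    return ts_null


rng = np.random.default_rng(4)
B = rng.normal(size=(300, 2))
T = rng.normal(size=(300, 2))
ts_null = permutation_null(B, T, k=5, n_perm=100)
print(ts_null.mean(), ts_null.std())  # mean should sit near zero
```

The loop recomputes all neighbour distances for every permutation, which is why the permutation test dominates the running time.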

> 4. p-value
• Mean and standard deviation of the TS distribution $f(TS \mid H_0)$: $\hat\mu, \hat\sigma$
• Standardize the TS: $TS \to TS' \equiv (TS - \hat\mu)/\hat\sigma$
• TS' is distributed according to $f'(TS' \mid H_0) = \hat\sigma\, f(\hat\mu + \hat\sigma\, TS' \mid H_0)$
• Two-sided p-value: $p = 2\int_{|TS'_{obs}|}^{+\infty} f'(TS' \mid H_0)\, dTS'$
• Equivalent significance: $Z \equiv \Phi^{-1}(1 - p/2)$
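Two ways to turn the permutation values $\{TS_n\}$ into a p-value are sketched below, assuming the $TS_n$ are already available. The Gaussian version relies on the asymptotic normality of $f(TS \mid H_0)$; the empirical version counts permutations at least as extreme and cannot resolve p-values much below $1/N_{perm}$. The function name `p_value_from_null` and the choice to return both are illustrative.

```python
import math
from statistics import NormalDist

import numpy as np


def p_value_from_null(ts_obs, ts_null):
    """Two-sided p-value of TS_obs against permutation values ts_null.

    Returns (p_gauss, z_gauss, p_empirical):
    p_gauss treats the standardized null f'(TS'|H0) as a standard normal,
    p_empirical is the fraction of permutations with |TS'_n| >= |TS'_obs|.
    """
    ts_null = np.asarray(ts_null, dtype=float)
    mu, sigma = ts_null.mean(), ts_null.std(ddof=1)
    ts_obs_std = (ts_obs - mu) / sigma    # TS'_obs
    ts_null_std = (ts_null - mu) / sigma  # TS'_n
    # 2 * (1 - Phi(|t|)) written as erfc(|t| / sqrt 2) for numerical accuracy.
    p_gauss = math.erfc(abs(ts_obs_std) / math.sqrt(2.0))
    # Z = Phi^{-1}(1 - p/2), evaluated as -Phi^{-1}(p/2) to avoid 1 - p/2 rounding to 1.
    z_gauss = -NormalDist().inv_cdf(p_gauss / 2) if p_gauss > 0 else float("inf")
    p_empirical = float(np.mean(np.abs(ts_null_std) >= abs(ts_obs_std)))
    return p_gauss, z_gauss, p_empirical


rng = np.random.default_rng(0)
null = rng.normal(size=100_000)  # stand-in for {TS_n}
print(p_value_from_null(1.96, null))  # both p-values close to 0.05, Z close to 1.96
```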

> 2D Gaussian Example
$p_B = \mathcal{N}(\mu_B, \Sigma_B)$, $p_T = \mathcal{N}(\mu_T, \Sigma_T)$, with $\Sigma_B = \Sigma_T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
• $\mu_B = (1.0, 1.0)^T$, $\mu_T = (1.2, 1.2)^T$
• $\mu_B = (1.0, 1.0)^T$, $\mu_T = (1.15, 1.15)^T$
$K = 5$, $N_{perm} = 1000$
[Figure: TS compared with the exact KL divergence; more data, more power.]
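The “exact KL divergence” reference values for this example follow from the closed form for two multivariate Gaussians. A short sketch (the helper name `gaussian_kl` is hypothetical); with identity covariances it reduces to $|\mu_T - \mu_B|^2/2$, which is also symmetric in T and B.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Exact D_KL( N(mu_p, cov_p) || N(mu_q, cov_q) )."""
    mu_p, mu_q = np.asarray(mu_p, dtype=float), np.asarray(mu_q, dtype=float)
    cov_p, cov_q = np.asarray(cov_p, dtype=float), np.asarray(cov_q, dtype=float)
    k = mu_p.size
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k + logdet_q - logdet_p
    )


eye = np.eye(2)
print(gaussian_kl([1.2, 1.2], eye, [1.0, 1.0], eye))    # D_KL(p_T || p_B), approx. 0.04
print(gaussian_kl([1.15, 1.15], eye, [1.0, 1.0], eye))  # approx. 0.0225
```

These small values (0.04 and 0.0225) show why the second configuration is the harder one to detect.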

> NN2ST: Summary
INPUT:
• Trial sample: $T = \{x_1, \dots, x_{N_T}\} \overset{iid}{\sim} p_T$
• Benchmark sample: $B = \{x'_1, \dots, x'_{N_B}\} \overset{iid}{\sim} p_B$
  with $x_i, x'_i \in \mathbb{R}^D$ and $p_B, p_T$ unknown
• K: number of nearest neighbors
• $N_{perm}$: number of permutations
OUTPUT:
• p-value of the null hypothesis $H_0: p_B = p_T$ [checks the compatibility between the 2 samples]

> NN2ST: Summary
[Diagram: Benchmark sample + Trial sample → K-NN density estimation → test statistic $TS_{obs}$; permutation test → TS distribution; p-value from the tails beyond $-|TS_{obs}|$ and $|TS_{obs}|$.]
Python code: github.com/de-simone/NN2ST
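An end-to-end sketch of the pipeline in the diagram, for readers who want something small to run. It is not the NN2ST package API: `nn2st_sketch` and its arguments are hypothetical, the neighbour search is brute force, and the p-value uses a Gaussian approximation of the standardized null distribution.

```python
import math
from statistics import NormalDist

import numpy as np


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `sample` (brute force)."""
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)
    idx = k if exclude_self else k - 1
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def nn_test_statistic(B, T, k):
    """TS = D/N_T * sum_j log(r_{j,B}/r_{j,T}) + log(N_B/(N_T - 1))."""
    dim = T.shape[1]
    r_b = kth_nn_distance(T, B, k)
    r_t = kth_nn_distance(T, T, k, exclude_self=True)
    return dim * np.mean(np.log(r_b / r_t)) + np.log(len(B) / (len(T) - 1))


def nn2st_sketch(B, T, k=5, n_perm=1000, seed=0):
    """Hypothetical NN two-sample test. Returns (TS_obs, p-value, Z)."""
    rng = np.random.default_rng(seed)
    B, T = np.asarray(B, dtype=float), np.asarray(T, dtype=float)
    ts_obs = nn_test_statistic(B, T, k)
    # Permutation test: reshuffle the union of T and B, split with the original sizes.
    union, n_t = np.concatenate([T, B]), len(T)
    ts_null = np.empty(n_perm)
    for n in range(n_perm):
        perm = rng.permutation(len(union))
        ts_null[n] = nn_test_statistic(union[perm[n_t:]], union[perm[:n_t]], k)
    # Standardize and apply a Gaussian approximation of f'(TS'|H0).
    ts_std = (ts_obs - ts_null.mean()) / ts_null.std(ddof=1)
    p = math.erfc(abs(ts_std) / math.sqrt(2.0))
    z = -NormalDist().inv_cdf(p / 2) if p > 0 else float("inf")
    return ts_obs, p, z


rng = np.random.default_rng(2)
B = rng.normal(loc=[1.0, 1.0], size=(500, 2))
T = rng.normal(loc=[1.5, 1.5], size=(500, 2))
print(nn2st_sketch(B, T, k=5, n_perm=200))
```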

> Where are the discrepancies?
Bonus: characterize the regions with significant discrepancies.
1. “Score” field over T: $Z(x_j) \equiv \frac{u(x_j) - \bar u}{\sigma_u}$, with $u(x_j) \equiv \log\frac{r_{j,B}}{r_{j,T}}$, so that $TS_{obs} = D\,\bar u + \text{const}$
2. Identify the points where $Z(x) > c$: they contribute the most to a large $TS_{obs}$ → high-discrepancy (anomalous) regions
3. Apply a clustering algorithm to group them
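The three steps can be sketched as follows. The clustering algorithm is not specified above, so this sketch swaps in a simple single-linkage grouping (points closer than a radius `eps`, directly or through a chain, share a label); the threshold c = 2, `eps = 0.5`, the toy data and all function names are assumptions for illustration.

```python
import numpy as np


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `sample` (brute force)."""
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)
    idx = k if exclude_self else k - 1
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def discrepancy_scores(B, T, k=5):
    """Score field Z(x_j) = (u(x_j) - mean u) / std u, with u = log(r_{j,B} / r_{j,T})."""
    r_b = kth_nn_distance(T, B, k)
    r_t = kth_nn_distance(T, T, k, exclude_self=True)
    u = np.log(r_b / r_t)
    return (u - u.mean()) / u.std()


def group_points(points, eps):
    """Single-linkage grouping: points chained by distances < eps share a label."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # flood-fill the connected component of `start`
            i = stack.pop()
            neighbours = np.flatnonzero((d2[i] < eps**2) & (labels < 0))
            labels[neighbours] = current
            stack.extend(neighbours.tolist())
        current += 1
    return labels


rng = np.random.default_rng(0)
B = rng.uniform(0.0, 10.0, size=(1000, 2))  # background only
T = np.concatenate([rng.uniform(0.0, 10.0, size=(900, 2)),  # background
                    rng.normal(5.0, 0.2, size=(100, 2))])   # toy "signal" blob
scores = discrepancy_scores(B, T)
hot = T[scores > 2.0]                  # threshold c = 2 (arbitrary choice)
labels = group_points(hot, eps=0.5)    # eps = 0.5 (arbitrary choice)
for lab in np.unique(labels):
    members = hot[labels == lab]
    print(lab, len(members), members.mean(axis=0))
```

On this toy data the largest group should sit on the injected blob around (5, 5), while isolated background fluctuations, if any pass the threshold, end up in tiny groups of their own.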

> Sample Uncertainties
How to include sample uncertainties?
1. Model the feature uncertainties $F_B(x), F_T(x)$ [e.g. zero-mean Gaussians]
2. Build new samples by adding random noise sampled from $F_{B,T}$:
   $T_u = \{x_i + \Delta x_i\}_{i=1}^{N_T}, \qquad B_u = \{x'_i + \Delta x'_i\}_{i=1}^{N_B}$
3. Compute the TS on the new samples: $TS_u \equiv TS(B_u, T_u) = TS_{obs} + U$
4. Repeat many times to reconstruct $f(U)$
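Steps 2 to 4 can be sketched as a loop that perturbs both samples and records the shift $U = TS_u - TS_{obs}$. The noise model here is an assumption: independent zero-mean Gaussians with standard deviation `rel_err * |x|` per feature (a fixed relative uncertainty); `ts_shift_samples` and its parameters are hypothetical.

```python
import numpy as np


def kth_nn_distance(queries, sample, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbour in `sample` (brute force)."""
    d2 = (
        (queries**2).sum(axis=1)[:, None]
        + (sample**2).sum(axis=1)[None, :]
        - 2.0 * queries @ sample.T
    )
    d2 = np.maximum(d2, 0.0)
    idx = k if exclude_self else k - 1
    return np.sqrt(np.partition(d2, idx, axis=1)[:, idx])


def nn_test_statistic(B, T, k=5):
    """NN estimate of TS(B, T)."""
    dim = T.shape[1]
    r_b = kth_nn_distance(T, B, k)
    r_t = kth_nn_distance(T, T, k, exclude_self=True)
    return dim * np.mean(np.log(r_b / r_t)) + np.log(len(B) / (len(T) - 1))


def ts_shift_samples(B, T, rel_err=0.1, k=5, n_rep=100, seed=0):
    """Reconstruct f(U), where TS(B_u, T_u) = TS_obs + U.

    Assumed noise model F_{B,T}: independent zero-mean Gaussian noise on each
    feature, with standard deviation rel_err * |x|.
    """
    rng = np.random.default_rng(seed)
    ts_obs = nn_test_statistic(B, T, k)
    u = np.empty(n_rep)
    for n in range(n_rep):
        b_u = B + rng.normal(0.0, rel_err * np.abs(B))  # x'_i + Delta x'_i
        t_u = T + rng.normal(0.0, rel_err * np.abs(T))  # x_i + Delta x_i
        u[n] = nn_test_statistic(b_u, t_u, k) - ts_obs
    return ts_obs, u


rng = np.random.default_rng(3)
B = rng.normal(loc=[1.0, 1.0], size=(1000, 2))
T = rng.normal(loc=[1.15, 1.15], size=(1000, 2))
ts_obs, u = ts_shift_samples(B, T, rel_err=0.1, n_rep=50)
print(ts_obs, u.mean(), u.std())
```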

> Sample Uncertainties
• $f(TS_u)$ is a convolution: $f(TS_u \mid H_0) = f(TS \mid H_0) * f(U)$ → $f(TS_u)$ is more spread out than $f(TS)$
• The p-value is computed from $f(TS_u)$ → weaker significance, power degradation
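The convolution can be realized by Monte Carlo: shift each permutation value $TS_n$ by an independent draw from the reconstructed $f(U)$. The sketch below (hypothetical names, Gaussian approximation for the p-value, standard-normal stand-ins for both inputs) shows the resulting loss of significance for the same $TS_{obs}$.

```python
import math

import numpy as np


def p_value_gauss(ts_obs, null_samples):
    """Two-sided p-value of ts_obs, treating the standardized null as N(0, 1)."""
    null_samples = np.asarray(null_samples, dtype=float)
    t = (ts_obs - null_samples.mean()) / null_samples.std(ddof=1)
    return math.erfc(abs(t) / math.sqrt(2.0))  # = 2 * (1 - Phi(|t|))


def p_values_with_without_uncertainty(ts_obs, ts_null, u_samples, seed=0):
    """p-value from f(TS|H0) and from f(TS_u|H0) = f(TS|H0) * f(U).

    Monte Carlo convolution: each TS_n is shifted by an independent U
    resampled from u_samples.
    """
    rng = np.random.default_rng(seed)
    ts_null = np.asarray(ts_null, dtype=float)
    u = rng.choice(np.asarray(u_samples, dtype=float), size=ts_null.size)
    ts_u_null = ts_null + u  # samples of f(TS_u | H0)
    return p_value_gauss(ts_obs, ts_null), p_value_gauss(ts_obs, ts_u_null)


rng = np.random.default_rng(5)
ts_null = rng.normal(0.0, 1.0, size=20_000)    # stand-in for permutation values
u_samples = rng.normal(0.0, 1.0, size=20_000)  # stand-in for the reconstructed f(U)
print(p_values_with_without_uncertainty(3.0, ts_null, u_samples))
```

Because the convolved distribution is wider, the same $TS_{obs}$ sits closer to its bulk, so the second p-value is larger.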

> 2D Gaussian with Uncertainties
B, T Gaussian samples: $p_B = \mathcal{N}(\mu_B, \Sigma_B)$, $p_T = \mathcal{N}(\mu_T, \Sigma_T)$, with $\mu_B = (1.0, 1.0)^T$, $\mu_T = (1.15, 1.15)^T$, $\Sigma_B = \Sigma_T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Gaussian uncorrelated errors (diagonal covariance) with a fixed relative uncertainty: $\sigma_i = \epsilon\, x_i$ for each feature component i

> NN2ST: Summary
✓ general, model-independent
✓ fast, no optimization [$N_{B,T}$ = 20k, K = 5, $N_{perm}$ = 1k: t ~ 2 min for D = 2, t ~ 50 min for D = 8]
✓ sensitive to unspecified signals
✓ useful when no variable can separate signal from background
✓ helps find signal regions, optimal cuts, …
✓ flexible enough to incorporate uncertainties
✘ needs to be run for each sample pair
✘ the permutation test is the bottleneck
