
Search for Top Squarks Using Multivariate Methods
Jonas Graw
Max Planck Institute for Physics (Werner-Heisenberg-Institut)
Thursday, 20th July 2017

Motivation
Standard strategy: cut-based analysis.
[Figure: two panels of expected significance in the $m_{\tilde t}$ [GeV] vs. $m_{\tilde\chi_1^0}$ [GeV] plane, for the signal regions SRA_TT and SRB_TT. ATLAS work in progress, $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$, $pp \to \tilde t \tilde t$; $\tilde t \to \tilde\chi_1^0 + t$; $t \to b + W$.]
→ Try multivariate methods using Monte Carlo samples.
07/20/2017 Jonas Graw - $\tilde t \to 0\ell$ - MVA

Machine Learning
Classification: signal or background.
Supervised learning: training is done with labeled simulated events.
Events are divided into training and testing samples (e.g. 50%-50%).
Overtraining ("learning by heart") needs to be avoided.
[Diagram: labeled samples → training / testing → machine learning model; ATLAS data → model → predicted label.]
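The 50%-50% division into training and testing samples can be sketched as follows. This is a minimal illustration only: the function name `split_events`, the fixed seed, and the toy events are hypothetical and not taken from the analysis.

```python
import random


def split_events(events, train_fraction=0.5, seed=42):
    """Shuffle labeled events and split them into training and testing lists."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(events)            # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical toy events: (feature value, label), label 1 = signal, 0 = background.
toy_events = [(0.1 * i, i % 2) for i in range(100)]
train, test = split_events(toy_events)
```

Keeping the testing half out of the training is what later allows a comparison of the classifier response on both halves as an overtraining check.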

Boosted Decision Tree (BDT)
Division into two classes: signal and background.
At each node, the decision is made by a cut on exactly one discriminating variable.
The chosen discriminating variable is the one that gives the best possible signal-background separation.
Boosting: training of a new tree, for which falsely classified events get a bigger weight.
BDT response $r(i) \in [-1, 1]$ of an event $i$: a classification measure that depends on the trees, with the limits
$r(i) = +1$: all trees classify $i$ as signal; $r(i) = -1$: all trees classify $i$ as background.

Multivariate Methods: Boosted Decision Tree
A Boosted Decision Tree (BDT) is used to optimize the usage of discriminating variables.
Training for the model with $m_{\tilde t} = 1$ TeV and $m_{\tilde\chi_1^0} = 1$ GeV.
Training on the region with $0\ell$, $E_T^{\mathrm{miss}} > 250$ GeV, $\geq 4$ jets and $\geq 1$ $b$-jet.
Training for models with $m_{\tilde t} \geq 1$ TeV reaches better significances than [...]
The ROC curve is an indicator of how good the training was; the area under the curve is a good measure of the training.
[Figure: TMVA overtraining check for classifier BDT. (1/N) dN/dx vs. BDT response for signal and background, each for the test sample and the training sample.]
The BDT responses for MC test data and MC training data are in very good agreement → no overtraining.

Cut on BDT Response
[Figure: significance vs. cut on BDT response (TMVAweighted). ATLAS work in progress, $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$. Legend entries: 1000 300, 1000 1, 1000 100, 800 300, 800 100, 800 1, 600 300, 600 100, 600 1.]
Very high significances for some of the samples.
Important variables: $m_{T2}$, $p_T(\mathrm{top})$, $E_T^{\mathrm{miss}}$.
Cut on BDT response $\geq 0.34$.
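Choosing a cut on the BDT response amounts to scanning candidate cut values and keeping the one with the highest expected significance. The sketch below assumes the simple estimate $Z = s/\sqrt{b}$, which is not necessarily the definition used in the slides; the function names and the toy (response, weight) pairs are hypothetical.

```python
import math


def significance(s, b):
    """Assumed estimate Z = s / sqrt(b); returns 0.0 when no background is left."""
    return s / math.sqrt(b) if b > 0 else 0.0


def scan_cuts(signal, background, cuts):
    """Return (best_cut, best_Z) for events given as (response, weight) pairs."""
    best_cut, best_z = None, -1.0
    for cut in cuts:
        # Weighted yields of events passing "response >= cut".
        s = sum(w for r, w in signal if r >= cut)
        b = sum(w for r, w in background if r >= cut)
        z = significance(s, b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z


# Hypothetical toy yields, not taken from the analysis.
toy_signal = [(0.5, 1.0), (0.4, 1.0), (0.1, 1.0)]
toy_background = [(0.3, 4.0), (-0.2, 16.0), (0.45, 1.0)]
best_cut, best_z = scan_cuts(toy_signal, toy_background, [-0.5, 0.0, 0.35, 0.42])
print(best_cut, best_z)
```

In this toy input, tightening the cut first removes mostly background and then starts to remove signal, so the significance peaks at an intermediate cut value.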

Expected Significances in the $m_{\tilde t}$-$m_{\tilde\chi_1^0}$ Parameter Space
[Figure: two panels of expected significance in the $m_{\tilde t}$ [GeV] vs. $m_{\tilde\chi_1^0}$ [GeV] plane. Panels: cut-based method (SRA_TT, ATLAS work in progress); BDT method (MVA, BDT > 0.34, ATLAS internal). $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$, $pp \to \tilde t \tilde t$; $\tilde t \to \tilde\chi_1^0 + t$; $t \to b + W$.]
Cut-based: $1.7\sigma$; BDT: $3.0\sigma$.
⇒ Expected significance of $3\sigma$ up to $m_{\tilde t} = 1$ TeV.
⇒ Can we increase this by changing the BDT settings?

Optimization of the BDT Settings
Expected significances of a sample with $m_{\tilde t} = 1$ TeV and $m_{\tilde\chi_1^0} = 1$ GeV.
[Figure: significance of signal as a function of MaxDepth (1 to 10) and MinNodeSize (0.5 to 4.5) for InputConfig_TT_directTT_1000_1_a821_r7676_Input.]
Maximal depth: how many layers an event can pass through.
Minimal node size: fraction (%) of events required to be in a leaf.
⇒ Are these really the best settings?

Optimization of the BDT Settings
Area under the ROC curve of a sample with $m_{\tilde t} = 1$ TeV and $m_{\tilde\chi_1^0} = 1$ GeV.
[Figure: ROC AUC (about 0.953 to 0.958) as a function of MaxDepth (1 to 10) and MinNodeSize (0.5 to 4.5). ATLAS internal.]
The area under the ROC curve is used as a test against overtraining.
A high maximal depth combined with a small minimal node size minimizes the ROC area ⇒ sign of overtraining!
Settings: MaxDepth = 4, MinNodeSize = 1.5%.
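The area under the ROC curve can be sketched in a few lines. This is a minimal illustration, assuming the AUC is computed as the fraction of (signal, background) pairs that the classifier orders correctly and that the overtraining check compares training and testing samples; event weights are ignored, and the function names and toy scores are hypothetical.

```python
def roc_auc(signal_scores, background_scores):
    """AUC as the fraction of (signal, background) pairs ranked correctly.

    A tie between a signal and a background score counts as half a correct pair.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def overtraining_gap(train_sig, train_bkg, test_sig, test_bkg):
    """Training AUC minus testing AUC; a large positive gap hints at overtraining."""
    return roc_auc(train_sig, train_bkg) - roc_auc(test_sig, test_bkg)


# Hypothetical toy responses: perfect separation in training, worse in testing.
gap = overtraining_gap(
    [0.9, 0.8, 0.7], [0.1, 0.2, 0.3],   # training signal, training background
    [0.9, 0.2, 0.7], [0.1, 0.3, 0.8],   # testing signal, testing background
)
print(gap)
```

The pairwise loop is quadratic in the number of events, which is fine for a toy example but would be replaced by a sort-based computation for real sample sizes.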

BDT Training for Different Models
[Figure: two panels of expected significance in the $m_{\tilde t}$ [GeV] vs. $m_{\tilde\chi_1^0}$ [GeV] plane, ATLAS internal. Panels: training for models with $m_{\tilde t} \geq 1$ TeV (MVA, BDT > 0.34); training with all samples (MVA, BDT > 0.36). $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$, $pp \to \tilde t \tilde t$; $\tilde t \to \tilde\chi_1^0 + t$; $t \to b + W$.]
Significant increase of the sensitive regions in the parameter space, especially towards the kinematic border ($m_{\tilde t} = m_{\tilde\chi_1^0} + m_t$).
For large $m_{\tilde t}$: training with MC data with $m_{\tilde t} \geq 1$ TeV only is more promising.

Comparing Against Other MVA Methods
Training for the model with $m_{\tilde t} = 1$ TeV and $m_{\tilde\chi_1^0} = 100$ GeV.
Methods compared: BDT and Support Vector Machine (SVM).
[Figure: TMVA overtraining check for classifier BDT. (1/N) dN/dx vs. BDT response, signal and background, test and training samples. Kolmogorov-Smirnov test: signal (background) probability = 0 (0.084).]
[Figure: TMVA overtraining check for classifier SVM. (1/N) dN/dx vs. SVM response, signal and background, test and training samples. Kolmogorov-Smirnov test: signal (background) probability = 0.507 (0.833).]
[Figure: two panels of expected significance in the $m_{\tilde t}$ [GeV] vs. $m_{\tilde\chi_1^0}$ [GeV] plane, ATLAS internal. Panels: MVA with BDT > 0.17; MVA with SVM > 0.94. $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$, $pp \to \tilde t \tilde t$; $\tilde t \to \tilde\chi_1^0 + t$; $t \to b + W$.]
BDT achieves better significances.
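The overtraining checks quote a Kolmogorov-Smirnov test between the training and testing response distributions. As a rough sketch of what such a test measures, the code below computes only the two-sample KS distance $D$ (the largest gap between the two empirical cumulative distributions) for unweighted samples. It does not reproduce the probability values quoted on the slide, and the function name and toy samples are hypothetical.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)   # fraction of a that is <= value
        cdf_b = bisect_right(b, value) / len(b)   # fraction of b that is <= value
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical toy responses: identical samples give 0, shifted samples do not.
same = ks_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
shifted = ks_distance([1, 2, 3, 4], [3, 4, 5, 6])
print(same, shifted)
```

A small distance between training and testing responses is compatible with both samples following the same distribution; turning the distance into a probability additionally requires the sample sizes.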

Summary & Outlook
Started to look at MVA methods for increasing the sensitivity of the $\tilde t \to 0\ell$ analysis.
Training with several signal samples seems reasonable.
Current steps:
Checking the dependency on BDT settings and BDT input variables.
Look at neural networks.

BACKUP

Training for the model with $m_{\tilde t} = 1$ TeV and $m_{\tilde\chi_1^0} = 1$ GeV
[Figure: expected significance in the $m_{\tilde t}$ [GeV] vs. $m_{\tilde\chi_1^0}$ [GeV] plane for MVA with BDT > 0.14. ATLAS internal, $\sqrt{s} = 13$ TeV, 36.1 fb$^{-1}$, $pp \to \tilde t \tilde t$; $\tilde t \to \tilde\chi_1^0 + t$; $t \to b + W$.]

BDT Response
After training: compare the true values $y_i$ with the forecasts $s_i$.
$w_i$: weights, with $\sum_i w_i = 1$.
Error fraction: $e = \sum_i w_i \, \mathbf{1}_{s_i \neq y_i}$.
Boost factor: $\alpha = \beta \cdot \ln\left(\frac{1 - e}{e}\right)$ with $\beta$ constant, usually $\beta \in [0, 1]$.
New weights: $w_i \to w_i \cdot \exp\left(\alpha \cdot \mathbf{1}_{s_i \neq y_i}\right)$.
BDT response of event $i$: $r_i = \frac{\sum_m \alpha_m (s_i)_m}{\sum_m \alpha_m}$, where
$(s_i)_m = \begin{cases} +1 & \text{if tree } m \text{ predicts signal} \\ -1 & \text{if tree } m \text{ predicts background} \end{cases}$
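The boosting step and the BDT response defined above can be worked through numerically. The sketch below follows the formulas on this slide; renormalising the updated weights so that they again sum to 1 is an assumption (the slide only states $\sum_i w_i = 1$), and the function names and toy inputs are hypothetical.

```python
import math


def boost_step(weights, forecasts, truths, beta=1.0):
    """One boosting step; requires an error fraction e with 0 < e < 1.

    Returns the boost factor alpha and the updated weights, renormalised
    to sum to 1 (assumed convention).
    """
    miss = [1 if s != y else 0 for s, y in zip(forecasts, truths)]
    e = sum(w * m for w, m in zip(weights, miss))        # weighted error fraction
    alpha = beta * math.log((1.0 - e) / e)               # boost factor
    boosted = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    total = sum(boosted)
    return alpha, [w / total for w in boosted]


def bdt_response(tree_votes, alphas):
    """Alpha-weighted average of the per-tree votes (+1 signal, -1 background)."""
    return sum(a * s for a, s in zip(alphas, tree_votes)) / sum(alphas)


# Toy example: four equally weighted events, the second one is misclassified.
alpha, new_weights = boost_step([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
# Toy response: three trees with boost factors 1.0, 0.5, 0.5 voting +1, +1, -1.
response = bdt_response([1, 1, -1], [1.0, 0.5, 0.5])
print(alpha, new_weights, response)
```

With $e = 0.25$ and $\beta = 1$ the boost factor is $\ln 3$, so the misclassified event's weight is tripled before renormalisation and ends up at one half of the total weight.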
