Search for top Squarks Using Multivariate Methods
Jonas Graw
Max Planck Institute for Physics (Werner-Heisenberg-Institut)
Thursday 20th July, 2017
Motivation
Standard strategy: cut-based analysis
[Figure: Expected significance (colour scale 1–7) in the m(t̃)–m(χ̃⁰₁) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → bW; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹. Left: signal region SRA_TT; right: signal region SRB_TT.]
07/20/2017 Jonas Graw - t̃ → 0ℓ - MVA 2/12
Classification: signal or background
Supervised learning: training is done with labeled simulated events
Events are divided into a training and a testing sample (e.g. 50%–50%)
Overtraining ("learning by heart") needs to be avoided
[Diagram: labeled samples → training and testing of the machine-learning model → model applied to ATLAS data → predicted label]
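The division into training and testing samples can be sketched as a random 50%–50% split of the labeled events. This is a generic illustration, not the procedure used in the analysis (which presumably relied on TMVA's own splitting); the `(features, label)` event format and the function name `split_train_test` are hypothetical.

```python
import random

def split_train_test(events, train_fraction=0.5, seed=42):
    """Randomly divide labeled events into a training and a testing sample.

    `events` is a list of (features, label) pairs; label 1 = signal and
    0 = background is a hypothetical encoding chosen for this sketch.
    """
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(events)         # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Keeping the testing sample completely separate from training is what later allows the overtraining check: a classifier that has "learned by heart" performs noticeably worse on events it has never seen.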
Division of events into two classes: signal and background
At each node, the decision is made using exactly one discriminating variable (a cut)
The chosen variable and cut give the best possible signal–background separation
Boosting: a new tree is trained in which misclassified events get a larger weight
BDT response r(i) ∈ [−1, 1] of an event i: a classification measure that depends on all trees, with the limits
r(i) = +1: all trees classify i as signal
r(i) = −1: all trees classify i as background
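Choosing the cut at a single node can be sketched as a scan over one discriminating variable that minimizes the weighted Gini impurity of the two child nodes. The Gini index as separation criterion is an assumption here (it is the TMVA default, but the slides do not state it); `gini` and `best_cut` are hypothetical helper names.

```python
def gini(n_sig, n_bkg):
    """Gini impurity p(1 - p) of a node with signal purity p (0 for an empty node)."""
    n = n_sig + n_bkg
    if n == 0:
        return 0.0
    p = n_sig / n
    return p * (1.0 - p)

def best_cut(values, labels, weights=None):
    """Best cut on ONE discriminating variable (label 1 = signal, 0 = background).

    Returns (cut, impurity); the impurity is the sum of the child impurities,
    each weighted by the (event-weighted) size of the child node.
    """
    if weights is None:
        weights = [1.0] * len(values)
    events = sorted(zip(values, labels, weights))
    tot_sig = sum(w for _, y, w in events if y == 1)
    tot_bkg = sum(w for _, y, w in events if y == 0)
    best_cut_value, best_impurity = None, float("inf")
    left_sig = left_bkg = 0.0
    for i in range(len(events) - 1):
        x, y, w = events[i]
        if y == 1:
            left_sig += w
        else:
            left_bkg += w
        x_next = events[i + 1][0]
        if x_next == x:
            continue  # no cut possible between identical values
        right_sig, right_bkg = tot_sig - left_sig, tot_bkg - left_bkg
        impurity = ((left_sig + left_bkg) * gini(left_sig, left_bkg)
                    + (right_sig + right_bkg) * gini(right_sig, right_bkg))
        if impurity < best_impurity:
            # place the cut halfway between neighbouring values
            best_cut_value, best_impurity = 0.5 * (x + x_next), impurity
    return best_cut_value, best_impurity
```

A full tree repeats this scan over all input variables at every node and keeps the variable with the lowest impurity.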
A Boosted Decision Tree (BDT) is used to optimize the usage of the discriminating variables
Training in the region with 0ℓ, E_T^miss > 250 GeV, ≥ 4 jets and ≥ 1 b-jet
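The training-region selection can be written as a simple event filter. This sketch assumes a strict E_T^miss > 250 GeV threshold and takes lepton, jet and b-jet counts as already computed; the `Event` container and its field names are hypothetical, and the actual object definitions of the analysis are not given here.

```python
from dataclasses import dataclass

@dataclass
class Event:
    n_leptons: int
    met_gev: float   # missing transverse momentum E_T^miss in GeV
    n_jets: int
    n_bjets: int

def passes_preselection(ev: Event) -> bool:
    """Training region: 0 leptons, E_T^miss > 250 GeV, >= 4 jets, >= 1 b-jet."""
    return (ev.n_leptons == 0
            and ev.met_gev > 250.0
            and ev.n_jets >= 4
            and ev.n_bjets >= 1)
```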
[Figure: TMVA overtraining check for classifier BDT: normalized BDT response (1/N) dN/dx for signal and background, test and training samples.]
The BDT responses for the MC test and MC training samples agree very well → no overtraining
Training on models with m(t̃) ≥ 1 TeV reaches better significances than training on the model with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV
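The agreement between training and test responses can be quantified with a two-sample Kolmogorov–Smirnov statistic, the quantity behind the KS probabilities TMVA prints in its overtraining check. The sketch computes only the distance D; turning D into a probability requires the Kolmogorov distribution, which is omitted. Unweighted events and the name `ks_statistic` are assumptions.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance D = max |F_a(x) - F_b(x)|
    between the empirical cumulative distributions of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum is reached at one of the sample points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

Applied separately to the signal and background responses of the training and test samples, a large D (small KS probability) would hint at overtraining.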
The ROC curve indicates how good the training is; the area under the curve (ROC AUC) is a good measure of the training quality
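The area under the ROC curve has a simple probabilistic reading: it is the chance that a randomly chosen signal event receives a higher classifier response than a randomly chosen background event. A minimal, unweighted sketch (the name `roc_auc` is hypothetical; MC event weights are ignored, and the pairwise loop is only practical for small samples):

```python
def roc_auc(sig_scores, bkg_scores):
    """ROC AUC = P(signal response > background response), ties counted as 1/2."""
    wins = 0.0
    for s in sig_scores:
        for b in bkg_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(sig_scores) * len(bkg_scores))
```

An AUC of 1 means perfect separation and 0.5 means no separation at all.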
[Figure: Significance vs. cut on BDT response for signal models (m(t̃), m(χ̃⁰₁)) = (1000, 300), (1000, 1), (1000, 100), (800, 300), (800, 100), (800, 1), (600, 300), (600, 100), (600, 1) GeV; ATLAS Work in Progress, √s = 13 TeV, 36.1 fb⁻¹, TMVAweighted.]
Very high significances for some of the models
Important variables: m_T2, p_T(top), E_T^miss
Cut on BDT response ≥ 0.34
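Choosing the BDT cut amounts to scanning candidate thresholds and keeping the one with the highest significance. The slides do not state which significance definition is used, so this sketch assumes the simple estimate Z = s/√b; the event format `(bdt_response, weight)` and the function names are hypothetical.

```python
import math

def significance(s, b):
    """Simple estimate Z = s / sqrt(b) (an assumption; other definitions exist)."""
    return s / math.sqrt(b) if b > 0 else 0.0

def scan_bdt_cut(sig, bkg, cuts):
    """sig, bkg: lists of (bdt_response, event_weight) pairs.

    Keeps events with response >= cut and returns (best_cut, best_Z).
    """
    best_cut, best_z = None, -1.0
    for c in cuts:
        s = sum(w for r, w in sig if r >= c)
        b = sum(w for r, w in bkg if r >= c)
        z = significance(s, b)
        if z > best_z:
            best_cut, best_z = c, z
    return best_cut, best_z
```

A realistic version would include systematic uncertainties on the background and guard against cuts that leave very few MC events.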
m(t̃)–m(χ̃⁰₁) Parameter Space
[Figure: Expected significance in the m(t̃)–m(χ̃⁰₁) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → bW; √s = 13 TeV, 36.1 fb⁻¹. Left: cut-based method (SRA_TT, ATLAS Work in progress); right: BDT method (BDT > 0.34, ATLAS Internal). Additional labels: cut-based, BDT, m(t̃) = 1 TeV.]
Expected significances of a sample with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV
[Figure: Significance of signal sample InputConfig_TT_directTT_1000_1_a821_r7676_Input (colour scale 2.6–4) as a function of MaxDepth (1–10) and MinNodeSize (0.5–4.5 %).]
Maximal depth: how many layers (successive cuts) can an event pass through?
Minimal node size: fraction (%) of training events required to be in a leaf
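The two settings act as stopping rules while a tree is grown. In this sketch a node at `depth` (root = 0) may only be split if the tree has not yet reached the maximal depth and both children keep at least the minimal node size, given as a percentage of all training events. The exact depth counting convention of TMVA is not taken from the slides, and `can_split` is a hypothetical name.

```python
def can_split(depth, n_left, n_right, n_train_total, max_depth, min_node_size_pct):
    """True if a node at `depth` (root = 0) may be split into children holding
    n_left and n_right training events, under MaxDepth and MinNodeSize (%)."""
    if depth >= max_depth:
        return False  # children would exceed the maximal number of layers
    min_events = min_node_size_pct / 100.0 * n_train_total
    return n_left >= min_events and n_right >= min_events
```

With `max_depth = 1` only the root can be split, giving a single-cut tree (a "stump"); larger depths and smaller node sizes let the tree isolate ever smaller groups of training events, which is where overtraining can creep in.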
Area under the ROC curve of a sample with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV
[Figure: ROC AUC (colour scale 0.953–0.958) as a function of MaxDepth (1–10) and MinNodeSize (0.5–4.5 %); ATLAS Internal.]
The area under the ROC curve tests against overtraining: a high maximal depth combined with a small minimal node size minimizes the ROC area
Settings: MaxDepth = 4, MinNodeSize = 1.5%
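The hyperparameter scan can be organized by generating one TMVA-style BDT option string per grid point; such a string would be passed as the option argument of TMVA's `BookMethod`. Only `MaxDepth` and `MinNodeSize` come from the slides; `NTrees`, `BoostType`, `AdaBoostBeta` and `SeparationType` are assumed values, and the grid assumes the scan points coincide with the plot's axis ticks. `bdt_options` is a hypothetical helper.

```python
def bdt_options(max_depth, min_node_size_pct, n_trees=800):
    """TMVA-style BDT option string; NTrees, BoostType, AdaBoostBeta and
    SeparationType are assumed values, not taken from the slides."""
    return (f"!H:!V:NTrees={n_trees}:MaxDepth={max_depth}"
            f":MinNodeSize={min_node_size_pct}%:BoostType=AdaBoost"
            f":AdaBoostBeta=0.5:SeparationType=GiniIndex")

# Scan grid: MaxDepth 1..10, MinNodeSize 0.5..4.5 % in steps of 0.5 %
grid = [(d, m / 2) for d in range(1, 11) for m in range(1, 10)]
options = [bdt_options(d, s) for d, s in grid]

# Setting chosen from the significance and ROC AUC scans
chosen = bdt_options(4, 1.5)
```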
[Figure: Expected significance in the m(t̃)–m(χ̃⁰₁) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → bW; ATLAS Internal, MVA, √s = 13 TeV, 36.1 fb⁻¹. Panels: "Training for model with m(t̃) ≥ 1 TeV" and "Training with all samples"; cuts shown: BDT > 0.34 and BDT > 0.36.]
Significant increase of the sensitive region in the parameter space, especially towards the kinematic boundary m(t̃) = m(χ̃⁰₁) + m_t
For large m(t̃): training only with MC samples with m(t̃) ≥ 1 TeV is more promising
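The kinematic boundary separates points where the two-body decay t̃ → t χ̃⁰₁ with an on-shell top is allowed from compressed points below it. The sketch assumes m_t = 172.5 GeV (a common MC value, not quoted in the slides); the function names are hypothetical.

```python
M_TOP_GEV = 172.5  # assumed top-quark mass in GeV

def on_shell_top_allowed(m_stop, m_lsp, m_top=M_TOP_GEV):
    """True if the point lies above the boundary m(t~) = m(chi~0_1) + m_t,
    i.e. t~ -> t chi~0_1 with an on-shell top is kinematically allowed."""
    return m_stop - m_lsp > m_top

def distance_to_boundary(m_stop, m_lsp, m_top=M_TOP_GEV):
    """Delta m - m_t in GeV; small values mean compressed points near the boundary."""
    return m_stop - m_lsp - m_top
```

Points with a small distance to the boundary produce softer decay products and are harder to separate from the tt̄ background, which is why gains in that direction are notable.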
Training for model with m(t̃) = 1 TeV and m(χ̃⁰₁) = 100 GeV
[Figure: BDT vs. Support Vector Machine (SVM). TMVA overtraining checks: BDT, Kolmogorov–Smirnov probability 0 (signal) and 0.084 (background); SVM, 0.507 (signal) and 0.833 (background); under/overflow 0.0% in both. Expected significance in the m(t̃)–m(χ̃⁰₁) plane (ATLAS Internal, MVA, √s = 13 TeV, 36.1 fb⁻¹) for BDT > 0.17 and SVM > 0.94.]
The BDT achieves better significances than the SVM
Started to look at MVA methods for increasing the sensitivity of the t̃ → 0ℓ analysis
Training with several signal samples seems reasonable
Current steps: checking the dependence on the BDT settings and BDT input variables; looking at neural networks
… m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV
[Figure: Expected significance in the m(t̃)–m(χ̃⁰₁) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → bW, cut BDT > 0.14; ATLAS Internal, MVA, √s = 13 TeV, 36.1 fb⁻¹.]
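After training: compare the true values $y_i$ with the forecasts $s_i$; $w_i$ are the event weights, with $\sum_i w_i = 1$
Error fraction: $e = \sum_i w_i \, \mathbb{1}_{s_i \neq y_i}$
Boost factor: $\alpha = \beta \cdot \ln\frac{1 - e}{e}$, with $\beta$ constant, usually $\beta \in [0, 1]$
New weights: $w_i \to w_i \cdot \exp\left(\alpha \, \mathbb{1}_{s_i \neq y_i}\right)$, followed by renormalization to $\sum_i w_i = 1$
BDT response of event $i$:
$$r_i = \frac{\sum_m \alpha_m s_i^m}{\sum_m \alpha_m}, \qquad s_i^m = \begin{cases} +1 & \text{if tree } m \text{ predicts signal} \\ -1 & \text{if tree } m \text{ predicts background} \end{cases}$$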
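The boosting procedure above can be sketched in a few lines. To keep it short, each "tree" is a decision stump (a single cut on one variable, chosen by minimal weighted error) rather than a full tree with MaxDepth = 4; labels are +1 for signal and −1 for background. The error fraction is clipped away from 0 to avoid ln(0) for a perfect stump. This is an illustrative AdaBoost sketch, not the TMVA implementation; all function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Weighted decision stump: one cut on one variable with minimal weighted error.
    X: list of feature tuples, y: labels in {+1, -1}. Returns (var, cut, polarity)."""
    best, best_err = None, float("inf")
    for v in range(len(X[0])):
        for cut in sorted({x[v] for x in X}):
            for pol in (+1, -1):
                # weighted error of "predict pol if x[v] > cut, else -pol"
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[v] > cut else -pol) != yi)
                if err < best_err:
                    best, best_err = (v, cut, pol), err
    return best

def stump_predict(stump, x):
    v, cut, pol = stump
    return pol if x[v] > cut else -pol

def adaboost(X, y, n_trees=10, beta=0.5):
    n = len(X)
    w = [1.0 / n] * n  # start with sum_i w_i = 1
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(X, y, w)
        s = [stump_predict(stump, x) for x in X]
        e = sum(wi for wi, si, yi in zip(w, s, y) if si != yi)  # error fraction
        e = min(max(e, 1e-10), 1.0 - 1e-10)  # guard against ln(0)
        alpha = beta * math.log((1.0 - e) / e)  # boost factor
        # misclassified events get weight * exp(alpha), then renormalize
        w = [wi * math.exp(alpha if si != yi else 0.0) for wi, si, yi in zip(w, s, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def bdt_response(ensemble, x):
    """r = sum_m alpha_m s^m / sum_m alpha_m, in [-1, 1]."""
    num = sum(alpha * stump_predict(st, x) for alpha, st in ensemble)
    den = sum(alpha for alpha, _ in ensemble)
    return num / den
```

Because the best stump always has a weighted error of at most 0.5, every boost factor is non-negative and the response stays within [−1, 1], reaching ±1 only when all trees agree.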
[Figure: BDT response distribution (entries / 0.02 units, logarithmic scale) with stacked backgrounds tt̄, Z+jets, W+jets, single top, diboson, tt̄+V, tt̄+γ, data (×10) and the signal (m(t̃), m(χ̃⁰₁)) = (1000, 1) GeV; Data/MC ratio panel; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹, TMVAweighted.]
Data shown only for BDT response < 0
Small deviation in normalization between data and the MC prediction
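The normalization comparison in the unblinded region can be sketched as the ratio of the data count to the summed MC prediction for BDT response < 0, with a Poisson uncertainty on the data only (MC statistical and systematic uncertainties are ignored here). The input formats and the name `data_mc_ratio` are hypothetical.

```python
import math

def data_mc_ratio(data_bdt, mc, blind_above=0.0):
    """Data/MC normalization in the unblinded region (BDT response < blind_above).

    data_bdt: BDT responses of data events (weight 1 each);
    mc: (bdt_response, event_weight) pairs summed over all background processes.
    Returns (ratio, stat_uncertainty) with a Poisson uncertainty on the data count.
    """
    n_data = sum(1 for r in data_bdt if r < blind_above)
    n_mc = sum(w for r, w in mc if r < blind_above)
    return n_data / n_mc, math.sqrt(n_data) / n_mc
```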
[Figure: E_T^miss distribution (entries / 50 GeV, logarithmic scale, 300–900 GeV) for BDT response < 0, with stacked backgrounds tt̄, Z+jets, W+jets, single top, diboson, tt̄+V, tt̄+γ, data and the signal (m(t̃), m(χ̃⁰₁)) = (1000, 1) GeV; Data/MC ratio panel; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹.]
Good agreement of physical
BDT response < 0