Search for top Squarks Using Multivariate Methods

SLIDE 1

Search for top Squarks Using Multivariate Methods

Jonas Graw

Max Planck Institute for Physics (Werner-Heisenberg-Institut)

Thursday 20th July, 2017

SLIDE 2

Motivation

Standard Strategy: Cut-based analysis

[Figure: expected significance (color scale 1-7) in the m(t̃) (200-1200 GeV) vs. m(χ̃⁰₁) (100-600 GeV) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → Wb; signal region SRA_TT; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹]

[Figure: as above for signal region SRB_TT]

→ Investigate multivariate methods using Monte Carlo samples

07/20/2017 Jonas Graw - t̃ → 0ℓ - MVA 2/12

SLIDE 3

Machine Learning

- Classification: signal or background
- Supervised learning: training is done with labeled simulated events
- Events are divided into a training and a testing sample (e.g. 50%/50%)
- Overtraining ("learning by heart") must be avoided

[Diagram: labeled samples → training → machine-learning model → testing; the model is applied to ATLAS data to obtain a predicted label]
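A minimal sketch of the random 50%/50% split into training and testing samples, done separately for signal and background so that both classes are divided in the same proportion. The function name, the integer labels (1 = signal, 0 = background) and the fixed seed are hypothetical choices for illustration; in TMVA this split is performed by the framework itself when the training and test trees are prepared.

```python
import numpy as np

def split_train_test(labels, train_fraction=0.5, seed=0):
    """Randomly split event indices into training and test sets.

    The split is done separately for each class (e.g. signal = 1,
    background = 0), so both classes are divided in the same proportion.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffled indices of this class
        n_train = int(round(train_fraction * len(idx)))
        train_parts.append(idx[:n_train])
        test_parts.append(idx[n_train:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(test_parts))

# Usage: 100 signal and 300 background events, split 50%/50%
labels = np.array([1] * 100 + [0] * 300)
train_idx, test_idx = split_train_test(labels)
```

Comparing the classifier output on the two halves is what the overtraining checks in the following slides do.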

SLIDE 4

Boosted Decision Tree (BDT)

- Classification into two classes: signal and background
- Each decision (node) uses exactly one discriminating variable (a cut)
- The variable and cut value are chosen to give the best possible signal/background separation
- Boosting: a new tree is trained in which misclassified events receive a larger weight
- BDT response $r(i) \in [-1, 1]$ of an event $i$: a classification measure that depends on all trees, with the limits
  $r(i) = +1$: all trees classify $i$ as signal
  $r(i) = -1$: all trees classify $i$ as background
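To illustrate how a single cut is chosen, the sketch below scans every candidate threshold of every variable and keeps the one that most reduces the weighted Gini index $p(1-p)$ of the node, where $p$ is the signal purity. The Gini index is one common separation criterion (to our knowledge the TMVA default for BDTs); the slide does not state which criterion was used, and all names here are hypothetical.

```python
import numpy as np

def gini(s, b):
    """Gini index p(1 - p) of a node with signal weight s and background weight b."""
    n = s + b
    if n == 0:
        return 0.0
    p = s / n
    return p * (1.0 - p)

def best_cut(x, is_signal, w):
    """Return (variable index, threshold, gain) of the single cut 'x[:, j] <= t'
    that gives the largest decrease of the weighted Gini index."""
    s_tot, b_tot = w[is_signal].sum(), w[~is_signal].sum()
    parent = (s_tot + b_tot) * gini(s_tot, b_tot)
    best = (None, None, 0.0)
    for j in range(x.shape[1]):
        # the largest value is skipped so that both daughter nodes are non-empty
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            s_l, b_l = w[left & is_signal].sum(), w[left & ~is_signal].sum()
            s_r, b_r = s_tot - s_l, b_tot - b_l
            gain = parent - (s_l + b_l) * gini(s_l, b_l) - (s_r + b_r) * gini(s_r, b_r)
            if gain > best[2]:
                best = (j, t, gain)
    return best

# Usage: variable 0 separates signal from background perfectly, variable 1 does not
x = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0]])
is_signal = np.array([False, False, True, True])
j, t, gain = best_cut(x, is_signal, np.ones(4))
```

In a full tree this search is repeated recursively in each daughter node until a stopping condition (for example a maximal depth or minimal node size) is reached.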

SLIDE 5

Multivariate Methods: Boosted Decision Tree

- A BDT is used to make optimal use of the discriminating variables
- Training region: 0ℓ, E_T^miss > 250 GeV, ≥ 4 jets and ≥ 1 b-jet
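The training-region requirements translate into a simple event mask. This sketch assumes per-event arrays of lepton, jet and b-jet multiplicities and E_T^miss in GeV; the argument names are hypothetical, and the object definitions (lepton identification, b-tagging working point) are not specified on the slide.

```python
import numpy as np

def training_region_mask(n_leptons, met_gev, n_jets, n_bjets):
    """Boolean mask of events in the BDT training region:
    0 leptons, E_T^miss > 250 GeV, >= 4 jets and >= 1 b-jet."""
    n_leptons, met_gev = np.asarray(n_leptons), np.asarray(met_gev)
    n_jets, n_bjets = np.asarray(n_jets), np.asarray(n_bjets)
    return (n_leptons == 0) & (met_gev > 250.0) & (n_jets >= 4) & (n_bjets >= 1)

# Usage with five hypothetical events
mask = training_region_mask(
    n_leptons=[0, 1, 0, 0, 0],
    met_gev=[300.0, 300.0, 250.0, 400.0, 260.0],
    n_jets=[4, 4, 5, 3, 6],
    n_bjets=[1, 1, 2, 1, 2],
)
```

Note that the E_T^miss requirement is strict, so an event with exactly 250 GeV is rejected.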

[Figure: TMVA overtraining check for classifier BDT: normalized BDT response (1/N) dN/dx (-0.4 to 0.5) for signal and background, test and training samples]

- The BDT responses of the MC test and training samples agree very well → no overtraining
- Training on models with m(t̃) ≥ 1 TeV reaches better significances than training on the model with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV

- The ROC curve (background rejection versus signal efficiency) indicates how well the training separates signal from background; the area under the curve is a good single-number measure of the training quality
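The area under the ROC curve equals the probability that a randomly chosen signal event receives a higher classifier response than a randomly chosen background event (ties counted as one half). A minimal unweighted sketch using this rank formulation; real MC samples carry event weights, which this version ignores.

```python
import numpy as np

def roc_auc(sig_scores, bkg_scores):
    """Area under the ROC curve from classifier scores (unweighted).

    Counts, for every signal event, the background events with a lower score
    (ties count 1/2) and normalizes by the number of signal-background pairs.
    """
    sig = np.asarray(sig_scores, dtype=float)
    bkg = np.sort(np.asarray(bkg_scores, dtype=float))
    n_lower = np.searchsorted(bkg, sig, side="left")            # bkg scores strictly below each signal score
    n_tied = np.searchsorted(bkg, sig, side="right") - n_lower  # bkg scores equal to each signal score
    return (n_lower.sum() + 0.5 * n_tied.sum()) / (len(sig) * len(bkg))
```

For example, `roc_auc([0.5, 0.6], [-0.5, -0.1])` gives 1.0 (perfect separation), while identical signal and background responses give 0.5.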

SLIDE 6

Cut on BDT Response

[Figure: significance (2-10) as a function of the cut on the BDT response (-0.8 to 0.8) for the signal points (m(t̃), m(χ̃⁰₁)) = (1000, 300), (1000, 1), (1000, 100), (800, 300), (800, 100), (800, 1), (600, 300), (600, 100), (600, 1) GeV; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹, TMVAweighted]

- Very high significances for some of the signal points
- Important variables: m_T2, p_T(top), E_T^miss
- Cut on BDT response ≥ 0.34
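Choosing the cut value amounts to scanning the BDT response and maximizing an expected significance. The slide does not say which significance definition was used; this sketch assumes the Asimov formula $Z = \sqrt{2((s+b)\ln(1+s/b) - s)}$ without background uncertainty, plus a hypothetical `min_bkg` threshold that skips cuts leaving almost no simulated background.

```python
import numpy as np

def asimov_z(s, b):
    """Asimov discovery significance for s signal and b > 0 background events
    (assumption: no background uncertainty)."""
    return np.sqrt(2.0 * ((s + b) * np.log1p(s / b) - s))

def scan_bdt_cut(sig_resp, sig_w, bkg_resp, bkg_w, cuts, min_bkg=1.0):
    """Return (best_cut, best_z) for the selection 'BDT response >= cut'.

    min_bkg is a hypothetical safeguard: cuts leaving less than this much
    weighted background are skipped.
    """
    best_cut, best_z = None, -np.inf
    for cut in cuts:
        s = sig_w[sig_resp >= cut].sum()
        b = bkg_w[bkg_resp >= cut].sum()
        if b < min_bkg:
            continue
        z = asimov_z(s, b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z

# Usage with toy responses and unit weights
sig_resp = np.array([0.5, 0.6, 0.7])
bkg_resp = np.array([-0.5, 0.1, 0.4, 0.55])
cut, z = scan_bdt_cut(sig_resp, np.ones(3), bkg_resp, np.ones(4), cuts=[0.0, 0.3, 0.5, 0.6])
```

For s much smaller than b the Asimov formula reduces to the familiar s/√b; the safeguard matters because the formula grows without bound as b approaches zero.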

SLIDE 7

Expected Significances in the m(t̃)-m(χ̃⁰₁) Parameter Space

[Figure, cut-based method: expected significance (color scale 1-7) in the m(t̃) (200-1200 GeV) vs. m(χ̃⁰₁) (100-600 GeV) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → Wb; signal region SRA_TT; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹]

[Figure, BDT method: same plane and process (color scale 1-6) after the cut BDT > 0.34; ATLAS Internal, √s = 13 TeV, 36.1 fb⁻¹]

Cut-based: 1.7σ, BDT: 3.0σ ⇒ expected significance of 3σ up to m(t̃) = 1 TeV

⇒ Can we increase this by changing the BDT settings?

SLIDE 8

Optimization of the BDT Settings

Expected significances for the sample with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV

[Figure: significance of signal sample InputConfig_TT_directTT_1000_1_a821_r7676_Input (color scale 2.6-4.0) as a function of MaxDepth (1-10) and MinNodeSize (0.5-4.5%)]

- Maximal depth (MaxDepth): how many successive layers of a tree can an event pass through?
- Minimal node size (MinNodeSize): fraction (%) of events required to be in a leaf

⇒ Are these really the best settings?
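A grid scan like this one can be driven by generating one BDT configuration per (MaxDepth, MinNodeSize) point. The sketch only builds TMVA-style option strings; `MaxDepth`, `MinNodeSize`, `NTrees`, `BoostType`, `AdaBoostBeta` and `SeparationType` are TMVA BDT options, but the values of `NTrees` and `AdaBoostBeta` and the 0.5% step in MinNodeSize are assumptions, not the settings used in this study. Each string would then be passed to the factory's `BookMethod` call together with the BDT method type.

```python
from itertools import product

def bdt_option_string(max_depth, min_node_size_pct, n_trees=850, beta=0.5):
    """TMVA-style option string for one BDT configuration.
    n_trees and beta are placeholder values, not the settings of this study."""
    return (
        f"!H:!V:NTrees={n_trees}:MaxDepth={max_depth}:"
        f"MinNodeSize={min_node_size_pct}%:BoostType=AdaBoost:"
        f"AdaBoostBeta={beta}:SeparationType=GiniIndex"
    )

# Grid matching the axis ranges of the scan: MaxDepth 1-10, MinNodeSize 0.5%-4.5%
depths = range(1, 11)
node_sizes = [0.5 * k for k in range(1, 10)]  # assumed 0.5% steps
grid = {(d, n): bdt_option_string(d, n) for d, n in product(depths, node_sizes)}
```

Keeping the configurations in a dictionary keyed by the grid point makes it straightforward to attach the resulting significance or ROC area to each setting afterwards.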

SLIDE 9

Optimization of the BDT Settings

Area under the ROC curve for the sample with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV

[Figure: ROC AUC (0.953-0.958) as a function of MaxDepth (1-10) and MinNodeSize (0.5-4.5%); ATLAS Internal]

- The area under the ROC curve is used as a test against overtraining
- A high MaxDepth combined with a small MinNodeSize minimizes the ROC area

⇒ Sign of overtraining!

Settings: MaxDepth = 4, MinNodeSize = 1.5%

SLIDE 10

BDT Training for different models

Training for models with m(t̃) ≥ 1 TeV vs. training with all samples

[Figure: expected significance (color scale 1-6) in the m(t̃) (200-1200 GeV) vs. m(χ̃⁰₁) (100-600 GeV) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → Wb; ATLAS Internal, √s = 13 TeV, 36.1 fb⁻¹; one panel per training, with cuts BDT > 0.34 and BDT > 0.36, respectively]

Significant enlargement of the sensitive region in the parameter space, especially towards the kinematic boundary m(t̃) = m(χ̃⁰₁) + m_t

For large m(t̃), training only on MC samples with m(t̃) ≥ 1 TeV is more promising
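The kinematic boundary $m(\tilde{t}) = m(\tilde{\chi}^0_1) + m_t$ separates mass points where the top quark from the stop decay can be on shell from those where it cannot. The sketch computes the distance of a mass point from this boundary and selects the signal points with $m(\tilde{t}) \ge 1$ TeV for training; the top mass of 172.5 GeV and both helper names are assumptions, and the example mass points are those of the significance scan on the BDT-cut slide.

```python
M_TOP = 172.5  # GeV; assumed top-quark mass

def distance_to_border(m_stop, m_lsp, m_top=M_TOP):
    """Distance (GeV) of a signal point from the boundary m_stop = m_lsp + m_top.
    Positive values: the decay t~ -> t chi~0_1 with an on-shell top is allowed."""
    return m_stop - m_lsp - m_top

def select_training_points(points, m_stop_min=1000.0):
    """Hypothetical helper: keep only the (m_stop, m_lsp) points with m_stop >= m_stop_min."""
    return [(m_stop, m_lsp) for m_stop, m_lsp in points if m_stop >= m_stop_min]

# Signal mass points (GeV) from the BDT-cut scan
points = [(1000, 300), (1000, 1), (1000, 100), (800, 300), (800, 100),
          (800, 1), (600, 300), (600, 100), (600, 1)]
training = select_training_points(points)
gaps = {p: distance_to_border(*p) for p in points}
```

Points with a small positive distance lie close to the kinematic boundary, which is where the enlarged sensitive region is observed.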

SLIDE 11

Comparing against other MVA methods

Training for model with m(t̃) = 1 TeV and m(χ̃⁰₁) = 100 GeV

BDT vs. Support Vector Machine (SVM)

[Figure: TMVA overtraining check for classifier BDT: normalized BDT response (-0.4 to 0.2) for signal and background, training and test samples; Kolmogorov-Smirnov test: signal (background) probability = 0 (0.084); U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]

[Figure: TMVA overtraining check for classifier SVM: normalized SVM response (0.1 to 0.9) for signal and background, training and test samples; Kolmogorov-Smirnov test: signal (background) probability = 0.507 (0.833); U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]

[Figure: expected significance (color scale 1-6) in the m(t̃) (200-1200 GeV) vs. m(χ̃⁰₁) (100-600 GeV) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → Wb; ATLAS Internal, √s = 13 TeV, 36.1 fb⁻¹; cuts BDT > 0.17 and SVM > 0.94]

BDT achieves better significances
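The Kolmogorov-Smirnov probabilities quoted in the overtraining checks compare the response distributions of the training and test samples. A minimal unbinned sketch of the two-sample test with the asymptotic p-value (including the usual finite-size correction to λ); as far as we know TMVA evaluates its test on binned histograms, so this version will not reproduce the quoted numbers exactly.

```python
import math
import numpy as np

def ks_two_sample(a, b):
    """Unbinned two-sample Kolmogorov-Smirnov test.

    Returns (D, p): D is the maximum distance between the two empirical CDFs,
    p the asymptotic probability of a distance at least this large if both
    samples come from the same distribution.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / len(a)
    cdf_b = np.searchsorted(b, points, side="right") / len(b)
    d = float(np.max(np.abs(cdf_a - cdf_b)))

    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:  # Kolmogorov tail probability equals 1 to better than 1e-12 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Usage with toy samples (in practice: classifier responses of the training and test samples)
rng = np.random.default_rng(0)
d_same, p_same = ks_two_sample(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
d_diff, p_diff = ks_two_sample(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
```

A very small probability indicates that the training and test responses are unlikely to come from the same distribution.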

SLIDE 12

Summary & Outlook

- Started to look at MVA methods to increase the sensitivity of the t̃ → 0ℓ analysis
- Training with several signal samples seems reasonable
- Current steps:
  - Checking the dependence on the BDT settings and the BDT input variables
  - Looking at neural networks

SLIDE 13

BACKUP

SLIDE 14

Training for model with m(t̃) = 1 TeV and m(χ̃⁰₁) = 1 GeV

[Figure: expected significance (color scale 1-6) in the m(t̃) (200-1200 GeV) vs. m(χ̃⁰₁) (100-600 GeV) plane for pp → t̃t̃, t̃ → t χ̃⁰₁, t → Wb; ATLAS Internal, √s = 13 TeV, 36.1 fb⁻¹; BDT > 0.14]

SLIDE 15

BDT response

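After training, compare the true labels $y_i$ with the predictions $s_i$; $w_i$ are the event weights, with $\sum_i w_i = 1$.

Error fraction:
$$e = \sum_i w_i \, \mathbb{1}_{s_i \neq y_i}$$

Boost factor:
$$\alpha = \beta \cdot \ln\left(\frac{1 - e}{e}\right)$$
with $\beta$ a constant, usually $\beta \in [0, 1]$.

New weights (followed by renormalization so that $\sum_i w_i = 1$ again):
$$w_i \to w_i \cdot \exp\left(\alpha \cdot \mathbb{1}_{s_i \neq y_i}\right)$$

BDT response of event $i$:
$$r_i = \frac{\sum_m \alpha_m (s_i)_m}{\sum_m \alpha_m}, \qquad (s_i)_m = \begin{cases} +1 & \text{if tree } m \text{ predicts signal} \\ -1 & \text{if tree } m \text{ predicts background} \end{cases}$$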
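The boosting formulas above can be sketched in a few lines. To keep it short, this version uses depth-1 trees (single cuts, "stumps") as weak learners and chooses each cut by minimizing the weighted misclassification error; both are simplifications for illustration, not the configuration used in this study (which used MaxDepth = 4). Labels are $y_i = +1$ for signal and $-1$ for background, and all function names are hypothetical.

```python
import numpy as np

def train_stump(x, y, w):
    """Depth-1 tree: the cut (variable j, threshold t, polarity p) with the
    smallest weighted misclassification error. Predicts p if x[:, j] > t, else -p."""
    best_err, best = np.inf, None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j]):
            above = np.where(x[:, j] > t, 1, -1)
            for p in (1, -1):
                err = w[p * above != y].sum()
                if err < best_err:
                    best_err, best = err, (j, t, p)
    return best

def stump_predict(stump, x):
    j, t, p = stump
    return p * np.where(x[:, j] > t, 1, -1)

def adaboost(x, y, n_trees=10, beta=0.5):
    """AdaBoost as on the slide; y_i = +1 (signal) or -1 (background)."""
    w = np.full(len(y), 1.0 / len(y))      # start with sum_i w_i = 1
    trees, alphas = [], []
    for _ in range(n_trees):
        stump = train_stump(x, y, w)
        miss = stump_predict(stump, x) != y   # indicator 1_{s_i != y_i}
        e = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)  # error fraction, clipped to avoid log(0)
        alpha = beta * np.log((1 - e) / e)    # boost factor
        w = w * np.exp(alpha * miss)          # larger weight for misclassified events
        w /= w.sum()                          # renormalize to sum_i w_i = 1
        trees.append(stump)
        alphas.append(alpha)
    return trees, np.array(alphas)

def bdt_response(trees, alphas, x):
    """r_i = sum_m alpha_m (s_i)_m / sum_m alpha_m, with (s_i)_m = +1 or -1."""
    if alphas.sum() == 0:
        return np.zeros(len(x))
    votes = np.array([stump_predict(t, x) for t in trees])  # shape (n_trees, n_events)
    return alphas @ votes / alphas.sum()

# Usage on a toy sample: signal where x0 + 0.5 * x1 > 0
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y = np.where(x[:, 0] + 0.5 * x[:, 1] > 0, 1, -1)
trees, alphas = adaboost(x, y)
r = bdt_response(trees, alphas, x)
```

Because each stump is chosen with the better of the two polarities, its weighted error never exceeds 0.5, so every $\alpha_m \ge 0$ and the response stays within $[-1, 1]$.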

SLIDE 16

Data-MC agreement

[Figure: BDT response distribution (-0.8 to 0.8, entries / 0.02 units, log scale) for the MC backgrounds tt̄, Z+jets, W+jets, single top, diboson, tt̄+V and tt̄+γ, data (×10) and the signal (m(t̃), m(χ̃⁰₁)) = (1000, 1), with a Data/MC ratio panel; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹, TMVAweighted]

- Data shown only for BDT response < 0
- Slight deviation in normalization between data and the MC prediction

[Figure: E_T^miss distribution (300-900 GeV, entries / 50 GeV, log scale) for BDT response < 0, with the same MC backgrounds, data and the signal (m(t̃), m(χ̃⁰₁)) = (1000, 1), and a Data/MC ratio panel; ATLAS Work in progress, √s = 13 TeV, 36.1 fb⁻¹]

Good agreement of the physical observables in the region with BDT response < 0
