SLIDE 1

Introduction to TMVA and Primary Electron Track Determination

Erin Conley SNB/LE Working Group Meeting June 20, 2018

SLIDE 2

Introduction

  • Goal: determine the primary electron (reconstructed) track in νₑ CC interactions

– Not always obvious; having a concrete, general method would be useful!
– Using MARLEY events made by J. Stock in May 2017

  • TMVA provides methods based on machine learning to help reach this goal.

[Figure: 30.25 MeV event display, time (ticks) vs. wire]

SLIDE 3

TMVA: Introduction

  • TMVA: Toolkit for Multivariate Data Analysis

– Framework in ROOT to be used for classification and regression problems
– Various multivariate analysis (MVA) methods available

  • Two independent phases:

– Training phase: MVA methods trained, tested, evaluated
– Application phase: chosen MVA methods applied to classification problem

  • Need to worry about overtraining: too few degrees of freedom (too many model parameters tuned on too few training events) leads to an unrealistic increase in classification performance on the training sample
  • Data pre-processing available to, e.g., de-correlate or “Gaussian-ize” variables

SLIDE 4

TMVA Output

Characteristics of input variables:

  • Distributions for signal, background input variables
  • Distributions for transformed variables (e.g., decorrelated variables)
  • Correlation plots + matrix to understand linear correlations between variables

Use these plots to choose optimal combinations of variables, data pre-processing strategy, etc.

MVA method performance plots:

  • MVA method classifier outputs

– Kolmogorov-Smirnov test statistic to determine whether overtraining occurred (rule of thumb: want ρ_KS ≳ 0.01)

  • Optimal cut for MVA method classifiers
  • Classification probabilities + PDFs
  • Probability integral transformation (rarity)
  • Receiver operating characteristic (ROC) curves

Use these plots to compare MVA method performances, choose optimal cuts on data, etc.

SLIDE 5

MARLEY Simulations: Preparing for TMVA

  • Used BackTracker to determine which tracks were made by the primary electron

– Used 2D hits associated with tracks
– Multiple tracks in the event can be made by the primary electron
– Some tracks are partially made by primary electrons (e.g., track with 10 hits → primary electron produced 6 hits)

  • For the purposes of preliminary TMVA tests:

– Signal: tracks that had 75% or more of their hits produced by the primary electron
– Background: all other tracks

  • Used full 30.25 MeV MARLEY simulation (10,000 events)
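
The signal/background definition above can be sketched in a few lines. This is only an illustration: the function name `label_track` and its arguments are hypothetical, and the per-track hit counts are assumed to come from a truth-matching step such as BackTracker.

```python
def label_track(hits_from_primary, total_hits, threshold=0.75):
    """Label a track by the fraction of its hits made by the primary electron.

    `threshold` is the 75% signal definition quoted on the slide.
    """
    if total_hits <= 0:
        raise ValueError("track has no hits")
    fraction = hits_from_primary / total_hits
    return "signal" if fraction >= threshold else "background"


# Example from the slide: a track with 10 hits, 6 of them from the primary electron.
print(label_track(6, 10))
```

With this definition the 10-hit track with 6 primary-electron hits falls below the 75% threshold and is treated as background.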

SLIDE 6

Determining Primary Track “By Eye”

  • Scanned event displays of 100 events in 30.25 MeV MARLEY data

– These events had 2.61 reconstructed tracks on average (pmtracktc)

  • Out of the 100 events…

– 2 events had no reconstructed tracks
– 2 events where I failed to identify the primary track
– 10 events where I correctly identified at least one primary track but…

  • Misidentified another track
  • Failed to identify all primary tracks

– 86 events where I correctly identified all primary tracks

  • Out of the 86 events where I was 100% correct…

– 14 events contained one track

SLIDE 7

Variables Used from MARLEY Simulations

1. Track length: as given in the recob::Track object
2. “Charge deposition”: sum of integral values of all hits in a track

– recob::Hit::Integral(): integral under calibrated signal waveform

3. “Path time”: difference between max/min peak times in the track

– recob::Hit::PeakTime(): time of signal peak (ticks)

4. “Summed RMS”: sum of RMS of all hits in a track

– recob::Hit::RMS(): RMS of hit shape (ticks)

  • Also used calorimetry information from tracks:

5. “Summed dQdx”: sum of dQdx values on collection plane
6. “Calo KE”: kinetic energy of track on collection plane

– Potential issue: not all tracks have calorimetry information (bug?)
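
The three hit-based variables (2 to 4) can be sketched as follows. The `Hit` class is a hypothetical stand-in holding the values that recob::Hit::Integral(), PeakTime(), and RMS() would return; it is not the LArSoft class. The dictionary keys reuse the variable names that appear on the TMVA plots, and matching “path time” to `timeofint` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for the hit quantities used in the talk."""
    integral: float   # integral under calibrated signal waveform
    peak_time: float  # time of signal peak (ticks)
    rms: float        # RMS of hit shape (ticks)


def track_features(hits):
    """Compute the hit-based track variables from a list of hits."""
    if not hits:
        raise ValueError("track has no hits")
    peak_times = [h.peak_time for h in hits]
    return {
        "chargedepo": sum(h.integral for h in hits),      # "charge deposition"
        "timeofint": max(peak_times) - min(peak_times),   # "path time" (assumed mapping)
        "summedrms": sum(h.rms for h in hits),            # "summed RMS"
    }


example_hits = [Hit(10.0, 100.0, 2.0), Hit(5.0, 130.0, 3.0)]
print(track_features(example_hits))
```

Track length and the calorimetry variables are taken directly from the reconstruction and are not recomputed here.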

SLIDE 8

Input Variable Distributions

[Figure: normalized signal and background distributions, (1/N) dN per bin, for the six input variables: length, timeofint, chargedepo, summedrms, summeddqdx, caloke]

Signal: tracks with 75-100% of their hits made by the primary electron
Background: all other tracks

SLIDE 9

Decorrelated Variable Distributions

[Figure: normalized signal and background distributions, (1/N) dN per bin, for the six “Deco”-transformed (decorrelated) input variables: length, timeofint, chargedepo, summedrms, summeddqdx, caloke]

SLIDE 10

Track Determination + TMVA

  • Number of “signal” events: 10942
  • Number of “background” events: 8736
  • Trained on 8442 signal, 6236 background events; tested on 2500 signal, 2500 background events

– Tried to minimize testing sample; the more training, the better!

  • Tested ~5 different MVA methods so far, including cut-based analysis, likelihood estimator, boosted decision trees

– TMVA ranks MVA methods by best signal efficiency
– Use ROC curve to determine MVA performance
– Will only show BDT results (cut-based, likelihood results in backup)

SLIDE 11

Boosted Decision Tree (BDT) Method

  • Structured like binary tree; “yes/no” decisions taken on one variable at a time until stop criterion reached

– Splits phase space into many regions → eventually classified as signal or background
– Boosted: extends to several trees → “forest”

  • For the purposes of track determination: BDT with decorrelated variables

[Figure: schematic of decision tree. Leaf nodes at bottom are labeled “signal” and “background” after binary splits are made; these labels depend on the majority of events that end up in the nodes]

SLIDE 12

ROC Curve for TMVA Methods

[Figure: background rejection versus signal efficiency for MVA methods BDT, Likelihood, and Cuts]

ROC integral values:

  • BDT: 0.945
  • Likelihood: 0.942
  • Cuts: 0.934

  • Shows background rejection (1 − false positive rate) versus signal efficiency (true positive rate) for different possible cutoff points
  • Use ROC curve to compare MVA performances
  • The larger the area/integral, the better the performance
  • From the integral values, we see that the MVA methods are comparable, and BDT is performing well!
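
A ROC integral of this kind can be sketched by scanning a cut over the classifier output and integrating background rejection against signal efficiency with the trapezoidal rule. The function names and the toy scores below are illustrative assumptions; this is not TMVA's implementation.

```python
def roc_points(sig_scores, bkg_scores):
    """Return (signal efficiency, background rejection) pairs for every cut value.

    An event passes a cut if its score is >= the cut. Cuts are scanned from
    high to low, so signal efficiency rises from 0 to 1 along the list.
    """
    cuts = sorted(set(sig_scores) | set(bkg_scores), reverse=True)
    cuts.insert(0, cuts[0] + 1.0)  # a cut above every score: nothing passes
    points = []
    for cut in cuts:
        eff_sig = sum(s >= cut for s in sig_scores) / len(sig_scores)
        rej_bkg = sum(b < cut for b in bkg_scores) / len(bkg_scores)
        points.append((eff_sig, rej_bkg))
    return points


def roc_integral(points):
    """Trapezoidal area under background rejection vs. signal efficiency."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Toy scores (made up for illustration): perfectly separated classes.
signal = [0.8, 0.9]
background = [0.1, 0.2]
print(roc_integral(roc_points(signal, background)))
```

An integral of 1 corresponds to perfect separation and 0.5 to no separation, which is the scale on which the three values above should be read.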

SLIDE 13

BDT Classifier Output + Cuts

[Figure: TMVA overtraining check for classifier BDT. BDT response distributions for signal and background, with test and training samples overlaid. Kolmogorov-Smirnov test: signal (background) probability = 0.055 (0.032)]

[Figure: cut efficiencies and optimal cut value versus cut value applied on BDT output, showing signal efficiency, background efficiency, signal purity, signal efficiency*purity, and significance S/√(S+B). For 1000 signal and 1000 background events, the maximum S/√(S+B) is 27.77 when cutting at -0.06]

  • TMVA convention: signal events at larger classifier values, background at smaller
  • Note the KS test probabilities are above 0.01; indicates no overtraining occurred
  • Gives us an idea of our performance when we apply BDT to other datasets (e.g., future MARLEY simulations)
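
The optimal-cut search can be sketched as a scan that maximizes S/√(S+B) for an assumed 1000 signal and 1000 background events. The function name and the toy scores are hypothetical, and TMVA's own scan is binned, so this only illustrates the idea.

```python
import math


def best_cut(sig_scores, bkg_scores, n_sig=1000.0, n_bkg=1000.0):
    """Return (cut, significance) maximizing S / sqrt(S + B).

    S and B are the expected signal and background counts passing
    `score >= cut`, scaled to n_sig and n_bkg events.
    """
    best_cut_value, best_significance = None, 0.0
    for cut in sorted(set(sig_scores) | set(bkg_scores)):
        s = n_sig * sum(x >= cut for x in sig_scores) / len(sig_scores)
        b = n_bkg * sum(x >= cut for x in bkg_scores) / len(bkg_scores)
        if s + b > 0.0:
            significance = s / math.sqrt(s + b)
            if significance > best_significance:
                best_cut_value, best_significance = cut, significance
    return best_cut_value, best_significance


# Toy classifier outputs, made up for illustration only.
toy_signal = [0.5, 0.6, 0.7, 0.8]
toy_background = [-0.5, -0.4, -0.3, 0.55]
print(best_cut(toy_signal, toy_background))
```

Because S is capped at 1000, the significance cannot exceed √1000 ≈ 31.6, reached only if all signal is kept and all background rejected.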

SLIDE 14

TMVA BDT Probability + Rarity Plots

[Figure: TMVA probability for classifier BDT. Signal probability distributions for signal and background]

[Figure: TMVA Rarity for classifier BDT. Signal rarity distributions for signal and background]

Both plots have the expected shapes, a good sign that BDT is performing well!

Rarity: $\mathcal{R}(y) = \int_{-\infty}^{y} \hat{y}(y')\,dy'$, where $\hat{y}(y)$ is the PDF of signal/background

SLIDE 15

Takeaways + Next Steps

  • Preliminary results from TMVA are promising!
  • Next steps:

– Try other MVA methods:

  • Multidimensional likelihood estimators
  • Artificial neural networks
  • Predictive learning
  • Others?

– Try different definitions of “signal”, “background” events
– Try different variables or different combinations of variables

  • Photon information?

– Test MVA results on other data (e.g., future MARLEY simulations)

SLIDE 16

Backup Slides

SLIDE 17

How TMVA Decorrelates Variables

  • Linear correlations taken into account by computing the square root of the covariance matrix, $C'$, where $C = (C')^2$

– Covariance: measure of joint variability between two variables
– TMVA diagonalizes the (symmetric) covariance matrix: $D = S^T C S \;\Rightarrow\; C' = S \sqrt{D}\, S^T$

  • Decorrelation: $\mathbf{x} \mapsto (C')^{-1} \mathbf{x}$

– Only complete for linearly correlated, Gaussian-distributed variables
– Can use TMVA to “Gaussian-ize” variables before decorrelation
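
The square-root decorrelation can be sketched for the two-variable case. To stay dependency-free, this uses the closed-form square root of a 2×2 symmetric positive-definite matrix, C′ = (C + √det(C)·I) / √(tr(C) + 2√det(C)), rather than the diagonalization TMVA performs; both give the same symmetric square root. All function names and the toy data are illustrative assumptions.

```python
import math


def covariance_2d(xs, ys):
    """Sample covariance matrix entries (cxx, cxy, cyy) of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cxx, cxy, cyy


def inverse_sqrt_2x2(cxx, cxy, cyy):
    """Entries of (C')^-1, where C' is the symmetric square root of C.

    Requires C to be positive definite (variables not perfectly correlated).
    """
    s = math.sqrt(cxx * cyy - cxy * cxy)   # sqrt(det C), which equals det C'
    t = math.sqrt(cxx + cyy + 2.0 * s)     # trace of C'
    a, b, d = (cxx + s) / t, cxy / t, (cyy + s) / t  # entries of C'
    return d / s, -b / s, a / s            # inverse of the 2x2 matrix C'


def decorrelate(xs, ys):
    """Apply x -> (C')^-1 x to every (x, y) pair."""
    ixx, ixy, iyy = inverse_sqrt_2x2(*covariance_2d(xs, ys))
    us = [ixx * x + ixy * y for x, y in zip(xs, ys)]
    vs = [ixy * x + iyy * y for x, y in zip(xs, ys)]
    return us, vs


# Toy correlated data, made up for illustration.
us, vs = decorrelate([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0])
print(covariance_2d(us, vs))
```

After the transformation the sample covariance matrix of the new variables is the identity up to rounding. For the six track variables used here, the general diagonalization route would be needed.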

SLIDE 18

How TMVA Gaussian-izes Variables

  • Two steps:

– Transform variable into uniform distribution using cumulative distribution function obtained from training data
– Use inverse error function to transform uniform distribution into Gaussian shape with zero mean, unity width
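
The two steps can be sketched with the Python standard library. The empirical CDF below uses mid-ranks and clamps the uniform value away from 0 and 1; both choices are assumptions made for this sketch, not TMVA's binning. `NormalDist().inv_cdf(u)` is the standard-normal quantile, equal to √2·erf⁻¹(2u − 1).

```python
from statistics import NormalDist


def gaussianize(training_values, x):
    """Map x to a standard-normal value using the training-sample CDF."""
    n = len(training_values)
    # Step 1: empirical CDF (mid-rank for ties) gives a uniform variable in [0, 1].
    below = sum(v < x for v in training_values)
    equal = sum(v == x for v in training_values)
    u = (below + 0.5 * equal) / n
    # Keep u strictly inside (0, 1) so the inverse CDF stays finite.
    u = min(max(u, 0.5 / n), 1.0 - 0.5 / n)
    # Step 2: inverse Gaussian CDF gives zero mean, unit width.
    return NormalDist().inv_cdf(u)


training = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print([round(gaussianize(training, v), 3) for v in training])
```

The median of the training sample maps to 0, and values symmetric about the median map to opposite-sign outputs.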

SLIDE 19

Gaussian-ized Track Determination Variables

[Figure: normalized signal and background distributions, (1/N) dN per bin, for the six “Gauss”-transformed input variables: length, timeofint, chargedepo, summedrms, summeddqdx, caloke]

SLIDE 20

Other TMVA Methods Tested

  • Rectangular cut optimization (“Cuts”):

– Maximizes background rejection at given signal efficiency
– Returns binary response (signal or background) from binary search trees for signal, background

  • Projective Likelihood Estimator (“Likelihood”):

– Method of maximum likelihood: build model out of PDFs

  • Reproduces input variables for signal/background

– Correlations among variables are ignored; called “naïve Bayes estimator”
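
A projective likelihood of this kind can be sketched with one histogram PDF per variable and class, multiplied together as if the variables were independent. The binning, the add-one smoothing of empty bins, and all names below are assumptions for illustration; TMVA's PDF estimation is more elaborate.

```python
def histogram_pdf(values, lo, hi, nbins):
    """Binned PDF estimate; each bin starts at 1 count so no bin is empty."""
    width = (hi - lo) / nbins
    counts = [1.0] * nbins
    for v in values:
        index = min(max(int((v - lo) / width), 0), nbins - 1)
        counts[index] += 1.0
    total = sum(counts)
    return [c / total for c in counts], lo, width


def pdf_value(pdf, x):
    """Look up the bin probability for x (out-of-range values use the edge bins)."""
    probs, lo, width = pdf
    index = min(max(int((x - lo) / width), 0), len(probs) - 1)
    return probs[index]


def likelihood_ratio(x_vec, sig_pdfs, bkg_pdfs):
    """y = L_S / (L_S + L_B), with each L a product of one-variable PDFs."""
    l_sig = l_bkg = 1.0
    for x, p_sig, p_bkg in zip(x_vec, sig_pdfs, bkg_pdfs):
        l_sig *= pdf_value(p_sig, x)
        l_bkg *= pdf_value(p_bkg, x)
    return l_sig / (l_sig + l_bkg)


# One toy variable, made up for illustration: signal at high values, background at low.
sig_pdfs = [histogram_pdf([8, 9, 9, 7], 0.0, 10.0, 5)]
bkg_pdfs = [histogram_pdf([1, 2, 2, 3], 0.0, 10.0, 5)]
print(likelihood_ratio([9], sig_pdfs, bkg_pdfs))
```

Because each variable contributes its own one-dimensional PDF, any correlation between variables is invisible to the estimator, which is the “naïve Bayes” limitation noted above.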

SLIDE 21

MVA Classifier Output

[Figure: TMVA overtraining check for classifier Likelihood. Likelihood response distributions for signal and background, with test and training samples overlaid. Kolmogorov-Smirnov test: signal (background) probability = 0.274 (0.033)]

Note that the “cuts” method does not contain an overtraining check.

Note the KS test probabilities are all above 0.01; indicates no overtraining occurred

SLIDE 22

MVA Cut Efficiency Plots

[Figure (Cuts): cut efficiencies and optimal cut value versus signal efficiency, showing signal efficiency, background efficiency, signal purity, signal efficiency*purity, and significance S/√(S+B). For 1000 signal and 1000 background events, the maximum S/√(S+B) is 27.72 when cutting at 0.89. Method Cuts provides a bundle of cut selections, each tuned to a different signal efficiency. Shown is the purity for each cut selection.]

[Figure (Likelihood): cut efficiencies and optimal cut value versus cut value applied on Likelihood output, showing the same curves. For 1000 signal and 1000 background events, the maximum S/√(S+B) is 27.80 when cutting at -0.15]

Gives us a way of comparing MVA performances (efficiency, purity) for different scenarios of signal, background events

SLIDE 23

TMVA Likelihood Probability, Rarity Plots

[Figure: TMVA probability for classifier Likelihood. Signal probability distributions for signal and background]

[Figure: TMVA Rarity for classifier Likelihood. Signal rarity distributions for signal and background]

Note that the “cuts” method did not output probability or rarity plots.