Machine learning – a very brief introduction – Jaime Norman, LPSC – PowerPoint PPT Presentation



SLIDE 1

Machine learning – a very brief introduction

Jaime Norman LPSC Grenoble Workshop on Heavy-flavour tagging, Inha University 06/12/2018

SLIDE 2

Outline

  • I will briefly outline the motivation to use machine learning to solve physics problems
  • We can then go through 2 examples using the Toolkit for MultiVariate Analysis (TMVA)
    – Toy dataset example
    – Charmed baryon example
  • I have tried to give an intuitive picture of how these methods work
    – No detail about any individual method
    – Very good summary: P. Bhat, Multivariate analysis methods in Particle Physics

SLIDE 3

Introduction

  • Many physics experiments (not just particle physics) search for rare signals
  • Often the main challenge is to extract the signal from the huge background arising from other (uninteresting) physics processes
  • Information from different detectors gives us features by which we can distinguish ‘signal’ from ‘background’
    – Particle identification
    – Transverse momentum
    – Other kinematic / topological properties (relative angles, displaced vertices, more complex variables…)
  • Knowledge of the physics of background/signal is crucial
SLIDE 4

Cut optimisation

  • The simplest way to try to remove background is by performing 1-dimensional cuts on n features
    – E.g. PID: nσ cut on dE/dx
    – pT of tracks
  • Signal candidates passing the cuts are kept, while others are rejected
  • Often not optimal! Especially if we are using many variables, which most likely have more complex, non-linear correlations
  • How can we optimise our selection?
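The 1-D cut idea above can be made concrete with a short numpy sketch. Everything here is invented for illustration (the feature names, means, and cut values are not from the slides): each candidate passes only if it survives every independent cut, and we measure how much signal and background survive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy candidates with two features (values invented for illustration):
# a PID n-sigma response and a track pT in GeV/c.
n = 1000
sig = np.column_stack([rng.normal(0.0, 1.0, n),   # PID n-sigma, centred on the signal hypothesis
                       rng.normal(3.0, 1.0, n)])  # harder pT spectrum
bkg = np.column_stack([rng.normal(2.5, 1.5, n),   # shifted PID response
                       rng.normal(1.5, 1.0, n)])  # softer pT spectrum

def rectangular_cut(x, nsigma_max=2.0, pt_min=2.0):
    """Keep candidates passing independent 1-D cuts on each feature."""
    return (np.abs(x[:, 0]) < nsigma_max) & (x[:, 1] > pt_min)

eff_sig = rectangular_cut(sig).mean()   # fraction of signal kept
eff_bkg = rectangular_cut(bkg).mean()   # fraction of background kept (want this small)
print(f"signal efficiency:     {eff_sig:.2f}")
print(f"background efficiency: {eff_bkg:.2f}")
```

Because each cut ignores the other variables, this selection cannot follow any correlated boundary between signal and background, which is exactly the limitation the slide points out.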

SLIDE 5

Multivariate approach

  • Represent a dataset in terms of feature variables, or a vector x = (x1, x2, x3, …, xn)
  • Given a vector of features, we want to know (for example) the probability of an entry in our dataset being signal or background
  • Construct a function y = f(x) which maps the feature space into a form which is constructed to be useful for classification
    – That is, f provides a map ℜd → ℜN
    – Preferable to have dimensionality N << d
  • In practice: the dataset is finite, and the functional form of the data is unknown – an approximate function f(x, w) is learned
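A classic example of such a map ℜd → ℜ1 is the Fisher linear discriminant, sketched below in numpy for two correlated Gaussian classes (the numbers are invented for illustration; the slides use TMVA, not this hand-rolled version). The learned weights w collapse the d-dimensional feature vector onto a single classifier value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian classes with linearly correlated features, in the spirit of
# the toy example on the following slides (numbers invented for illustration).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
sig = rng.multivariate_normal([1.0, 1.0], cov, 2000)
bkg = rng.multivariate_normal([-1.0, -1.0], cov, 2000)

# Fisher linear discriminant: y = f(x) = w . x maps the d-dimensional
# feature space onto a single classifier value, i.e. a map R^d -> R^1.
s_within = 2.0 * cov  # within-class scatter (both classes share the same covariance here)
w = np.linalg.solve(s_within, sig.mean(axis=0) - bkg.mean(axis=0))

y_sig = sig @ w
y_bkg = bkg @ w
print(f"mean classifier value, signal:     {y_sig.mean():+.2f}")
print(f"mean classifier value, background: {y_bkg.mean():+.2f}")
```

A single threshold on y then plays the role of the whole multivariate selection.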

SLIDE 6

ℜd → ℜN: e.g. top quark mass measurement, D0

SLIDE 7

Supervised learning

  • Using a training dataset of known output (signal/background) to approximate the function is known as supervised learning
    – Classification if f(x, w) is discrete (binary classification if classifying into 2 classes)
      • E.g. identifying Higgs decays among other SM processes
    – Regression if f(x, w) is continuous
      • E.g. the functional form of the TPC dE/dx curve (as a function of many variables) – see M. Ivanov talk
  • There is also unsupervised learning and reinforcement learning
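The classification/regression distinction can be shown in a few lines of numpy. Both halves below are toy illustrations with invented numbers, not anything from the slides: a nearest-centroid rule returns a discrete label, while a least-squares line fit returns a continuous function.

```python
import numpy as np

rng = np.random.default_rng(2)

# Classification: f(x, w) is discrete (here 0 = background, 1 = signal).
# A nearest-centroid rule is about the simplest supervised classifier.
x_sig = rng.normal(2.0, 1.0, (500, 1))
x_bkg = rng.normal(-2.0, 1.0, (500, 1))
c_sig, c_bkg = x_sig.mean(axis=0), x_bkg.mean(axis=0)

def classify(x):
    """Return 1 (signal) if x is closer to the signal centroid, else 0."""
    return int(np.linalg.norm(x - c_sig) < np.linalg.norm(x - c_bkg))

# Regression: f(x, w) is continuous, e.g. learning the shape of a
# calibration curve; here a straight line y = 3x + 0.5 plus noise.
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.05, 50)
slope, intercept = np.polyfit(x, y, 1)  # learned parameters w

print(f"classify(1.5) = {classify(np.array([1.5]))}, learned slope = {slope:.2f}")
```

In both cases the "learning" is the same idea: use examples with known answers to fix the parameters w of an approximate f(x, w).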

SLIDE 8

Example: Gaussian, linearly correlated variables

SLIDE 9

Example: Gaussian, linearly correlated variables

SLIDE 10

Example: Boosted decision trees

  • Decision trees employ sequential cuts to perform classification
  • ‘Variable’ space is split into partitions, and mapped onto a one-dimensional classifier
  • Selection on the classifier corresponds to a decision boundary in feature space
  • Boosted decision trees: create many small trees, and combine them – reduces misbehaviour due to fluctuations
    ✓ Can often perform more optimally than ‘standard’ rectangular cuts
    ✓ Deals with lots of input data very well – automatic selection of strongly discriminating features
    ✓ ‘Algorithm-of-choice’ for many other collaborations
  • Top quark mass [1], Higgs discovery [2], Bs0 → µµ [3] …

[1] Phys. Rev. D 58, 052001 (1998)
[2] Phys. Lett. B 716 (2012) 30
[3] Nature 522, 68–72 (2015)
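A minimal boosted-decision-tree sketch, using scikit-learn as a stand-in for TMVA's BDT (the dataset, class locations, and hyperparameters are invented for illustration). The two classes have opposite linear correlations, the kind of structure rectangular cuts cannot exploit but an ensemble of small trees can.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Toy signal and background with opposite feature correlations
# (all numbers invented for illustration).
n = 2000
sig = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.8], [0.8, 1.0]], n)
bkg = rng.multivariate_normal([-1.0, -1.0], [[1.0, -0.8], [-0.8, 1.0]], n)
X = np.vstack([sig, bkg])
y = np.concatenate([np.ones(n), np.zeros(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosting: many small trees combined into one classifier, as on the slide.
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
bdt.fit(X_train, y_train)

acc = bdt.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```

The choice of many shallow trees (max_depth=3) rather than one deep tree is exactly the "many small trees" point on the slide: each tree is weak, but boosting combines them into a strong, stable classifier.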

SLIDE 11
SLIDE 12

Signal probability

SLIDE 13

Example: ΛC

[Figures: normalised distributions of proton pT, kaon pT, cosine of pointing angle and decay length for signal, background and data (sidebands), 6 < pT < 8 GeV/c, "This Thesis"; BDT response (1/N) dN/dx for signal, background and data, p-Pb, √sNN = 5.02 TeV]

SLIDE 14

Validation

  • Just because you have

a trained model using the most state-of-the- art, high performing ML algorithm, it doesn’t mean the

  • utput is right!

– Training data must be accurate representation of real data – Trained model must be tested using independent dataset

  • Overfitting can occur
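Overfitting is easy to demonstrate with a toy (invented for illustration, using scikit-learn rather than TMVA): a fully grown single decision tree memorises noisy training labels, so its training accuracy is near-perfect while its accuracy on an independent test sample is much worse.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Noisy toy labels (invented for illustration): the true boundary is
# x0 + x1 > 0, but the labels carry substantial random noise.
X = rng.normal(0.0, 1.0, (1000, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0.0, 1.5, 1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

acc_train = tree.score(X_train, y_train)  # close to 1: the tree fits the noise
acc_test = tree.score(X_test, y_test)     # noticeably lower: overfitting
print(f"training accuracy: {acc_train:.2f}")
print(f"test accuracy:     {acc_test:.2f}")
```

The gap between the two numbers is the signature of overfitting, which is why the slide insists on testing with an independent dataset.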
SLIDE 15

Testing

[Figure: TMVA overtraining check for classifier BDT_pt2to3 – BDT response (1/N) dN/dx for signal and background, test vs. training samples. Kolmogorov-Smirnov test: signal (background) probability = 0 (0.083). U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]
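The overtraining check shown here can be sketched with scipy's two-sample Kolmogorov-Smirnov test (a stand-in for TMVA's built-in check; the toy "responses" below are invented, drawn from the same parent distribution as a well-behaved classifier would give).

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)

# Toy classifier responses on the training and on an independent test
# sample (invented for illustration: same parent distribution, i.e. a
# classifier that is NOT overtrained).
response_train = rng.normal(0.2, 0.3, 1000)
response_test = rng.normal(0.2, 0.3, 1000)

stat, p_value = ks_2samp(response_train, response_test)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3f}")
# A tiny p-value would mean the train and test response distributions
# disagree, which for a real classifier signals overtraining.
```

For an overtrained classifier the training-sample response looks sharper than the test-sample response, and the KS probability collapses towards zero.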

SLIDE 16

MC/Data comparison

  • Often we don’t have a ‘pure’ signal sample
    – Can be difficult to evaluate agreement with MC
  • We can always expect some MC/data difference – this should enter in the systematic uncertainty evaluation

[Figures: normalised distributions of proton pT, kaon pT, cosine of pointing angle and decay length for signal, background and data (sidebands), 6 < pT < 8 GeV/c, "This Thesis"]

SLIDE 17

Tutorial

  • Now we can try the tutorial
  • https://dfs.cern.ch/dfs/websites/j/jnorman/mvatutorial/
  • Download: https://cernbox.cern.ch/index.php/s/RvIESWYQF1u5zNI

SLIDE 18

References

  • Figures/info taken from P. Bhat, Multivariate analysis methods in Particle Physics, DOI: 10.1146/annurev.nucl.012809.104427

SLIDE 19

Towards a unified framework for ML in ALICE

G.M.Innocenti

SLIDE 20

G.M.Innocenti

SLIDE 21

G.M.Innocenti

SLIDE 22

D mesons / Quark vs. gluon jet tagging

SLIDE 23
  • See https://indico.cern.ch/event/766450/contributions/3225284/attachments/1765169/2865695/20181204_WorkshowQA.pdf#search=gian%20michele for more info

SLIDE 24

Backup

SLIDE 25

ΛC hadronic decay reconstruction

  • PID using the TPC via dE/dx and the TOF via time-of-flight measurement
  • nσ cuts, or a Bayesian approach, to identify particles
  • Cuts on decay topologies exploiting decay vertex displacement from the primary vertex
  • Signal extraction via the invariant mass distribution
  • Feed-down (b) subtracted using a pQCD-based estimation of charmed baryon production
  • Correct for efficiency + normalisation