Machine learning – a very brief introduction
Jaime Norman, LPSC Grenoble
Workshop on Heavy-flavour tagging, Inha University, 06/12/2018
Outline
- I will briefly outline the motivation to use machine learning to solve physics problems
- We can then go through two examples using the Toolkit for MultiVariate Analysis (TMVA)
  – Toy dataset example
  – Charmed baryon example
- I have tried to give an intuitive picture of how these methods work
  – No detail about any individual method
  – Very good summary: P. Bhat, Multivariate analysis methods in Particle Physics
Introduction
- Many physics experiments (not just particle physics) search for rare signals
- Often the main challenge is to extract the signal from the huge background arising from other (uninteresting) physics processes
- Information from different detectors gives us features by which we can distinguish ‘signal’ from ‘background’
  – Particle identification
  – Transverse momentum
  – Other kinematic / topological properties (relative angles, displaced vertices, more complex variables…)
- Knowledge of the physics of the background/signal is crucial
Cut optimisation
- The simplest way to try to remove background is by performing 1-dimensional cuts on the n features
  – E.g. PID: nσ cut on dE/dx
  – pT of tracks
- Signal candidates passing the cuts are kept, while others are rejected
- Often not optimal! Especially if we are using many variables, which most likely have more complex, non-linear correlations
- How can we optimise our selection?
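The rectangular-cut selection above can be sketched in a few lines of Python (the candidate values and thresholds below are invented for illustration, not taken from any analysis):

```python
# Minimal sketch of 1-D rectangular cuts: keep a candidate only if it
# passes every cut independently (hypothetical candidates/thresholds).
candidates = [
    {"pt": 2.1, "nsigma_dedx": 0.8},   # signal-like
    {"pt": 0.4, "nsigma_dedx": 0.5},   # fails the pT cut
    {"pt": 3.0, "nsigma_dedx": 4.2},   # fails the PID cut
]

def passes_cuts(c, pt_min=1.0, nsigma_max=3.0):
    """Each cut acts on one variable at a time, ignoring correlations."""
    return c["pt"] > pt_min and abs(c["nsigma_dedx"]) < nsigma_max

selected = [c for c in candidates if passes_cuts(c)]
print(len(selected))  # 1 candidate survives
```

Each cut is applied independently per variable, which is exactly why correlated features are handled poorly by this approach.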
Multivariate approach
- Represent a dataset in terms of feature variables, or a vector x = (x1, x2, x3, …, xn)
- Given a vector of features, we want to know (for example) the probability of an entry in our dataset being signal or background
- Construct a function y = f(x) which maps the feature space into a form useful for classification
  – That is, f provides a map ℝ^d → ℝ^N
  – Preferable to have dimensionality N << d
- In practice the dataset is finite, and the functional form of the data is unknown – an approximate function f(x,w) is learned
[Figure: mapping ℝ^d → ℝ^n; e.g. top quark mass measurement, D0]
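A minimal sketch of such a map, using a linear discriminant with hand-picked illustrative weights (a real analysis would learn w from data):

```python
# Sketch of a classifier as a map f: R^d -> R, i.e. the d-dimensional
# feature vector is projected onto a single number we can cut on.
def f(x, w):
    """Linear map: dot product of features with a weight vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.5, -1.0, 2.0]            # one weight per feature (illustrative)
x_signal_like = [1.0, 0.2, 1.5]
y = f(x_signal_like, w)         # 0.5*1.0 - 1.0*0.2 + 2.0*1.5 = 3.3
```

A single cut on y then defines a (here linear) decision boundary in the full feature space.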
Supervised learning
- Using a training dataset of known output (signal/background) to approximate the function is known as supervised learning
  – Classification if f(x,w) is discrete (binary classification if classifying into 2 classes)
    - E.g. identifying Higgs decays from other SM processes
  – Regression if f(x,w) is continuous
    - E.g. functional form of the TPC dE/dx curve (as a function of many variables) – see M. Ivanov talk
- There is also unsupervised learning and reinforcement learning
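The supervised classification idea can be illustrated with a toy 1-D example (labels and values invented): learn the best cut value from a labelled training sample.

```python
# Toy supervised binary classification: scan candidate thresholds and
# keep the one with the best accuracy on the labelled training data.
train = [(0.2, "background"), (0.4, "background"), (0.5, "background"),
         (0.9, "signal"), (1.1, "signal"), (1.4, "signal")]

def accuracy(threshold):
    """Fraction of training entries classified correctly by this cut."""
    correct = sum(1 for x, label in train
                  if (label == "signal") == (x > threshold))
    return correct / len(train)

best = max((x for x, _ in train), key=accuracy)
print(best)  # 0.5: everything above is signal, everything below background
```

Here f(x,w) is just a step function and w is the single learned threshold; real algorithms generalise this to many features and far richer function families.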
example: Gaussian, linear correlated variables
Example: Boosted decision trees
- Decision trees employ sequential cuts to perform classification
- ‘Variable’ space is split into partitions and mapped onto a one-dimensional classifier
- A selection on the classifier corresponds to a decision boundary in feature space
- Boosted decision trees: create many small trees and combine them – this reduces misbehaviour due to fluctuations
✓ Can often perform more optimally than ‘standard’ rectangular cuts
✓ Deals with lots of input data very well – automatic selection of strongly discriminating features
✓ ‘Algorithm-of-choice’ for many other collaborations
- Top quark mass [1], Higgs discovery [2], Bs0 → µµ [3] …
[1] Phys. Rev. D 58, 052001 (1998)
[2] Phys. Lett. B 716 (2012) 30
[3] Nature 522, 68–72 (2015)
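To make the boosting idea concrete, here is a self-contained toy sketch of AdaBoost with one-cut decision stumps in plain Python (illustrative only, not the TMVA implementation; the dataset is invented and not separable by any single cut):

```python
import math

# Toy 1-D dataset: labels +1 (signal) / -1 (background).
X = [0.1, 0.3, 0.5, 0.8, 1.0, 1.2]
y = [-1, -1, 1, -1, 1, 1]            # no single cut classifies all points

def fit_stump(w):
    """Best single threshold/direction under the current event weights."""
    best = None
    for t in X:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if sign * (1 if xi > t else -1) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

weights = [1.0 / len(X)] * len(X)
ensemble = []                        # list of (alpha, threshold, sign)
for _ in range(3):                   # grow 3 small "trees"
    err, t, sign = fit_stump(weights)
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, t, sign))
    # Boost: increase the weight of misclassified events, renormalise.
    weights = [wi * math.exp(-alpha * yi * sign * (1 if xi > t else -1))
               for xi, yi, wi in zip(X, y, weights)]
    norm = sum(weights)
    weights = [wi / norm for wi in weights]

def bdt_response(x):
    """Weighted vote of all stumps: the one-dimensional classifier output."""
    return sum(a * s * (1 if x > t else -1) for a, t, s in ensemble)

predictions = [1 if bdt_response(x) > 0 else -1 for x in X]
```

Cutting on `bdt_response` replaces the set of rectangular cuts with a single selection on one variable; the ambiguous event at 0.8 may still be misclassified, which is the expected behaviour on noisy data.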
[Figure: signal probability]
Example: ΛC
[Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data (sidebands), 6 < pT < 8 GeV/c (This Thesis)]
[Figure: BDT response, (1/N) dN/dx, for signal, background, and data, 6 < pT < 8 GeV/c, p-Pb √sNN = 5.02 TeV (This Thesis)]
Validation
- Just because you have trained a model using the most state-of-the-art, high-performing ML algorithm, it doesn’t mean the output is right!
  – Training data must be an accurate representation of the real data
  – Trained model must be tested using an independent dataset
- Overfitting can occur
Testing
[Figure: TMVA overtraining check for classifier BDT_pt2to3 – BDT response, (1/N) dN/dx, for signal and background, test vs. training samples. Kolmogorov-Smirnov test: signal (background) probability = 0 (0.083); U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]
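The overtraining check can be illustrated with a toy two-sample Kolmogorov-Smirnov statistic comparing training and test responses (numbers invented; TMVA converts this distance into the probabilities quoted in the plot):

```python
# Two-sample KS statistic: maximum distance between the empirical CDFs
# of the classifier response on the training and test samples.
def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    points = sorted(set(a + b))
    def cdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

train_response = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]   # toy values
test_response = [0.15, 0.25, 0.35, 0.55, 0.65, 0.85]
d = ks_statistic(train_response, test_response)
# Small d (probability close to 1): train and test distributions agree,
# no sign of overtraining; d near 1 would flag a problem.
```

In practice the test is done separately for the signal and background samples, as in the TMVA plot above.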
MC/Data comparison
- Often we don’t have a ‘pure’ signal sample
– Can be difficult to evaluate agreement with MC
- We can always expect some MC/data difference – this should enter in the systematic uncertainty evaluation
[Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data (sidebands), 6 < pT < 8 GeV/c (This Thesis)]
Tutorial
- Now we can try the tutorial
- https://dfs.cern.ch/dfs/websites/j/jnorman/mvatutorial/
- Download: https://cernbox.cern.ch/index.php/s/RvIESWYQF1u5zNI
References
- Figures/info taken from P. Bhat, Multivariate analysis methods in Particle Physics, DOI: 10.1146/annurev.nucl.012809.104427
Towards a unified framework for ML in ALICE
G.M. Innocenti
- D mesons
- Quark vs. gluon jet tagging
- See https://indico.cern.ch/event/766450/contributions/3225284/attachments/1765169/2865695/20181204_WorkshowQA.pdf#search=gian%20michele for more info
Backup
ΛC hadronic decay reconstruction
- PID using TPC via dE/dx and TOF via time-of-flight measurement
- nσ cuts, or Bayesian approach, to identify particles
- Cuts on decay topologies exploiting decay vertex displacement from the primary vertex
- Signal extraction via invariant mass distribution
- Feed-down (b) subtracted using pQCD-based estimation of charmed baryon production
- Correct for efficiency + normalisation
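A hedged sketch of one common way to perform the signal extraction step: estimate the combinatorial background under the invariant-mass peak by sideband counting, assuming a flat background (all windows and counts below are invented):

```python
# Sideband background estimation under an invariant-mass peak.
# Windows are hypothetical, chosen around the Lambda_c mass region.
peak_window = (2.27, 2.30)             # GeV/c^2
sidebands = [(2.20, 2.25), (2.32, 2.37)]

n_peak = 1000                          # candidates in the peak window (toy)
n_sideband = 1200                      # candidates in both sidebands (toy)

peak_width = peak_window[1] - peak_window[0]
sideband_width = sum(hi - lo for lo, hi in sidebands)

# Flat-background assumption: background scales with window width.
n_background = n_sideband * peak_width / sideband_width
n_signal = n_peak - n_background
print(round(n_signal))  # 1000 - 1200 * 0.03 / 0.10 = 640
```

In the real analysis the background shape is fitted rather than assumed flat, but the scaling logic is the same.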