

  1. Machine learning – a very brief introduction Jaime Norman LPSC Grenoble Workshop on Heavy-flavour tagging, Inha University 06/12/2018

  2. Outline • I will briefly outline the motivation for using machine learning to solve physics problems • We can then go through two examples using the Toolkit for MultiVariate Analysis (TMVA) – Toy dataset example – Charmed baryon example • I have tried to give an intuitive picture of how these methods work – no detail on any particular method – Very good summary: P. Bhat, Multivariate analysis methods in Particle Physics

  3. Introduction • Many physics experiments (not just particle physics) search for rare signals • Often the main challenge is to extract the signal from the huge background arising from other (uninteresting) physics processes • Information from different detectors gives us features by which we can distinguish ‘signal’ from ‘background’ – Particle identification – Transverse momentum – Other kinematic / topological properties (relative angles, displaced vertices, more complex variables …) • Knowledge of the physics of the signal and background is crucial

  4. Cut optimisation • The simplest way to try to remove background is to perform 1-dimensional cuts on features – e.g. PID (an Nσ cut on dE/dx) or the pT of tracks • Signal candidates passing the cuts are kept, while others are rejected (a minimal sketch follows below) • Often not optimal! Especially if we are using many variables, which most likely have more complex, non-linear correlations • How can we optimise our selection?
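  As an illustration, a minimal C++ sketch of such rectangular cuts; the candidate fields and cut values here are hypothetical, chosen only to show the idea:

```cpp
#include <cmath>

// Hypothetical candidate with three commonly used features.
struct Candidate {
    double nSigmaTPC;        // PID: deviation of measured dE/dx from expectation, in sigma
    double pT;               // transverse momentum (GeV/c)
    double cosPointingAngle; // topological variable
};

bool passesRectangularCuts(const Candidate& c) {
    // Each variable is cut on independently; correlations between the
    // variables are ignored, which is why this is often sub-optimal.
    return std::abs(c.nSigmaTPC) < 3.0
        && c.pT > 1.0
        && c.cosPointingAngle > 0.9;
}
```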

  5. Multivariate approach • Represent a dataset in terms of feature variables, i.e. a vector x = (x₁, x₂, x₃, …, x_d) • Given a vector of features, we want to know (for example) the probability of an entry in our dataset being signal or background • Construct a function y = f(x) which maps the feature space into a form constructed to be useful for classification – That is, f provides a map ℜᵈ → ℜᴺ – Preferable to have dimensionality N << d • In practice the dataset is finite and the functional form of the data is unknown – an approximate function f(x, w) is learned (see the sketch below)
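  To make the learned map concrete, a minimal sketch of one possible f(x, w) – a simple linear discriminant mapping ℜᵈ → ℜ. The weights w and bias b are hypothetical; in practice they would be learned from a training sample:

```cpp
#include <array>
#include <numeric>

constexpr int d = 4;  // dimensionality of the feature space

// y = f(x, w) = w . x + b maps a d-dimensional feature vector onto a
// single classifier value; a cut on y defines a decision boundary.
double f(const std::array<double, d>& x,
         const std::array<double, d>& w,
         double b) {
    return std::inner_product(x.begin(), x.end(), w.begin(), b);
}
```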

  6. e.g. top quark mass measurement, D0: ℜᵈ → ℜⁿ

  7. Supervised learning • Using a training dataset of known output (signal/background) to approximate the function is known as supervised learning (a TMVA sketch follows below) – Classification if f(x, w) is discrete (binary classification if classifying into 2 classes) • E.g. identifying Higgs decays among other SM processes – Regression if f(x, w) is continuous • E.g. functional form of the TPC dE/dx curve (as a function of many variables) – see M. Ivanov’s talk • There is also unsupervised learning and reinforcement learning
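  A minimal sketch of declaring a supervised training sample in TMVA; the file, tree, and variable names are hypothetical, and the class labels come from which tree an event belongs to:

```cpp
#include "TFile.h"
#include "TTree.h"
#include "TMVA/DataLoader.h"

void declareTrainingSample() {
    // Simulated events of known class provide the 'known output':
    TFile* sigFile = TFile::Open("signal.root");      // hypothetical file
    TFile* bkgFile = TFile::Open("background.root");  // hypothetical file
    TTree* sigTree = (TTree*)sigFile->Get("tree");
    TTree* bkgTree = (TTree*)bkgFile->Get("tree");

    TMVA::DataLoader* loader = new TMVA::DataLoader("dataset");
    loader->AddSignalTree(sigTree, 1.0);      // class label: signal
    loader->AddBackgroundTree(bkgTree, 1.0);  // class label: background
    loader->AddVariable("pT", 'F');               // feature x1
    loader->AddVariable("cosPointingAngle", 'F'); // feature x2
}
```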

  8. Example: Gaussian, linearly correlated variables

  9. Example: Gaussian, linearly correlated variables

  10. Example: Boosted decision trees • Decision trees employ sequential cuts to perform classification • ‘Variable’ space is split into partitions and mapped onto a one-dimensional classifier • A selection on the classifier corresponds to a decision boundary in feature space • Boosted decision trees: create many small trees and combine them – reduces misbehaviour due to fluctuations (a training sketch follows below) ✓ Can often perform better than ‘standard’ rectangular cuts ✓ Deals with lots of input data very well – automatic selection of strongly discriminating features ✓ ‘Algorithm-of-choice’ for many other collaborations • Top quark mass [1], Higgs discovery [2], B_s⁰ → µµ [3] … [1] Phys. Rev. D 58, 052001 (1998) [2] Phys. Lett. B 716 (2012) 30 [3] Nature 522, 68–72 (2015)
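  A minimal sketch of booking and training a BDT with TMVA, continuing the DataLoader setup above; the option string values are illustrative, not a recommended configuration:

```cpp
#include "TFile.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void trainBDT(TMVA::DataLoader* loader) {
    TFile* outFile = TFile::Open("TMVA_output.root", "RECREATE");
    TMVA::Factory factory("TMVAClassification", outFile,
                          "AnalysisType=Classification");

    // Many shallow trees combined by boosting, reducing the
    // sensitivity of any single tree to statistical fluctuations:
    factory.BookMethod(loader, TMVA::Types::kBDT, "BDT",
                       "NTrees=400:MaxDepth=3:BoostType=AdaBoost");

    factory.TrainAllMethods();
    outFile->Close();
}
```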

  11. Signal probability

  12. Example: Λc – [Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data sidebands (6 < pT < 8 GeV/c); BDT response for signal, background, and data in p–Pb collisions at √sNN = 5.02 TeV]

  13. Validation • Just because you have a model trained with the most state-of-the-art, high-performing ML algorithm, it doesn’t mean the output is right! – The training data must be an accurate representation of the real data – The trained model must be tested on an independent dataset (a sketch of reserving a test sample follows below) • Overfitting can occur
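  In TMVA the independent test sample is reserved at setup time; a sketch continuing the DataLoader above, with illustrative event counts:

```cpp
#include "TCut.h"
#include "TMVA/DataLoader.h"

void reserveTestSample(TMVA::DataLoader* loader) {
    // Events not used for training form an independent test set,
    // which TMVA uses to check the trained model for overtraining.
    loader->PrepareTrainingAndTestTree(
        TCut(""),  // no preselection
        "nTrain_Signal=5000:nTrain_Background=5000:"
        "nTest_Signal=5000:nTest_Background=5000:SplitMode=Random");
}
```

  After TrainAllMethods, calling factory.TestAllMethods() and factory.EvaluateAllMethods() fills the train/test comparisons used in the overtraining check shown on the next slide.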

  14. Testing – [Figure: TMVA overtraining check for classifier BDT_pt2to3: BDT response distributions for signal and background, test samples overlaid on training samples; Kolmogorov–Smirnov test: signal (background) probability = 0 (0.083); U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]

  15. MC/Data comparison – [Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data sidebands (6 < pT < 8 GeV/c)] • Often we don’t have a ‘pure’ signal sample – it can be difficult to evaluate agreement with MC • We can always expect some MC/data difference – this should enter the systematic uncertainty evaluation

  16. Tutorial • Now we can try the tutorial • https://dfs.cern.ch/dfs/websites/j/jnorman/mvatutorial/ • Download https://cernbox.cern.ch/index.php/s/RvIESWYQF1u5zNI
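  For orientation before starting, a sketch of the application step: applying a trained BDT to new candidates with TMVA::Reader. The weight-file path follows the training sketches above and is otherwise hypothetical:

```cpp
#include "TMVA/Reader.h"

void applyBDT() {
    // Variables must be registered in the same order as in training:
    float pT = 0.f, cosPointingAngle = 0.f;

    TMVA::Reader reader("!Color:Silent");
    reader.AddVariable("pT", &pT);
    reader.AddVariable("cosPointingAngle", &cosPointingAngle);
    reader.BookMVA("BDT",
        "dataset/weights/TMVAClassification_BDT.weights.xml");

    // One example candidate; in practice this runs inside the event loop:
    pT = 2.5f;
    cosPointingAngle = 0.95f;
    double response = reader.EvaluateMVA("BDT");  // select signal by cutting on this
    (void)response;
}
```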

  17. References • Figures/info taken from P. Bhat, Multivariate analysis methods in Particle Physics, DOI: 10.1146/annurev.nucl.012809.104427

  18. Towards a unified framework for ML in ALICE – G.M. Innocenti

  19. G.M. Innocenti

  20. G.M. Innocenti

  21. Quark vs. gluon jet tagging – D mesons

  22. • See https://indico.cern.ch/event/766450/contributions/3225284/attachments/1765169/2865695/20181204_WorkshowQA.pdf#search=gian%20michele for more info

  23. Backup

  24. Λc hadronic decay reconstruction • PID using the TPC via dE/dx and the TOF via a time-of-flight measurement • nσ cuts (a minimal sketch follows below), or a Bayesian approach, to identify particles • Cuts on decay topologies exploiting the displacement of the decay vertex from the primary vertex • Signal extraction via the invariant mass distribution • Feed-down (from b) subtracted using a pQCD-based estimation of charmed baryon production • Correct for efficiency + normalisation
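  A minimal sketch of the nσ PID idea: the measured dE/dx is compared with the expectation for an assumed species. All values and the resolution model here are hypothetical:

```cpp
#include <cmath>

// nSigma = (measured - expected) / resolution; the candidate is kept
// if its dE/dx lies within nSigmaMax standard deviations of the
// expectation for the assumed particle species.
bool passesTPCnSigmaCut(double dEdxMeasured, double dEdxExpected,
                        double dEdxResolution, double nSigmaMax = 3.0) {
    const double nSigma = (dEdxMeasured - dEdxExpected) / dEdxResolution;
    return std::abs(nSigma) < nSigmaMax;
}
```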
