Machine learning – a very brief introduction
Jaime Norman, LPSC Grenoble
Workshop on Heavy-flavour tagging, Inha University, 06/12/2018
Outline
- I will briefly outline the motivation to use machine learning to solve physics problems
- We can then go through two examples using the Toolkit for MultiVariate Analysis (TMVA)
  – Toy dataset example
  – Charmed baryon example
- I have tried to give an intuitive picture of how these methods work
  – No detail about any individual method
  – Very good summary: P. Bhat, Multivariate analysis methods in Particle Physics
Introduction
- Many physics experiments (not just particle physics) search for rare signals
- Often the main challenge is to extract the signal from the huge background arising from other (uninteresting) physics processes
- Information from different detectors gives us features by which we can distinguish ‘signal’ from ‘background’
  – Particle identification
  – Transverse momentum
  – Other kinematic / topological properties (relative angles, displaced vertices, more complex variables…)
- Knowledge of the physics of the background/signal is crucial
Cut optimisation
- The simplest way to try to remove background is by performing 1-dimensional cuts on the n features
  – E.g. PID: nσ cut on dE/dx
  – pT of tracks
- Signal candidates passing the cuts are kept, while others are rejected
- Often not optimal! Especially if we are using many variables, which most likely have more complex, non-linear correlations
- How can we optimise our selection?
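The rectangular-cut selection above can be sketched in a few lines of Python (the candidate values and thresholds below are invented for illustration, not taken from any analysis):

```python
# Minimal sketch of 1-D rectangular cuts: keep a candidate only if it
# passes every cut independently (hypothetical candidates/thresholds).
candidates = [
    {"pt": 2.1, "nsigma_dedx": 0.8},   # signal-like
    {"pt": 0.4, "nsigma_dedx": 0.5},   # fails the pT cut
    {"pt": 3.0, "nsigma_dedx": 4.2},   # fails the PID cut
]

def passes_cuts(c, pt_min=1.0, nsigma_max=3.0):
    """Each cut acts on one variable at a time, ignoring correlations."""
    return c["pt"] > pt_min and abs(c["nsigma_dedx"]) < nsigma_max

selected = [c for c in candidates if passes_cuts(c)]
print(len(selected))  # 1 candidate survives
```

Each cut is applied independently per variable, which is exactly why correlated features are handled poorly by this approach.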
Multivariate approach
- Represent a dataset in terms of feature variables, or a vector x = (x1, x2, x3, …, xn)
- Given a vector of features, we want to know (for example) the probability of an entry in our dataset being signal or background
- Construct a function y = f(x) which maps the feature space into a form useful for classification
  – That is, f provides a map ℝ^d → ℝ^N
  – Preferable to have dimensionality N << d
- In practice the dataset is finite, and the functional form of the data is unknown – an approximate function f(x,w) is learned
[Figure: mapping ℝ^d → ℝ^n; e.g. top quark mass measurement, D0]
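A minimal sketch of such a map, using a linear discriminant with hand-picked illustrative weights (a real analysis would learn w from data):

```python
# Sketch of a classifier as a map f: R^d -> R, i.e. the d-dimensional
# feature vector is projected onto a single number we can cut on.
def f(x, w):
    """Linear map: dot product of features with a weight vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.5, -1.0, 2.0]            # one weight per feature (illustrative)
x_signal_like = [1.0, 0.2, 1.5]
y = f(x_signal_like, w)         # 0.5*1.0 - 1.0*0.2 + 2.0*1.5 = 3.3
```

A single cut on y then defines a (here linear) decision boundary in the full feature space.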
Supervised learning
- Using a training dataset of known output (signal/background) to approximate the function is known as supervised learning
  – Classification if f(x,w) is discrete (binary classification if classifying into 2 classes)
    - E.g. identifying Higgs decays from other SM processes
  – Regression if f(x,w) is continuous
    - E.g. functional form of the TPC dE/dx curve (as a function of many variables) – see M. Ivanov talk
- There is also unsupervised learning and reinforcement learning
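The supervised classification idea can be illustrated with a toy 1-D example (labels and values invented): learn the best cut value from a labelled training sample.

```python
# Toy supervised binary classification: scan candidate thresholds and
# keep the one with the best accuracy on the labelled training data.
train = [(0.2, "background"), (0.4, "background"), (0.5, "background"),
         (0.9, "signal"), (1.1, "signal"), (1.4, "signal")]

def accuracy(threshold):
    """Fraction of training entries classified correctly by this cut."""
    correct = sum(1 for x, label in train
                  if (label == "signal") == (x > threshold))
    return correct / len(train)

best = max((x for x, _ in train), key=accuracy)
print(best)  # 0.5: everything above is signal, everything below background
```

Here f(x,w) is just a step function and w is the single learned threshold; real algorithms generalise this to many features and far richer function families.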
example: Gaussian, linear correlated variables
Example: Boosted decision trees
- Decision trees employ sequential cuts to perform classification
- ‘Variable’ space is split into partitions and mapped onto a one-dimensional classifier
- A selection on the classifier corresponds to a decision boundary in feature space
- Boosted decision trees: create many small trees and combine them – this reduces misbehaviour due to fluctuations
✓ Can often perform more optimally than ‘standard’ rectangular cuts
✓ Deals with lots of input data very well – automatic selection of strongly discriminating features
✓ ‘Algorithm-of-choice’ for many other collaborations
- Top quark mass [1], Higgs discovery [2], Bs0 → µµ [3] …
[1] Phys. Rev. D 58, 052001 (1998)
[2] Phys. Lett. B 716 (2012) 30
[3] Nature 522, 68–72 (2015)
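To make the boosting idea concrete, here is a self-contained toy sketch of AdaBoost with one-cut decision stumps in plain Python (illustrative only, not the TMVA implementation; the dataset is invented and not separable by any single cut):

```python
import math

# Toy 1-D dataset: labels +1 (signal) / -1 (background).
X = [0.1, 0.3, 0.5, 0.8, 1.0, 1.2]
y = [-1, -1, 1, -1, 1, 1]            # no single cut classifies all points

def fit_stump(w):
    """Best single threshold/direction under the current event weights."""
    best = None
    for t in X:
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if sign * (1 if xi > t else -1) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

weights = [1.0 / len(X)] * len(X)
ensemble = []                        # list of (alpha, threshold, sign)
for _ in range(3):                   # grow 3 small "trees"
    err, t, sign = fit_stump(weights)
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, t, sign))
    # Boost: increase the weight of misclassified events, renormalise.
    weights = [wi * math.exp(-alpha * yi * sign * (1 if xi > t else -1))
               for xi, yi, wi in zip(X, y, weights)]
    norm = sum(weights)
    weights = [wi / norm for wi in weights]

def bdt_response(x):
    """Weighted vote of all stumps: the one-dimensional classifier output."""
    return sum(a * s * (1 if x > t else -1) for a, t, s in ensemble)

predictions = [1 if bdt_response(x) > 0 else -1 for x in X]
```

Cutting on `bdt_response` replaces the set of rectangular cuts with a single selection on one variable; the ambiguous event at 0.8 may still be misclassified, which is the expected behaviour on noisy data.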
[Figure: signal probability]
Example: ΛC
[Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data (sidebands), 6 < pT < 8 GeV/c (This Thesis)]
[Figure: BDT response, (1/N) dN/dx, for signal, background, and data, 6 < pT < 8 GeV/c, p-Pb √sNN = 5.02 TeV (This Thesis)]
Validation
- Just because you have trained a model using the most state-of-the-art, high-performing ML algorithm, it doesn’t mean the output is right!
  – Training data must be an accurate representation of the real data
  – Trained model must be tested using an independent dataset
- Overfitting can occur
Testing
[Figure: TMVA overtraining check for classifier BDT_pt2to3 – BDT response, (1/N) dN/dx, for signal and background, test vs. training samples. Kolmogorov-Smirnov test: signal (background) probability = 0 (0.083); U/O-flow (S,B): (0.0, 0.0)% / (0.0, 0.0)%]
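The overtraining check can be illustrated with a toy two-sample Kolmogorov-Smirnov statistic comparing training and test responses (numbers invented; TMVA converts this distance into the probabilities quoted in the plot):

```python
# Two-sample KS statistic: maximum distance between the empirical CDFs
# of the classifier response on the training and test samples.
def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    points = sorted(set(a + b))
    def cdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

train_response = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]   # toy values
test_response = [0.15, 0.25, 0.35, 0.55, 0.65, 0.85]
d = ks_statistic(train_response, test_response)
# Small d (probability close to 1): train and test distributions agree,
# no sign of overtraining; d near 1 would flag a problem.
```

In practice the test is done separately for the signal and background samples, as in the TMVA plot above.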
MC/Data comparison
- Often we don’t have a ‘pure’ signal sample
– Can be difficult to evaluate agreement with MC
- We can always expect some MC/data difference – this should enter in the systematic uncertainty evaluation
[Figure: normalised distributions of proton pT, kaon pT, cosine pointing angle, and decay length for signal, background, and data (sidebands), 6 < pT < 8 GeV/c (This Thesis)]
Tutorial
- Now we can try the tutorial
- https://dfs.cern.ch/dfs/websites/j/jnorman/mvatutorial/
- Download: https://cernbox.cern.ch/index.php/s/RvIESWYQF1u5zNI
References
- Figures/info taken from P. Bhat, Multivariate analysis methods in Particle Physics, DOI: 10.1146/annurev.nucl.012809.104427
Towards a unified framework for ML in ALICE
G.M. Innocenti
- D mesons
- Quark vs. gluon jet tagging
- See https://indico.cern.ch/event/766450/contributions/3225284/attachments/1765169/2865695/20181204_WorkshowQA.pdf#search=gian%20michele for more info
Backup
ΛC hadronic decay reconstruction
- PID using TPC via dE/dx and TOF via time-of-flight measurement
- nσ cuts, or Bayesian approach, to identify particles
- Cuts on decay topologies exploiting decay vertex displacement from the primary vertex
- Signal extraction via invariant mass distribution
- Feed-down (b) subtracted using pQCD-based estimation of charmed baryon production
- Correct for efficiency + normalisation
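A hedged sketch of one common way to perform the signal extraction step: estimate the combinatorial background under the invariant-mass peak by sideband counting, assuming a flat background (all windows and counts below are invented):

```python
# Sideband background estimation under an invariant-mass peak.
# Windows are hypothetical, chosen around the Lambda_c mass region.
peak_window = (2.27, 2.30)             # GeV/c^2
sidebands = [(2.20, 2.25), (2.32, 2.37)]

n_peak = 1000                          # candidates in the peak window (toy)
n_sideband = 1200                      # candidates in both sidebands (toy)

peak_width = peak_window[1] - peak_window[0]
sideband_width = sum(hi - lo for lo, hi in sidebands)

# Flat-background assumption: background scales with window width.
n_background = n_sideband * peak_width / sideband_width
n_signal = n_peak - n_background
print(round(n_signal))  # 1000 - 1200 * 0.03 / 0.10 = 640
```

In the real analysis the background shape is fitted rather than assumed flat, but the scaling logic is the same.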