MACHINE LEARNING Overview


SLIDE 1

MACHINE LEARNING Overview

SLIDE 2

Oral Presentations of Projects

Start at 9:15 am and last until 12:30 pm.

SLIDE 3

Exam Format

The exam lasts a total of 40 minutes:

  • Upon entering the room, you pick 3 questions at random. You can leave one out!
  • Spend 20 minutes in the back of the room preparing answers to the two questions you have picked. When needed, make a schematic or prepare an example.
  • Present your answers for 20 minutes on the blackboard.

The exam is closed book, but you can bring one A4 page with personal notes written on both sides (recto-verso).

SLIDE 4

Example of exam question - I

Exam questions will entail two parts: one conceptual and one algorithmic.

i. Explain SVM and give an example in which it could be applied.
ii. Discuss the different terms in the objective function of SVM.

SLIDE 5

Example of exam question - II

Exam questions will entail two parts: one conceptual and one algorithmic

i. What are the pros and cons of GPR compared to SVR?
ii. How can we derive GPR from linear probabilistic regression?
SLIDE 6

Example of exam question - III

Exam questions may also tackle fundamental topics of ML

i. Give the formal definition of a pdf.
ii. What is good evaluation practice in ML?
SLIDE 7

Class Overview

This overview covers only some of the key concepts that we expect to be known, and highlights similarities and differences across the methods presented in class. Exam material encompasses:

  • The lecture notes (selected chapters/sections highlighted on the website)
  • Slides
  • Solutions to the exercises
  • Solutions to the practicals.
SLIDE 8

Formalism:

  • Be capable of giving the formal definitions of a pdf, a marginal, a likelihood
  • Be capable of stating the principle of each ML algorithm seen in class

Taxonomy:

  • Know the difference between supervised, unsupervised and reinforcement learning
  • Be able to discuss concepts such as generative vs. discriminative methods

Principles of evaluation:

  • Know the basic principles of evaluation of ML techniques:
  • training vs. testing sets,
  • crossvalidation,
  • ground truth.

Basic Concepts in ML

SLIDE 9

Basic Concepts in ML

To assess the validity of a Machine Learning algorithm, one measures its performance against the training, validation and testing sets. These sets are built by partitioning the data set at hand.

[Diagram: the data set is partitioned into a Training Set, a Validation Set and a Testing Set; crossvalidation is performed over these partitions.]

N-fold crossvalidation: a typical choice is 10-fold crossvalidation.
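As an illustration (ours, not from the slides), a minimal 10-fold cross-validation run with scikit-learn on a synthetic dataset; the classifier and all parameter values are arbitrary choices:

# Hypothetical example: 10-fold cross-validation of a classifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = SVC(kernel="rbf", C=1.0)

# cv=10 -> 10-fold cross-validation: train on 9 folds, test on the held-out fold.
scores = cross_val_score(clf, X, y, cv=10)
print("per-fold accuracy:", scores)
print("mean accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))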

SLIDE 10

Mathematical notions of probability density function (pdf), cumulative distribution function, marginal, maximum likelihood, MAP, etc.

PDF: $p(x) \ge 0$ and $\int p(x)\,dx = 1$

CDF: $D(x) = \int_{-\infty}^{x} p(x')\,dx'$

Marginal probability of $x$ given the joint distribution: $p(x) = \int p(x, y)\,dy$

Likelihood function: $L(\theta \mid x, y) = p(x, y \mid \theta)$

Maximum likelihood: $\hat{\theta} = \arg\max_{\theta} L(\theta \mid x, y)$

Maximum a posteriori: $\hat{\theta} = \arg\max_{\theta} p(\theta \mid x)$

Basic Concepts in ML

SLIDE 11

Joint density of two variables $X$ and $Y$:

$p(X, Y) = \mathcal{N}(X, Y \mid \mu, \Sigma)$, with $\mu$: 2-dimensional mean vector, $\Sigma$: $2 \times 2$ covariance matrix.

The eigendecomposition $\Sigma = V \Lambda V^T$ gives the 1st and 2nd eigenvectors. The lengths of the ellipse's axes are equal to $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$, the square roots of the eigenvalues of $\Sigma$. Each contour line corresponds to a multiple of the standard deviation along the eigenvectors.

Basic Concepts in ML
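To make the geometry concrete, a small numpy sketch (ours, not from the slides; the covariance values are made up) that extracts the eigenvectors and the ellipse axis lengths:

# Hypothetical example: eigendecomposition of a 2x2 covariance matrix.
# The ellipse axes of the contour lines lie along the eigenvectors and have
# lengths proportional to the square roots of the eigenvalues.
import numpy as np

Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])             # example 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)   # columns of eigvecs are eigenvectors

order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("1st eigenvector:", eigvecs[:, 0], " axis length ~", np.sqrt(eigvals[0]))
print("2nd eigenvector:", eigvecs[:, 1], " axis length ~", np.sqrt(eigvals[1]))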

SLIDE 12

The conditional and marginal of a multi-dimensional Gaussian distribution are also Gaussians.

Basic Concepts in ML
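For reference (a standard textbook result, not written out on the slide), with the usual block partition of a joint Gaussian:

$\begin{pmatrix} x_a \\ x_b \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} \right)$

the marginal and the conditional are again Gaussian:

$p(x_a) = \mathcal{N}\left(x_a \mid \mu_a, \Sigma_{aa}\right), \qquad p(x_a \mid x_b) = \mathcal{N}\left( x_a \mid \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (x_b - \mu_b), \;\; \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} \right)$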

SLIDE 13

Kernel Methods: Determine a metric which brings out features of the data so as to make subsequent computation easier

[Figure: left, the data in the original space (axes x1, x2); right, after lifting the data into feature space, the data becomes linearly separable when using an RBF kernel and projecting onto the first 2 principal components of kernel PCA.]

Basic Concepts in ML
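A minimal sketch of this idea (ours, not from the slides), assuming scikit-learn and a synthetic concentric-circles dataset; the kernel width (gamma) is an arbitrary illustrative choice:

# Hypothetical example: kernel PCA with an RBF kernel makes circle data separable.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=5.0)
Z = kpca.fit_transform(X)   # project onto the first 2 kernel principal components

# A linear classifier is near chance level in the original space, and typically
# much better after lifting the data into the kernel PCA space.
print("linear SVM, original space  :", LinearSVC(max_iter=10000).fit(X, y).score(X, y))
print("linear SVM, kernel PCA space:", LinearSVC(max_iter=10000).fit(Z, y).score(Z, y))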

SLIDE 14

Kernel Methods in Machine Learning

  • allow modeling of non-linear relationships across the data
  • exploit the Kernel Trick:

The kernel trick is based on the observation that the associated linear method relies on computing an inner product across variables. This inner product can be replaced by the kernel function, when known. The problem then becomes linear in feature space.

$k: X \times X \to \mathbb{R}, \qquad k(x_i, x_j) = \left\langle \phi(x_i), \phi(x_j) \right\rangle$

The kernel is a metric of similarity across datapoints.

Basic Concepts in ML
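As a small check of the identity above (ours, not from the slides): for the homogeneous polynomial kernel of degree 2 in 2D, the kernel value equals an explicit inner product in feature space; phi below is the standard feature map for this kernel:

# Hypothetical example: kernel trick for k(x, x') = (x . x')^2 in 2D.
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 homogeneous polynomial kernel.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, xp):
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([-0.5, 3.0])
print(k(x, xp), np.dot(phi(x), phi(xp)))   # both print 30.25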

SLIDE 15

For each algorithm, be able to explain:
– what it can do: classification, regression, structure discovery / reduction of dimensionality
– what one should be careful about (limitations of the algorithm, choice of hyperparameters) and how this choice influences the results
– the key steps of the algorithm, its hyperparameters, the variables it takes as input and the variables it outputs

Preparation for the Exam

SLIDE 16

  • For each algorithm, be able to explain:

SVM

– what it can do: classification, regression, structure discovery / reduction of dimensionality
Performs binary classification; can be extended to multi-class classification; can be extended to regression (SVR).
– what one should be careful about (limitations of the algorithm, choice of hyperparameters)
e.g. choice of kernel; a too small kernel width in Gaussian kernels may lead to overfitting; one can proceed to iterative estimation of the kernel parameters.
– the key steps of the algorithm, its hyperparameters, the variables it takes as input and the variables it outputs

In red: what you should know; in blue, what would be good to know / bonus.

Preparation for the Exam: example
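To illustrate the kernel-width remark above (our example, not from the slides): in scikit-learn the RBF width is controlled through gamma (large gamma means small width), and a very small width typically overfits; the dataset and values are arbitrary:

# Hypothetical example: effect of the RBF kernel width on SVM train/test accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in [0.1, 1.0, 100.0]:            # large gamma == small kernel width
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_tr, y_tr)
    print("gamma=%6.1f  train=%.2f  test=%.2f"
          % (gamma, clf.score(X_tr, y_tr), clf.score(X_te, y_te)))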

SLIDE 17

Class Overview

This class has presented groups of methods for doing classification, regression, structure discovery, estimation of time series. Note that several algorithms do more than one of these types of computation.

Classification / Clustering: Kernel K-means, GMM, Decision Trees + boosting/bagging, SVM

Regression: SVR, GMR, GPR

Structure Discovery: Linear / Kernel PCA, CCA

Time Series: RL

SLIDE 18

Topics Requested

  • Comparison between PCA, CCA and kernel PCA (ICA: not covered in class); which to use when?
  • SVM, Boosting (Neural Networks: not covered in class): pros and cons
  • GMR and probabilistic regression
SLIDE 19

PCA

Linear mapping; reduction of the dimensionality. The 1st axis is aligned with the direction of maximal variance and determines the correlation across the dimensions of the variables; the 2nd, 3rd, ... axes are orthogonal → all projections are uncorrelated!

$A: \mathbb{R}^N \to \mathbb{R}^q, \quad q \le N, \qquad Y = A X$

[Figure: raw 2D dataset (left) and the same data projected onto the first two principal components (right).]
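A minimal sketch (ours, not from the slides) with scikit-learn on synthetic correlated 2D data; the mixing matrix is an arbitrary choice:

# Hypothetical example: PCA on correlated 2D data; projections are uncorrelated.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.5, 0.5]])   # correlated dataset

pca = PCA(n_components=2)
Y = pca.fit_transform(X)                  # projection Y = A (X - mean)

print("principal axes (rows of A):\n", pca.components_)
print("explained variance:", pca.explained_variance_)
print("correlation between projections (~0):", np.corrcoef(Y.T)[0, 1])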

SLIDE 20

PCA

Pros:

  • Easy to implement (batch and incremental versions possible)
  • Gives easy-to-interpret projections of the data
  • Extracts the main correlations across the data
  • Optimal reduction of dimensionality (loses minimum information; minimum error at reconstruction)

Cons:

  • Extracts only linear correlations → kernel PCA
  • Very sensitive to noise (outliers) → probabilistic PCA
  • Cannot deal with incomplete data → probabilistic PCA
  • Forces the projection to be orthogonal and to decorrelate the data → ICA (requires statistical independence) – ICA NOT COVERED IN CLASS!

PCA remains a very powerful method; worth trying it out on your data before using any other method!

SLIDE 21

Kernel PCA

kPCA differs from PCA: the eigenvectors are M-dimensional (M = number of datapoints). Projecting onto the eigenvectors after kPCA finds structure in the data.

Circles and elliptic contour lines with RBF kernels

SLIDE 22

Kernel PCA

Hyperbolas and intersecting lines when using a polynomial kernel

kPCA differs from PCA: the eigenvectors are M-dimensional (M = number of datapoints). Projecting onto the eigenvectors after kPCA finds structure in the data.

SLIDE 23

CCA

Two descriptions of the same datapoints, e.g. a video description $x \in \mathbb{R}^N$ and an audio description $y \in \mathbb{R}^P$. CCA finds pairs of projections $(w_x^1, w_y^1), (w_x^2, w_y^2), \ldots$ that solve

$\max_{w_x, w_y} \; \operatorname{corr}\left( w_x^T x, \; w_y^T y \right)$

It extracts hidden structure that maximizes the correlation across the two different projections.
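A minimal sketch (ours, not from the slides) with scikit-learn on two synthetic "views" sharing a hidden latent variable; all sizes and noise levels are arbitrary:

# Hypothetical example: CCA recovers the shared structure between two views.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))                         # shared hidden structure
X = np.hstack([latent + 0.1 * rng.normal(size=(300, 1)),   # "video" view (2D)
               rng.normal(size=(300, 1))])
Y = np.hstack([rng.normal(size=(300, 1)),
               latent + 0.1 * rng.normal(size=(300, 1))])  # "audio" view (2D)

cca = CCA(n_components=1)
Xc, Yc = cca.fit_transform(X, Y)          # projections w_x^T x and w_y^T y
print("canonical correlation:", np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])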

SLIDE 24

PCA versus CCA (see solutions exercise 2)

SLIDE 25

Topics Requested

  • Comparison between PCA, CCA and kernel PCA (ICA: not covered in class); which to use when?
  • SVM, Boosting (Neural Networks: not covered in class): pros and cons
  • GMR and probabilistic regression
SLIDE 26

SVM

[Figure: two classes, with labels y = -1 and y = +1, in a 2D space (axes x1, x2).]

$\min_{w, b} \; \frac{1}{2} \|w\|^2 \qquad \text{subject to } \; y^i \left( \langle w, x^i \rangle + b \right) \ge 1, \quad i = 1, 2, \ldots, M$

with $y^i = +1$ when $\langle w, x^i \rangle + b \ge 1$ and $y^i = -1$ when $\langle w, x^i \rangle + b \le -1$.

Constraint-based optimization. Convex problem → global optimum, but not a unique solution! Finds the separating plane with maximal margin.
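A toy sketch (ours, not from the slides) of the maximal-margin plane on a small separable dataset; the points and the large C value are arbitrary choices:

# Hypothetical example: linear SVM, separating plane w.x + b = 0, margin 2/||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],          # class y = +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])   # class y = -1
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ~ hard-margin behaviour
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)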

SLIDE 27

SVM

The decision function is expressed in terms of the training datapoints:

$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, y^i \left\langle x, x^i \right\rangle + b \right)$

Replacing the inner product $\langle x, x^i \rangle$ by a kernel $k(x, x^i)$ gives

$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, y^i\, k(x, x^i) + b \right)$

Non-linear separation is achieved using the kernel trick

SLIDE 28

Multi-Class SVM

[Figure: three classes — children, female adults, male adults — separated by classifiers f^1, f^2, f^3.]

Construct a set of K binary classifiers $f^1, \ldots, f^K$, each trained to separate one class from the rest.

Compute the class label in a winner-take-all approach:

$j^* = \arg\max_{j = 1, \ldots, K} \left( \sum_{i=1}^{M} \alpha_i^j\, y^i\, k(x, x^i) + b^j \right)$

It is sufficient to compute only K-1 classifiers for K classes, but computing the K-th classifier may provide tighter bounds on the K-th class.
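A minimal sketch (ours, not from the slides) of the one-vs-rest scheme described above, using scikit-learn on synthetic 3-class data; all parameter values are arbitrary:

# Hypothetical example: K binary SVMs combined in a winner-take-all fashion.
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=3, random_state=0)    # 3 classes

ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5)).fit(X, y)
print("number of binary classifiers:", len(ovr.estimators_))   # K = 3
print("winner-take-all predictions:", ovr.predict(X[:5]))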

SLIDE 29

Boosting: Mixture of Weak Classifiers

How does it work?
  • Fit a mixture of simple classifiers on each class
  • Weighted combination of votes

What does it learn?
  • The optimal combination of the classifiers

Why is it good?
  • Small number of parameters for good generalization
  • Very easy to implement
  • Can be extended to combining complex classifiers (SVM)

[Figure: 8 mixtures per class.]
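A minimal sketch (ours, not from the slides) of boosting with scikit-learn; the dataset and the number of weak classifiers are arbitrary choices:

# Hypothetical example: AdaBoost = weighted combination of weak classifiers.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

# The default weak learner in scikit-learn is a depth-1 decision tree (a "stump").
boost = AdaBoostClassifier(n_estimators=50).fit(X, y)

print("training accuracy:", boost.score(X, y))
print("weights of the first weak classifiers:", boost.estimator_weights_[:5])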

SLIDE 30

Gaussian Mixture Models (GMM) + Bayes

How does it work?
  • Fit a GMM model on each class
  • Compare the pdf of each class

What does it learn?
  • A density model for each class

Why is it good?
  • Small number of parameters for good generalization
  • Learns the importance of each dimension

[Figure: 2 mixtures per class.]
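A minimal sketch (ours, not from the slides) of this classifier: fit one GMM per class and apply Bayes' rule; the dataset and the number of components are arbitrary choices:

# Hypothetical example: GMM + Bayes classification, p(class|x) ~ p(x|class) p(class).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

gmms, priors = [], []
for c in (0, 1):
    gmms.append(GaussianMixture(n_components=2, random_state=0).fit(X[y == c]))
    priors.append(np.mean(y == c))

# log p(x|class) + log p(class) for each class, then pick the argmax.
log_post = np.stack([g.score_samples(X) + np.log(p) for g, p in zip(gmms, priors)], axis=1)
y_pred = np.argmax(log_post, axis=1)
print("training accuracy:", np.mean(y_pred == y))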

SLIDE 31

Multi-Layer Perceptron (with Back-Propagation)

How does it work?
  • Each neuron “cuts a plane”
  • We combine n neurons together to get a non-linear classifier

What does it learn?
  • Cuts the space into hyperplanes that are “combined” together

Why is it good?
  • Fixed size of the model (size of the hidden layer)
  • Extremely fast at testing time

[Figure: network diagram — input, “hidden layer”, output neuron — with example decision boundaries for n = 1, 2, 3, 4 hidden neurons.]
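A minimal sketch (ours, not from the slides) of a small MLP; the dataset and hidden-layer size are arbitrary choices:

# Hypothetical example: MLP with a fixed-size hidden layer, trained by back-propagation.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(4,),    # n = 4 hidden neurons (fixed model size)
                    max_iter=5000, random_state=0).fit(X, y)

print("training accuracy:", mlp.score(X, y))    # prediction is very fast at test time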

SLIDE 32

A comparison across classifiers

[Figure: training and testing performance compared across SVM, GMM, MLP, GP, Boosting, KNN, RVM, Bagging and RANSAC.]

WARNING: most of these algorithms require a certain amount of tweaking of the hyperparameters to get optimal results.

SLIDE 33

What to use when?

Several criteria (application-dependent):

  • Do you care about being quick at training?
  • Do you care about being quick at testing?
  • Is stack memory an issue for your application?
  • Is precision at testing crucial?
  • In classification, distinguish between precision for the positive versus the negative class
  • Do you care about generalization away from the data?
  • Do you need a notion of worthiness of the prediction (usually rendered by a likelihood; not always available)?
  • Have you run the equivalent linear method and it did not yield good results? Then, yes, it may be a good idea to run a non-linear version of the method (PCA vs. kernel PCA, linear SVM vs. kernel SVM).

SLIDE 34

Topics Requested

  • Comparison between PCA, CCA and kernel PCA (ICA: not covered in class); which to use when?
  • SVM, Boosting (Neural Networks: not covered in class): pros and cons
  • GMR and probabilistic regression
SLIDE 35

Kernel methods for regression

Deterministic regressive model: $y = f(x), \quad x \in \mathbb{R}^N, \; y \in \mathbb{R}$

Probabilistic regressive model: $y = f(x) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$

Build an estimate of the noise model and then compute $f$ directly (Support Vector Regression).

SLIDE 36

Kernel methods for regression

Deterministic regressive model: $y = f(x), \quad x \in \mathbb{R}^N, \; y \in \mathbb{R}$

Probabilistic regressive model: $y = f(x) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$

Probabilistic estimate of the nonlinear relationship between $y$ and $x$ through the conditional density $p(y \mid x)$ (this estimates the noise model and $f$); the estimate is then computed by taking the expectation over the conditional density:

$\hat{y} = E\{f(x)\} = E\{p(y \mid x)\}$

Gaussian Mixture Regression (GMR) first computes p(x, y) and then derives p(y|x). Gaussian Process Regression (GPR) computes p(y|x) directly.

SLIDE 37

SVR, GPR, GMR: Differences

SVR, GMR and GPR are based on the same probabilistic regressive model, but they do not optimize the same objective function → they find different solutions.

  • SVR:
    – minimizes the reconstruction error through convex optimization → ensured to find the optimal estimate, but not a unique solution
    – usually finds a number of models <= number of datapoints (the support vectors)
  • GMR:
    – learns p(x,y) through maximum likelihood → finds a local optimum
    – computes a generative model p(x,y) from which it derives p(y|x)
    – starts with a low number of models << number of datapoints
  • GPR:
    – no optimization; analytical (optimal) solution
    – expresses p(y|x) as a full density model
    – number of models = number of datapoints!
SLIDE 38

Gaussian Mixture Regression (GMR)

1) Estimate the joint density, p(x,y), across pairs of datapoints using GMM:

$p(x, y) = \sum_{i=1}^{K} \alpha_i \, p(x, y \mid i), \qquad p(x, y \mid i) = \mathcal{N}\left(x, y;\, \mu^i, \Sigma^i\right)$

$\mu^i, \Sigma^i$: mean and covariance matrix of Gaussian $i$.

[Figure: 2D projection of a Gauss function; the ellipse contour corresponds to about 2 standard deviations (axes x, y).]

SLIDE 39

Gaussian Mixture Regression (GMR)

1) Estimate the joint density, p(x,y), across pairs of datapoints using GMM (same model as above).

The parameters are learned through Expectation-Maximization, an iterative procedure starting from a random initialization.

SLIDE 40

1) Estimate the joint density, p(x,y), across pairs of datapoints using GMM:

$p(x, y) = \sum_{i=1}^{K} \alpha_i \, \mathcal{N}\left(x, y;\, \mu^i, \Sigma^i\right)$

Mixing coefficients $\alpha_i$, with $\sum_{i=1}^{K} \alpha_i = 1$: the probability that the M datapoints were generated by Gaussian $i$,

$\alpha_i = p(i) = \frac{1}{M} \sum_{j=1}^{M} p(i \mid x^j)$

Gaussian Mixture Regression (GMR)

SLIDE 41

1) Estimate the joint density, p(x,y), across pairs of datapoints using GMM. 2) Compute the regressive signal, by taking p(y|x)

$p(y \mid x) = \sum_{i=1}^{K} \beta_i(x)\, p(y \mid x, i), \qquad \beta_i(x) = \frac{\alpha_i\, \mathcal{N}\left(x;\, \mu_X^i, \Sigma_{XX}^i\right)}{\sum_{j=1}^{K} \alpha_j\, \mathcal{N}\left(x;\, \mu_X^j, \Sigma_{XX}^j\right)}$

Each $p(y \mid x, i)$ is a Gauss function.

The variance changes depending on the query point

Gaussian Mixture Regression (GMR)

SLIDE 42

1) Estimate the joint density, p(x,y), across pairs of datapoints using GMM. 2) Compute the regressive signal, by taking p(y|x)

$p(y \mid x) = \sum_{i=1}^{K} \beta_i(x)\, p(y \mid x, i), \qquad \beta_i(x) = \frac{\alpha_i\, \mathcal{N}\left(x;\, \mu_X^i, \Sigma_{XX}^i\right)}{\sum_{j=1}^{K} \alpha_j\, \mathcal{N}\left(x;\, \mu_X^j, \Sigma_{XX}^j\right)}$

The influence of each marginal at the query point is modulated by $\beta_i(x)$ (e.g. $\beta_1(x)$ and $\beta_2(x)$ in the figure).

Gaussian Mixture Regression (GMR)

SLIDE 43

3) The regressive signal is then obtained by computing E{p(y|x)}:

$E\{p(y \mid x)\} = \sum_{i=1}^{K} \beta_i(x)\, \tilde{\mu}^i(x), \qquad \tilde{\mu}^i(x) = \mu_Y^i + \Sigma_{YX}^i \left(\Sigma_{XX}^i\right)^{-1} \left(x - \mu_X^i\right)$

This is a linear combination of K local regressive models, with weights $\beta_1(x), \beta_2(x), \ldots$ and local estimates $\tilde{\mu}^1(x), \tilde{\mu}^2(x), \ldots$

Gaussian Mixture Regression (GMR)
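A minimal sketch of GMR (ours, not from the slides): fit a GMM on the joint (x, y) samples and evaluate the conditional mean with the formula above; the data and the number of components are arbitrary choices:

# Hypothetical example: Gaussian Mixture Regression for 1D x and 1D y.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 300))
y = np.sin(x) + 0.1 * rng.normal(size=300)

gmm = GaussianMixture(n_components=4, random_state=0).fit(np.column_stack([x, y]))

def gmr_mean(xq):
    # beta_i(xq): responsibility of each Gaussian for the query point.
    w = np.array([a * multivariate_normal.pdf(xq, m[0], C[0, 0])
                  for a, m, C in zip(gmm.weights_, gmm.means_, gmm.covariances_)])
    beta = w / w.sum()
    # Local regressive models mu_y + S_yx / S_xx * (xq - mu_x).
    local = np.array([m[1] + C[1, 0] / C[0, 0] * (xq - m[0])
                      for m, C in zip(gmm.means_, gmm.covariances_)])
    return np.dot(beta, local)

print("E{y | x=1.0} ~", gmr_mean(1.0), "  (true sin(1.0) =", np.sin(1.0), ")")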

SLIDE 44

Computing the variance var{p(y|x)} provides information on the uncertainty of the prediction computed from the conditional distribution. Careful: this is not the uncertainty of the model. Use the likelihood to compute the uncertainty of the predictor!

$\operatorname{var}\{p(y \mid x)\} = \sum_{i=1}^{K} \beta_i(x) \left( \tilde{\mu}^i(x)^2 + \tilde{\sigma}_i^2 \right) - \left( \sum_{i=1}^{K} \beta_i(x)\, \tilde{\mu}^i(x) \right)^2$

Gaussian Mixture Regression (GMR)

SLIDE 45

Gaussian Mixture Regression (GMR)

[Figure: regressive signal $E\{p(y \mid x)\}$ plotted with an envelope given by $\operatorname{var}\{p(y \mid x)\}$.]

Computing the variance var{p(y|x)} provides information on the uncertainty of the prediction computed from the conditional distribution.

SLIDE 46

Kernel methods for regression

Deterministic regressive model: $y = f(x), \quad x \in \mathbb{R}^N, \; y \in \mathbb{R}$

Probabilistic regressive model: $y = f(x) + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$

Probabilistic estimate of the nonlinear relationship between $y$ and $x$ through the conditional density $p(y \mid x)$ (this estimates the noise model and $f$), and then compute the estimate by taking the expectation over the conditional density:

$\hat{y} = E\{f(x)\} = E\{p(y \mid x)\}$

SLIDE 47

Probability Density Function and Regression

A signal y can be estimated through regression, y = f(x), by taking the expectation over the conditional probability p(y | x), for a choice of parameters of p:

$\hat{y} = E\{p(y \mid x;\, \theta)\}$

The simplest way to estimate p(y|x) is through Probabilistic Regression, which estimates a linear regressive model.

SLIDE 48

Probabilistic Regression (PR)

PR is a statistical approach to classical linear regression that estimates the relationship between zero-mean variables y and x by building a linear model of the form:

$y = f(x; w) = w^T x, \qquad w, x \in \mathbb{R}^N$

If one assumes that the observed values of y differ from f(x) by an additive noise $\epsilon$ that follows a zero-mean Gaussian distribution (such an assumption consists of putting a prior distribution over the noise), then:

$y = w^T x + \epsilon, \quad \text{with } \epsilon \sim \mathcal{N}(0, \sigma^2)$

SLIDE 49

Probabilistic Regression

Training set of M pairs of datapoints $\left\{ (x^i, y^i) \right\}_{i=1}^{M}$, collected in $X$ and $\mathbf{y}$.

Likelihood of the regressive model, with parameters $w$ and $\sigma$: $\mathbf{y} = w^T X + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, i.e. $\mathbf{y} \sim p(\mathbf{y} \mid X, w, \sigma)$.

The datapoints are independently and identically distributed (i.i.d.):

$p(\mathbf{y} \mid X, w, \sigma) = \prod_{i=1}^{M} p\left(y^i \mid x^i, w, \sigma\right) = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( - \frac{\left(y^i - w^T x^i\right)^2}{2 \sigma^2} \right)$

SLIDE 50

Probabilistic Regression

Training set of M pairs of datapoints $\left\{ (x^i, y^i) \right\}_{i=1}^{M}$; likelihood of the regressive model $\mathbf{y} \sim p(\mathbf{y} \mid X, w, \sigma)$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$, as before.

Prior model on the distribution of the parameter w:

$p(w) = \mathcal{N}(0, \Sigma_w) \;\propto\; \exp\left( - \tfrac{1}{2}\, w^T \Sigma_w^{-1} w \right)$

The hyperparameters ($\Sigma_w$, $\sigma$) are given by the user.

SLIDE 51

Probabilistic Regression

Predictive distribution for a testing point $x_*$, given the training datapoints $X, \mathbf{y}$:

$p(y_* \mid x_*, X, \mathbf{y}) = \mathcal{N}\left( \frac{1}{\sigma^2}\, x_*^T A^{-1} X\, \mathbf{y}, \;\; x_*^T A^{-1} x_* \right), \qquad A = \frac{1}{\sigma^2}\, X X^T + \Sigma_w^{-1}$
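A small numpy sketch (ours, not from the slides) of this predictive distribution; the data, the noise level and the prior covariance are arbitrary assumptions:

# Hypothetical example: Bayesian linear regression predictive mean and variance,
# p(y*|x*, X, y) = N(sigma^-2 x*^T A^-1 X y, x*^T A^-1 x*),  A = sigma^-2 X X^T + Sigma_w^-1.
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 50                                   # input dimension, number of datapoints
w_true = np.array([1.5, -0.7])
sigma = 0.1                                    # noise standard deviation (assumed known)
Sigma_w = np.eye(N)                            # prior covariance on w (user choice)

X = rng.normal(size=(N, M))                    # training inputs, one per column
y = w_true @ X + sigma * rng.normal(size=M)    # noisy training targets

A = X @ X.T / sigma**2 + np.linalg.inv(Sigma_w)
A_inv = np.linalg.inv(A)

x_star = np.array([0.5, 2.0])                  # testing point
mean = x_star @ A_inv @ X @ y / sigma**2
var = x_star @ A_inv @ x_star
print("predictive mean:", mean, "  true value:", w_true @ x_star)
print("predictive variance:", var)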

SLIDE 52

How to extend the simple linear Bayesian regressive model for nonlinear regression, such that the non-linear problem becomes linear again?

From Probabilistic Regression to Gaussian Process Regression

Linear model: $y = w^T x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$

Apply a non-linear transformation $\phi(x)$ to the inputs:

$y = w^T \phi(x) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$

[Figure: a non-linear relationship between x and y.]

SLIDE 53

Gaussian Process Regression

Start from the predictive distribution of the linear model:

$p(y_* \mid x_*, X, \mathbf{y}) = \mathcal{N}\left( \frac{1}{\sigma^2}\, x_*^T A^{-1} X\, \mathbf{y}, \;\; x_*^T A^{-1} x_* \right), \qquad A = \frac{1}{\sigma^2}\, X X^T + \Sigma_w^{-1}$

Non-linear transformation: replace the inputs $x$ by features $\phi(x)$, so that $y = w^T \phi(x) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$. With $\Phi = \phi(X)$, the non-linear problem becomes linear again in feature space:

$p(y_* \mid x_*, X, \mathbf{y}) = \mathcal{N}\left( \frac{1}{\sigma^2}\, \phi(x_*)^T A^{-1} \Phi\, \mathbf{y}, \;\; \phi(x_*)^T A^{-1} \phi(x_*) \right), \qquad A = \frac{1}{\sigma^2}\, \Phi\, \Phi^T + \Sigma_w^{-1}$

SLIDE 54

Gaussian Process Regression

$p(y_* \mid x_*, X, \mathbf{y}) = \mathcal{N}\left( \frac{1}{\sigma^2}\, \phi(x_*)^T A^{-1} \Phi\, \mathbf{y}, \;\; \phi(x_*)^T A^{-1} \phi(x_*) \right), \qquad A = \frac{1}{\sigma^2}\, \Phi\, \Phi^T + \Sigma_w^{-1}$

The solution depends on the features only through inner products in feature space. Take as kernel:

$k(x, x') = \phi(x)^T\, \Sigma_w\, \phi(x')$

The predictive mean can then be written as

$\hat{y} = E\{y_* \mid x_*, X, \mathbf{y}\} = \sum_{i=1}^{M} \alpha_i\, k(x_*, x^i), \qquad \boldsymbol{\alpha} = \left( K(X, X) + \sigma^2 I \right)^{-1} \mathbf{y}$

SLIDE 55

Gaussian Process Regression

$\hat{y} = E\{y_* \mid x_*, X, \mathbf{y}\} = \sum_{i=1}^{M} \alpha_i\, k(x_*, x^i), \qquad \boldsymbol{\alpha} = \left( K(X, X) + \sigma^2 I \right)^{-1} \mathbf{y}$

In general the weights $\alpha_i$ are non-zero → all datapoints are used in the computation!

SLIDE 56

Gaussian Process Regression

$\hat{y} = E\{y_* \mid x_*, X, \mathbf{y}\} = \sum_{i=1}^{M} \alpha_i\, k(x_*, x^i), \qquad \boldsymbol{\alpha} = \left( K(X, X) + \sigma^2 I \right)^{-1} \mathbf{y}$

The kernel and its hyperparameters are given by the user. They can be optimized by maximizing the marginal likelihood, i.e. p(y | X; hyperparameters).
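A minimal sketch (ours, not from the slides) with scikit-learn, where fit() maximizes the marginal likelihood to tune the kernel hyperparameters; the data and kernel choice are arbitrary:

# Hypothetical example: GPR with an RBF kernel plus a noise term.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)   # optimizes the marginal likelihood

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)
print("optimized kernel:", gpr.kernel_)
print("predictive mean:", mean)
print("predictive std :", std)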

SLIDE 57

Overview of Topics Covered

This course covered a variety of topics that are core to Machine Learning. It gives you the basis to go and read recent advances in each of these topics. We hope that you will find this material useful and that you will use some of these algorithms in the future.

If you do so, drop us a note and we would be glad to include your application in future lectures as examples!