APPLIED MACHINE LEARNING – 2011-2012
MACHINE LEARNING Overview

Exam Format
The exam lasts a total of 3 hours:
- Upon entering the room, you must leave your bags in a corner of the room; you are allowed to keep a couple of pens/pencils/erasers and a few blank sheets of paper.
- Bring your student card with you to write your SCIPER number on your exam sheet, as we will check your card.
- The exam is closed book, but you can bring one A4 page with personal handwritten notes, written recto-verso.
Formalism / Taxonomy:
- Know the core formalism, e.g. the notion of likelihood.
- Know the taxonomy of learning problems (e.g. supervised vs. unsupervised learning) and be able to give examples of algorithms in each case.

Principles of evaluation:
- Training vs. testing sets, cross-validation, ground truth.
- Know which method of evaluation to apply where (F-measure in clustering vs. classification, BIC, etc.).
For each algorithm seen in class, you should know:
– what it can do: classification, regression, structure discovery / reduction of dimensionality;
– what one should be careful about (limitations of the algorithm, choice of hyperparameters);
– the key steps of the algorithm, its hyperparameters, the variables it takes as input and the variables it outputs.
SVM
– What it can do: performs binary classification; can be extended to multi-class classification; can be extended to regression (SVR).
– What one should be careful about: e.g. the choice of kernel; too small a kernel width in Gaussian kernels may lead to over-fitting.
– Know the key steps of the algorithm, its hyperparameters, the variables it takes as input and the variables it outputs.
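As a minimal illustration of the kernel-width point above, the sketch below uses scikit-learn (an assumption of this example, not the course's toolchain); gamma plays the role of an inverse squared kernel width, so a large gamma means a small kernel width.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # 2-D inputs
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # non-linearly separable labels

for gamma in (0.1, 1.0, 100.0):  # large gamma = small kernel width
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    print(gamma, clf.score(X, y), len(clf.support_))

A very large gamma typically drives the training score toward 1 while the number of support vectors grows: a symptom of over-fitting.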
This overview is meant to highlight similarities and differences across the different methods presented in class. To be well prepared for the exam, read the slides, the exercises and their solutions carefully.
This class has presented groups of methods for structure discovery, classification and non-linear regression:
- Classification: SVM, GMM + Bayes
- Regression: SVR, GMR
- Structure discovery: PCA & clustering techniques (K-means, soft K-means, GMM)
Techniques for finding structure in data proceed by projecting or grouping the data from the original space into another space of lower dimension. The projected space is chosen so as to highlight particular features common to subsets of datapoints. Pre-processing step: the structure found may be exploited in a second stage by another algorithm for regression, classification, etc., as in the sketch below.
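A minimal sketch of that two-stage use, assuming scikit-learn (not necessarily the course's toolchain): PCA projects the data down to two dimensions before an SVM classifies it.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 datapoints in a 10-D space
y = (X[:, 0] > 0).astype(int)       # illustrative labels

# Stage 1: structure discovery (PCA); Stage 2: classification (SVM)
model = make_pipeline(PCA(n_components=2), SVC(kernel="rbf"))
model.fit(X, y)
print(model.score(X, y))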
Principal Component Analysis (PCA)
[Figure: projection of the data from the original N-dimensional space onto q principal directions, q < N.]
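A minimal PCA sketch in numpy, under the usual conventions (X stored as M rows of N-dimensional datapoints; an illustration, not the course's reference code):

import numpy as np

def pca_project(X, q):
    Xc = X - X.mean(axis=0)            # center the data
    C = np.cov(Xc, rowvar=False)       # N x N covariance matrix
    w, V = np.linalg.eigh(C)           # eigenvalues in ascending order
    V = V[:, ::-1][:, :q]              # keep the q leading eigenvectors
    return Xc @ V                      # M x q projection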
All three methods for clustering seen in class (K-means, soft K-means, GMM) are solved through E-M (expectation-maximization). You should be able to spell out the similarities and differences across K-means, soft K-means and GMM: e.g. how they are similar in their representation of the clusters, how they differ in the optimization problem they solve, in their hyper-parameters, etc. A K-means sketch written as an E-M loop follows below.
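A minimal K-means sketch in numpy, phrased as an E-M loop (illustrative only; it assumes no cluster ever empties):

import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # initial centers
    for _ in range(iters):
        # E-step: hard-assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        z = d.argmin(axis=1)
        # M-step: recompute each center as the mean of its points
        mu = np.array([X[z == k].mean(axis=0) for k in range(K)])
    return mu, z

Soft K-means replaces the hard assignment with responsibilities proportional to exp(-beta d^2), normalized over clusters; GMM additionally updates covariances and priors in the M-step.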
All clustering methods depend on choosing a good metric of similarity, which measures how similar subgroups of datapoints are. You should be able to list which metric of similarity can be used in each case and how this choice may impact the clustering:
- K-means: Lp-norm.
- Soft K-means: exponentially decreasing function of the distance, modulated by the stiffness; approximately an isotropic rbf (unnormalized Gauss) function.
- GMM: likelihood of each Gauss function; can use isotropic, diagonal and full covariance matrices.
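The three measures side by side for one datapoint and one center (a sketch; beta is the soft K-means stiffness, and the exp(-beta d^2) form is one common convention):

import numpy as np
from scipy.stats import multivariate_normal

x = np.array([1.0, 2.0])                      # datapoint
mu = np.array([0.0, 0.0])                     # cluster center

d = np.linalg.norm(x - mu, ord=2)             # K-means: Lp-norm (here p = 2)
beta = 1.0
w = np.exp(-beta * d ** 2)                    # soft K-means: unnormalized rbf
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])    # GMM: full covariance matrix
p = multivariate_normal(mu, Sigma).pdf(x)     # GMM: Gauss-function likelihood
print(d, w, p)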
Fundamental difference between clustering and classification: classification is supervised (class labels are available at training time), whereas clustering is unsupervised. Both are evaluated with an F-measure, but not in the same way: the clustering F-measure assumes a semi-supervised setting, in which only a subset of the points is labelled.
Clustering F1-measure:
(careful: similar to, but not the same as, the F-measure we will see for classification!)
Trade-off between clustering all datapoints of the same class into the same cluster and making sure that each cluster contains points of only one class.
- Picks, for each class, the cluster with the maximal F1-measure.
- Recall: proportion of the datapoints of the class that are correctly clustered.
- Precision: proportion of the datapoints in the cluster that belong to the class.
With M the number of labeled datapoints, C the set of classes, K the number of clusters, and n_{ik} the number of members of both class c_i and cluster k:

F(C, K) = \sum_{c_i \in C} \frac{|c_i|}{M} \max_k F(c_i, k)

F(c_i, k) = \frac{2 \, R(c_i, k) \, P(c_i, k)}{R(c_i, k) + P(c_i, k)}

R(c_i, k) = \frac{n_{ik}}{|c_i|}, \qquad P(c_i, k) = \frac{n_{ik}}{|k|}

The factor |c_i| / M weighs each class by the fraction of the labeled points it contains.
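A direct transcription of the formula in numpy (illustrative; y holds the true class of each labeled point, z its cluster index):

import numpy as np

def clustering_f_measure(y, z):
    M = len(y)
    total = 0.0
    for c in np.unique(y):
        in_c = (y == c)                      # members of class c
        best = 0.0
        for k in np.unique(z):
            n_ik = np.sum(in_c & (z == k))   # in class c and cluster k
            if n_ik == 0:
                continue
            R = n_ik / in_c.sum()            # recall R(c, k)
            P = n_ik / np.sum(z == k)        # precision P(c, k)
            best = max(best, 2 * R * P / (R + P))
        total += in_c.sum() / M * best       # weigh by class fraction
    return total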
Classification F-measure:
(careful: similar to, but not the same as, the F-measure we saw for clustering!)
Trade-off between classifying correctly all datapoints of the same class and making sure that each class contains points of only one class.
- True Positives (TP): number of datapoints of class 1 that are correctly classified.
- False Negatives (FN): number of datapoints of class 1 that are incorrectly classified.
- False Positives (FP): number of datapoints of class 2 that are incorrectly classified.
- Recall = TP / (TP + FN): proportion of datapoints of class 1 that are correctly classified.
- Precision = TP / (TP + FP): proportion of datapoints classified in class 1 that truly belong to class 1.
- F = 2 · Precision · Recall / (Precision + Recall).
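The same computation as a tiny sketch (the counts are made up for illustration):

def f_measure(tp, fn, fp):
    recall = tp / (tp + fn)          # TP / (TP + FN)
    precision = tp / (tp + fp)       # TP / (TP + FP)
    return 2 * precision * recall / (precision + recall)

print(f_measure(tp=40, fn=10, fp=10))   # 0.8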
GMM + Bayes vs. SVM: a non-linear boundary in both cases. Compute the number of parameters required for the same fit. [Figure: original two-class dataset; GMM fit with 1 Gauss function per class, but with a full covariance matrix; SVM fit with 7 support vectors.]
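One possible tally, assuming 2-D data (the slides may count slightly differently): a full-covariance Gaussian in d dimensions has d mean parameters and d(d+1)/2 covariance parameters, so for d = 2 each class costs 2 + 3 = 5, i.e. 10 parameters for the GMM. The SVM stores each support vector (d values) plus its weight \alpha_i and one bias b, i.e. 7 \times (2 + 1) + 1 = 22 parameters.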
We have seen two examples of kernel methods: SVM and SVR. Kernel methods implicitly search for structure in the data prior to performing another computation (classification or regression). The kernel trick exploits the observation that all linear methods for finding structure in data are based on computing an inner product across variables. This inner product can be replaced by a kernel function, a metric of similarity across datapoints that corresponds to an inner product after a mapping \phi of the data:

k : X \times X \to \mathbb{R}, \qquad k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle
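A minimal numerical check of that identity for the polynomial kernel k(x, x') = (x^T x')^2, whose feature map \phi can be written out explicitly in 2-D (a textbook example, not taken from the slides):

import numpy as np

def phi(x):
    # explicit feature map of the 2-D quadratic kernel (x . x')^2
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(x, xp) ** 2)        # kernel value: 16.0
print(np.dot(phi(x), phi(xp)))   # same value via the inner product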
SVR and GMR lead to a regressive model that computes a weighted combination of local predictors. In SVR, the computation is reduced to summing only over the support vectors. In GMR, the sum is over the set of Gaussians, whose centers are usually not located on any particular datapoint. In both cases the predictors m(x) are local!
SVR solution: y(x) = \sum_{i=1}^{M} \alpha_i^* \, k(x, x_i) + b (a sum over the M support vectors)

GMR solution: y(x) = \sum_{i=1}^{K} h_i(x) \, m_i(x) (a sum over the K Gaussians, with responsibilities h_i(x) and local predictors m_i(x))
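The two prediction rules side by side as a sketch (all numbers, and the Gaussian kernel, are made-up illustrations):

import numpy as np

k = lambda a, b: np.exp(-np.sum((a - b) ** 2))     # Gaussian kernel

def svr_predict(x, sv, alpha, b):
    # weighted sum over the support vectors only
    return sum(a * k(x, s) for a, s in zip(alpha, sv)) + b

def gmr_predict(hs, ms):
    # hs = responsibilities h_i(x) (sum to 1); ms = local predictions m_i(x)
    return float(np.dot(hs, ms))

x = np.array([0.5])
print(svr_predict(x, sv=[np.array([0.0]), np.array([1.0])],
                  alpha=[0.7, -0.2], b=0.1))
print(gmr_predict(hs=np.array([0.8, 0.2]), ms=np.array([0.3, 1.1])))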
SVR and GMR lead to the following regressive models. GMR solution: y(x) = \sum_{i=1}^{K} h_i(x) \, m_i(x). [Figure: fit obtained with 8 Gauss functions with full covariance matrices.]
SVR solution: y(x) = \sum_{i=1}^{M} \alpha_i^* \, k(x, x_i) + b. [Figure: the same fit obtained with 27 support vectors.]
SVR and GMR are based on the same probabilistic regressive model, but do not optimize the same objective function.
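Concretely, in their standard formulations (notation not taken from the slides): SVR minimizes a regularized \epsilon-insensitive loss, while GMR fits the joint density by maximizing the likelihood and then conditions on the input:

\min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \max\big(0, |y_i - f(x_i)| - \epsilon\big) \qquad \text{(SVR)}

\max_{\{\pi_k, \mu_k, \Sigma_k\}} \; \sum_{i=1}^{M} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\big((x_i, y_i); \mu_k, \Sigma_k\big) \qquad \text{(GMM, then condition to obtain GMR)}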
This course covered a variety of topics that are core to machine learning. It gives you the basis to go and read recent advances in each of these topics. We hope that you will find this material useful and that you will use some of these techniques in your own work. If you do so, drop us a note; we would be glad to include your application as an example in future lectures!