THE SUPERVISED LEARNING PROBLEM
Matthieu R Bloch
January 7, 2020

WHY ML?
Adversarial examples [Elsayed et al. ’18]
Traditional engineering is top-down
- We use fundamental principles (mathematics, physics) to build models and abstractions
- Design is performed based on models
- Example: building a communication system

Machine learning is bottom-up
- We think there is a model to be found
- The model is too complex to describe or identify from fundamental principles
- We have data
- Example: classifying cats and dogs

There are plenty of problems that do not require ML
- We should probably not try to learn the laws of physics with ML

There are plenty of situations in which ML can help
- Engineering design based on heuristics
- Example: Computer-Aided Design
- Credit card fraud detection
- Movie recommendations
- Autonomous vehicles
- Match making
- Handwriting recognition
- Cooking
- Painting
- Teaching
Supervised learning
- Given input data {x_i}_{i=1}^N representing observations of a phenomenon
- Given output data {y_i}_{i=1}^N representing "labels" attached to the observations
- Goal: identify the input-output relationship from the training data {(x_i, y_i)}_{i=1}^N and generalize

Unsupervised learning
- Given input data {x_i}_{i=1}^N representing observations of a phenomenon
- No output data!
- Goal: understand structure in the data, or infer some characteristic of the underlying probability distribution

Other types of learning: semi-supervised learning, active learning, reinforcement learning, transfer learning, imitation learning
Learning model #1
An unknown function f : X → Y : x ↦ y = f(x) to learn
- The formula to distinguish cats from dogs

A dataset D ≜ {(x_1, y_1), ⋯, (x_N, y_N)}
- x_i ∈ X ≜ R^d: picture of a cat/dog
- y_i ∈ Y ≜ R: the corresponding label cat/dog

A set of hypotheses H as to what the function could be
- Example: deep neural nets with the AlexNet architecture

An algorithm ALG to find the best h ∈ H that explains f

Terminology:
- Y = R: regression problem
- |Y| = 2: binary classification problem

The goal is to generalize, i.e., be able to classify inputs we have not seen.
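The ingredients of this learning model (dataset D, hypothesis set H, algorithm ALG) can be made concrete on a toy problem. The sketch below is an illustrative assumption, not the slide's cats-and-dogs/AlexNet setup: 1-D inputs, binary labels, a hypothesis set of threshold classifiers, and a grid-search algorithm.

```python
import numpy as np

# Toy dataset D = {(x_i, y_i)}: 1-D features with binary labels
# (an illustrative stand-in for pictures and cat/dog labels).
X = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.75])
y = np.array([0, 0, 0, 1, 1, 1])

# Hypothesis set H: threshold classifiers h_t(x) = 1{x >= t}.
def h(t, x):
    return (x >= t).astype(int)

# Algorithm ALG: search a grid of thresholds for the h in H
# that best explains the training data (fewest errors).
def alg(X, y):
    thresholds = np.linspace(0.0, 1.0, 101)
    errors = [np.mean(h(t, X) != y) for t in thresholds]
    return thresholds[int(np.argmin(errors))]

t_star = alg(X, y)
```

Here ALG achieves zero training error, but whether h_{t_star} generalizes to unseen inputs is exactly the question the rest of the lecture develops.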
Learning seems impossible without additional assumptions!
https://xkcd.com/221/
Flip a biased coin that lands on heads with unknown probability p ∈ [0, 1]: P(heads) = p and P(tails) = 1 − p.
Say we flip the coin N times; can we estimate p with p̂ ≜ (# heads)/N?
Can we relate p̂ to p? The law of large numbers tells us that p̂ converges in probability to p as N gets large:
∀ε > 0, P(|p̂ − p| > ε) ⟶ 0 as N → ∞.
It is possible that p̂ is completely off, but it is not probable.
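The convergence in probability above is easy to check numerically; the bias p = 0.3 and the seed below are arbitrary simulation choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # the "unknown" bias, fixed here so we can check the estimate

# Estimate p_hat = (# heads) / N for increasingly many flips;
# the estimate concentrates around p as N grows.
for N in [10, 1000, 100000]:
    flips = rng.random(N) < p   # each flip is heads with probability p
    p_hat = flips.mean()
    print(f"N = {N:6d}   p_hat = {p_hat:.4f}   |p_hat - p| = {abs(p_hat - p):.4f}")
```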
Learning model #2
An unknown function f : X → Y : x ↦ y = f(x) to learn

A dataset D ≜ {(x_1, y_1), ⋯, (x_N, y_N)}
- {x_i}_{i=1}^N drawn i.i.d. from an unknown distribution P_x on X
- {y_i}_{i=1}^N are the corresponding targets: y_i ∈ Y ≜ R

A set of hypotheses H as to what the function could be

An algorithm ALG to find the best h ∈ H that explains f
Which color is the dress?
Learning model #3
An unknown conditional distribution P_{y|x} to learn
- P_{y|x} models f : X → Y with noise

A dataset D ≜ {(x_1, y_1), ⋯, (x_N, y_N)}
- {x_i}_{i=1}^N drawn i.i.d. from an unknown probability distribution P_x on X
- {y_i}_{i=1}^N are the corresponding targets: y_i ∈ Y ≜ R

A set of hypotheses H as to what the function could be

An algorithm ALG to find the best h ∈ H that explains f

The roles of P_x and P_{y|x} are different:
- P_{y|x} is what we want to learn; it captures the underlying function and the noise added to it
- P_x models the sampling of the dataset and need not be learned
Biometric authentication system
Assume that you are designing a fingerprint authentication system.
- You trained your system with a fancy machine learning algorithm
- The probability of wrongly authenticating is 1%
- The probability of correctly authenticating is 60%

Is this a good system? It depends!
- If you are GTRI, this might be OK (security matters more)
- If you are Apple, this is not acceptable (user convenience matters too)

There is an application-dependent cost that can affect the design.
Learning model
A dataset D ≜ {(x_1, y_1), ⋯, (x_N, y_N)}
- {x_i}_{i=1}^N drawn i.i.d. from an unknown probability distribution P_x on X
- {y_i}_{i=1}^N are the corresponding targets: y_i ∈ Y ≜ R

An unknown conditional distribution P_{y|x}
- P_{y|x} models f : X → Y with noise

A set of hypotheses H as to what the function could be

A loss function ℓ : Y × Y → R+ capturing the "cost" of prediction

An algorithm ALG to find the best h ∈ H that explains f
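One way to fold the application-dependent cost from the fingerprint example into ℓ is an asymmetric loss. The cost weights below are hypothetical illustration values, chosen only to show a false accept being penalized more heavily than a false reject.

```python
# Asymmetric loss l : Y x Y -> R+ for binary authentication,
# with Y = {0, 1} (0 = impostor, 1 = legitimate user).
# The cost weights are hypothetical illustration values.
def loss(y_true, y_pred, c_false_accept=10.0, c_false_reject=1.0):
    if y_pred == y_true:
        return 0.0                 # correct decision costs nothing
    if y_pred == 1 and y_true == 0:
        return c_false_accept      # impostor wrongly authenticated
    return c_false_reject          # legitimate user wrongly rejected
```

A security-minded designer (GTRI) might raise c_false_accept further, while a convenience-minded one (Apple) might raise c_false_reject; the learning problem itself is unchanged, only ℓ differs.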
Learning is not memorizing
- Our goal is not to find h ∈ H that accurately assigns values to elements of D
- Our goal is to find the best h ∈ H that accurately predicts values of unseen samples

Consider a hypothesis h ∈ H. We can easily compute the empirical risk (a.k.a. in-sample error)
R̂_N(h) ≜ (1/N) ∑_{i=1}^N ℓ(y_i, h(x_i)).
What we really care about is the true risk (a.k.a. out-sample error)
R(h) ≜ E_{xy}[ℓ(y, h(x))].

Question #1: Can we generalize? For a given h, is R̂_N(h) close to R(h)?

Question #2: Can we learn well? Given H, the best hypothesis is h♯ ≜ argmin_{h∈H} R(h). Our algorithm can only find h* ≜ argmin_{h∈H} R̂_N(h). Is R̂_N(h*) close to R(h♯)? Is R(h♯) ≈ 0?
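The gap between the empirical risk R̂_N(h) and the true risk R(h) can be seen numerically. The sketch below assumes a linear underlying function f(x) = 2x, Gaussian noise (a simple choice of P_{y|x}), a uniform P_x, and squared loss; a large fresh sample stands in for the expectation defining R(h).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    # x_i drawn i.i.d. from Px (here uniform on [-1, 1]);
    # y_i drawn from Py|x: the underlying f(x) = 2x plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, n)
    y = 2.0 * x + rng.normal(0.0, 0.1, n)
    return x, y

def empirical_risk(h, x, y):
    # R_hat_N(h) = (1/N) * sum_i l(y_i, h(x_i)), with squared loss
    return np.mean((y - h(x)) ** 2)

h = lambda x: 2.0 * x                   # a hypothesis matching f exactly
x_tr, y_tr = sample(50)                 # small training set: in-sample error
x_te, y_te = sample(100000)             # huge fresh sample: approximates R(h)

r_hat = empirical_risk(h, x_tr, y_tr)   # R_hat_N(h): fluctuates with the draw
r_true = empirical_risk(h, x_te, y_te)  # close to R(h) = noise variance = 0.01
```

Even for this perfect hypothesis, r_hat varies from draw to draw of the 50-point training set, while r_true is pinned near the noise floor; bounding that fluctuation uniformly over H is what generalization theory is about.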