Linear Models (Machine Learning)



SLIDE 1

Machine Learning

Linear Models


SLIDE 2

Checkpoint: The bigger picture

  • Supervised learning: instances, concepts, and hypotheses
  • Specific learners
    – Decision trees
  • General ML ideas
    – Features as high-dimensional vectors
    – Overfitting

[Diagram: Labeled data → Learning algorithm → Hypothesis / Model h; New example → h → Prediction]

Questions?


SLIDE 6

Lecture outline

  • Linear models
  • What functions do linear classifiers express?


SLIDE 7

Where are we?

  • Linear models

    – Introduction: Why linear classifiers and regressors?
    – Geometry of linear classifiers
    – A notational simplification
    – Learning linear classifiers: The lay of the land

  • What functions do linear classifiers express?



SLIDE 9

Is learning possible at all?

  • There are 2¹⁶ = 65536 possible Boolean functions over 4 inputs
    – Why? There are 2⁴ = 16 possible inputs, so a truth table has 16 output slots. Each way to fill these 16 slots is a different function, giving 2¹⁶ functions.

  • We have seen only 7 outputs
  • We cannot know what the rest are without seeing them
    – Think of an adversary filling in the labels every time you make a guess at the function


How could we possibly learn anything?
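To make the counting concrete, here is a minimal Python sketch (the observed labels are hypothetical, chosen only for illustration): it enumerates all 2¹⁶ truth tables over 4 Boolean inputs and counts how many remain consistent after seeing 7 labeled examples.

```python
from itertools import product

# The 2^4 = 16 possible inputs over 4 Boolean variables
inputs = list(product([0, 1], repeat=4))

# A Boolean function is one way to fill the 16 output slots, so
# encode each function as a 16-bit integer: 2^16 = 65536 of them.
num_functions = 2 ** len(inputs)

# Hypothetical training set: labels for 7 of the 16 inputs.
observed = {inputs[i]: i % 2 for i in range(7)}

# Count the functions that agree with everything seen so far.
consistent = 0
for table in range(num_functions):
    if all(((table >> inputs.index(x)) & 1) == y for x, y in observed.items()):
        consistent += 1

print(num_functions, consistent)  # 65536 512: 2^(16-7) functions still fit the data
```

Every unseen slot can be filled either way, so 512 distinct functions still match the data perfectly; the adversary can make any of them the "true" one.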

SLIDE 10

Solution: Restrict the search space

  • A hypothesis space is the set of possible functions we consider
    – We were looking at the space of all Boolean functions
    – Instead, choose a hypothesis space that is smaller than the space of all functions:
  • Only simple conjunctions (with four variables, there are only 16 conjunctions without negations)
  • Simple disjunctions
  • m-of-n rules: Fix a set of n variables; at least m of them must be true (see the sketch below)
  • Linear functions

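The m-of-n idea is easy to state in code. Below is a minimal sketch (the function name and data are ours, not from the slides); note that an m-of-n rule is itself a linear function: it sums the chosen features and compares the sum to a threshold.

```python
# An m-of-n rule over Boolean features: fix a set of n feature
# indices and predict +1 when at least m of them are true.
def m_of_n(x, indices, m):
    # The rule is linear: a 0/1-weighted sum compared to a threshold.
    return +1 if sum(x[i] for i in indices) >= m else -1

# Example: a 2-of-3 rule over features 0, 2, and 3.
print(m_of_n([1, 0, 1, 0], indices=[0, 2, 3], m=2))  # +1 (features 0 and 2 are true)
print(m_of_n([0, 1, 1, 0], indices=[0, 2, 3], m=2))  # -1 (only feature 2 is true)
```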


SLIDE 15

Which is the better classifier?

Suppose this is our training set and we have to separate the blue circles from the red triangles.

[Figure: the training points with two candidate separators: a wiggly curve labeled A and a straight line labeled B.]

Think about overfitting: which classifier runs the risk of overfitting?

Simplicity versus accuracy


SLIDE 18

Similar argument for regression

[Figure: data points in the (x, F(x)) plane fit by a wiggly curve (A) and a straight line (B).]

Linear regression might make smaller errors on new points.

SLIDE 19

Recall: Regression vs. Classification

  • Linear regression is about predicting real-valued outputs
  • Linear classification is about predicting a discrete class label
    – +1 or −1
    – SPAM or NOT-SPAM
    – Or more than two categories


SLIDE 23

Linear classifiers: An example

Suppose we want to determine whether a robot arm is defective or not, using two measurements:
1. The maximum distance the arm can reach, d
2. The maximum angle it can rotate, a

Suppose we use a linear decision rule that predicts not defective if 2d + 0.01a ≥ 7. We can apply this rule whenever we have the two measurements.

For example, for a certain arm, if d = 3 and a = 200, then 2d + 0.01a = 8 ≥ 7. The arm would be labeled as not defective.

This rule is an example of a linear classifier

Features are weighted and added up; the sum is checked against a threshold.
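As a sanity check, here is the rule as a few lines of Python (the function name is ours; the weights and threshold come from the slide):

```python
# Decision rule from the example: weight the two measurements,
# add them up, and compare the sum against the threshold 7.
def is_defective(d, a):
    score = 2 * d + 0.01 * a
    return score < 7  # the rule predicts "not defective" once the score reaches 7

print(is_defective(d=3, a=200))  # score = 8 >= 7, so False: not defective
```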


SLIDE 26

Linear Classifiers

Inputs are d-dimensional feature vectors, denoted 𝐱
Output is a label y ∈ {−1, +1}

Linear threshold units classify an example 𝐱 using parameters 𝐰 (a d-dimensional vector) and b (a real number) according to the following classification rule:

Output = sgn(𝐰ᵀ𝐱 + b) = sgn(∑ᵢ wᵢxᵢ + b)

if 𝐰ᵀ𝐱 + b ≥ 0, then predict y = +1
if 𝐰ᵀ𝐱 + b < 0, then predict y = −1

b is called the bias term
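A minimal sketch of this rule as code (a NumPy version; the particular numbers are the robot-arm example recast into this template, with the threshold 7 moved into a bias of −7):

```python
import numpy as np

# Linear threshold unit: predict +1 if w.x + b >= 0, else -1.
def predict(w, x, b):
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([2.0, 0.01])   # weight vector
b = -7.0                    # bias term: "... >= 7" becomes "... - 7 >= 0"
x = np.array([3.0, 200.0])  # one example: d = 3, a = 200
print(predict(w, x, b))     # 2*3 + 0.01*200 - 7 = 1 >= 0, so +1
```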


SLIDE 34

The geometry of a linear classifier

sgn(b + w1x1 + w2x2)

In higher dimensions, a linear classifier represents a hyperplane that separates the space into two half-spaces.

[Figure: an illustration in two dimensions: positive (+) and negative (−) points in the (x1, x2) plane separated by the line b + w1x1 + w2x2 = 0, with the weight vector [w1 w2] drawn perpendicular to it.]

We only care about the sign, not the magnitude.

Questions?
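One way to see the "sign, not magnitude" point: scaling the parameters (w, b) by any positive constant rescales every score but never flips its sign, so every prediction is unchanged. A small sketch with made-up numbers:

```python
import numpy as np

w, b = np.array([1.0, -2.0]), 0.5  # hypothetical parameters
points = np.array([[0.0, 0.0], [3.0, 1.0], [-1.0, 2.0]])

# Rescaling (w, b) by any positive c rescales every score by c,
# so the signs, and hence the predictions, are identical.
for c in (1.0, 10.0, 0.01):
    print(np.sign(points @ (c * w) + c * b))  # same signs every time
```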


SLIDE 40

Simplifying notation

We can stop writing b at each step using notational sugar:

The prediction function is sgn(𝐰ᵀ𝐱 + b) = sgn(∑ᵢ wᵢxᵢ + b)

Rewrite 𝐱 as [𝐱; 1]. Call this 𝐱′. Rewrite 𝐰 as [𝐰; b]. Call this 𝐰′.

Note that 𝐰ᵀ𝐱 + b is the same as 𝐰′ᵀ𝐱′, so the prediction function is now sgn(𝐰′ᵀ𝐱′).

In the higher-dimensional space, the separating hyperplane defined by 𝐰′ passes through the origin.

We sometimes hide the bias b and fold it into the weights by adding an extra constant feature. But remember that it is there.

This trick increases dimensionality by one. It is equivalent to adding a feature that is a constant: always 1.

Questions?
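A short check of the folding trick (made-up numbers): appending a constant 1 to 𝐱 and the bias b to 𝐰 reproduces the original score exactly.

```python
import numpy as np

w = np.array([2.0, 0.01])     # weights
b = -7.0                      # bias term
x = np.array([3.0, 200.0])    # one example

w_prime = np.append(w, b)     # w' = [w; b]
x_prime = np.append(x, 1.0)   # x' = [x; 1], the constant feature

print(np.dot(w, x) + b)          # 1.0
print(np.dot(w_prime, x_prime))  # 1.0: the same score, with no separate bias
```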

SLIDE 41

Coming up (next several weeks): Linear classification

  • Perceptron: Error-driven learning; updates the hypothesis if there is an error
  • Support Vector Machines: Define a different cost function that includes an error term and a term that targets future performance
  • Naïve Bayes classifier: A simple linear classifier with a probabilistic interpretation
  • Logistic regression: Another probabilistic linear classifier; bears similarity to support vector machines


In all cases, the prediction will be done with the same rule:

𝐰ᵀ𝐱 + b ≥ 0 ⇒ y = +1
𝐰ᵀ𝐱 + b < 0 ⇒ y = −1