Linear Classifiers: Expressiveness (Machine Learning lecture, PowerPoint presentation)



SLIDE 1

Machine Learning

Linear Classifiers: Expressiveness

SLIDE 2

Lecture outline

  • Linear models: Introduction
  • What functions do linear classifiers express?

SLIDE 3

Where are we?

  • Linear models: Introduction
  • What functions do linear classifiers express?

– Conjunctions and disjunctions
– m-of-n functions
– Not all functions are linearly separable
– Feature space transformations
– Exercises

SLIDE 4

Which Boolean functions can linear classifiers represent?

  • Linear classifiers are an expressive hypothesis class
  • Many Boolean functions are linearly separable

– Not all, though
– Recall: In comparison, decision trees can represent any Boolean function

SLIDE 5

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z
 0   0   0  |  0
 0   0   1  |  0
 0   1   0  |  0
 0   1   1  |  0
 1   0   0  |  0
 1   0   1  |  0
 1   1   0  |  0
 1   1   1  |  1
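The equivalence can be checked mechanically. A minimal Python sketch (the helper names are mine, not from the lecture) verifying that the threshold rule x1 + x2 + x3 ≥ 3 reproduces the conjunction on all eight Boolean inputs:

```python
# Sketch: the conjunction z = x1 AND x2 AND x3 equals the linear
# threshold rule "1 if x1 + x2 + x3 >= 3" on every Boolean input.
from itertools import product

def conjunction(x1, x2, x3):
    return int(x1 and x2 and x3)

def threshold_unit(x1, x2, x3):
    return int(x1 + x2 + x3 - 3 >= 0)

for x in product([0, 1], repeat=3):
    assert conjunction(*x) == threshold_unit(*x)
```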

SLIDE 6

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

SLIDE 7

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.
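The negation rule can be verified the same way; a small sketch (names are illustrative) checking that x1 + x2 + (1 − x3) ≥ 3 computes x1 ∧ x2 ∧ ¬x3:

```python
# Sketch: a negated variable is handled by substituting (1 - x).
# Here z = x1 AND x2 AND (NOT x3) vs. the rule x1 + x2 + (1 - x3) >= 3.
from itertools import product

def target(x1, x2, x3):
    return int(x1 == 1 and x2 == 1 and x3 == 0)

def threshold_unit(x1, x2, x3):
    return int(x1 + x2 + (1 - x3) >= 3)

for x in product([0, 1], repeat=3):
    assert target(*x) == threshold_unit(*x)
```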

SLIDE 8

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.

SLIDE 9

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.

Questions?

SLIDE 10

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"?

SLIDE 11

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"? Answer: x1 + x2 + x3 ≥ 2.
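The answer can be checked over all 32 inputs; a small sketch (function names are mine) where the ignored variables simply get weight 0:

```python
# Sketch: "at least 2 of {x1, x2, x3}" over five Boolean variables.
# Ignored variables get weight 0 in the linear threshold unit.
from itertools import product

def m_of_n(x1, x2, x3, x4, x5):
    return int(x1 + x2 + x3 >= 2)          # the rule itself

def threshold_unit(x1, x2, x3, x4, x5):
    w = (1, 1, 1, 0, 0)                    # weights; bias b = -2
    s = sum(wi * xi for wi, xi in zip(w, (x1, x2, x3, x4, x5)))
    return int(s - 2 >= 0)

for x in product([0, 1], repeat=5):
    assert m_of_n(*x) == threshold_unit(*x)
```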

SLIDE 12

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"? Answer: x1 + x2 + x3 ≥ 2.


Questions?

SLIDE 13

Parity is not linearly separable

(The XOR function)

[Figure: points in the (x1, x2) plane; the + and − classes are interleaved, so no single line separates them.]

Can't draw a line to separate the two classes. Questions?
SLIDE 14

Not all functions are linearly separable

  • XOR is not linear

– z = x1 XOR x2 = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
– Parity cannot be represented as a linear classifier

  • f(x) = 1 if the number of 1's in x is even
  • Many other non-trivial Boolean functions are not linear either

– Example: z = (x1 ∧ x2) ∨ (x3 ∧ ¬x4)
– The function is not linear in the four variables

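Non-separability of XOR can be probed empirically. The sketch below (my construction, not from the slides; a finite grid is only a sanity check, not a proof) searches a coarse grid of weights and finds no linear threshold unit that fits XOR:

```python
# Sanity check (not a proof): no linear threshold sign(b + w1*x1 + w2*x2)
# on a coarse grid of weights reproduces XOR on {0,1}^2. The grid and
# the ">= 0 means positive class" convention are my choices.
from itertools import product

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def fits_xor(w1, w2, b):
    return all(((b + w1 * x1 + w2 * x2) >= 0) == bool(y)
               for (x1, x2), y in xor_data)

grid = [i / 2 for i in range(-10, 11)]     # values -5.0, -4.5, ..., 5.0
found = any(fits_xor(w1, w2, b)
            for w1, w2, b in product(grid, repeat=3))
assert not found   # no separating line exists in this grid
```

The classical argument makes this exact: fitting XOR would require b < 0, b + w1 ≥ 0, b + w2 ≥ 0, and b + w1 + w2 < 0, which is contradictory.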

SLIDE 15

Even these functions can be made linear

The trick: Change the representation


These points are not separable in one dimension by a line. (What is a one-dimensional "line", by the way? A single threshold point on the x axis.)

[Figure: labeled points on the x axis.]

SLIDE 16

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: the points shown on the x axis.]

SLIDE 17

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: axes x (horizontal) and x² (vertical).]

SLIDE 18

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: the point x = −2 maps to (−2, 4) in the (x, x²) plane.]

SLIDE 19

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²). Now the data is linearly separable in this space!

[Figure: the transformed points in the (x, x²) plane, with a separating line.]
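The transformation can be sketched concretely. The data below is made up (the slide's actual points are not recoverable): points with |x| > 1 are positive and the rest negative, which no single threshold on x can separate, yet the line x² − 1 = 0 separates them after the map x → (x, x²):

```python
# Illustration with assumed 1-D data: positives have |x| > 1.
# Not separable by one threshold on x, but linearly separable in (x, x^2).
points = [(-2.0, 1), (-1.5, 1), (1.2, 1), (2.0, 1),
          (-0.5, 0), (0.0, 0), (0.3, 0), (0.9, 0)]

def transform(x):
    return (x, x * x)                      # the new 2-D representation

def classify(x):
    z1, z2 = transform(x)
    return int(0.0 * z1 + 1.0 * z2 - 1.0 >= 0)   # w = (0, 1), b = -1

assert all(classify(x) == y for x, y in points)
```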

SLIDE 20

Exercise

How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space?

To answer this question, you need to think about a function that maps examples from the two-dimensional space to a higher-dimensional space.

SLIDE 21

Almost linearly separable data

[Figure: points in the (x1, x2) plane, mostly separated by the classifier sgn(b + w1·x1 + w2·x2), with a few mislabeled points on the wrong side.]

Training data is almost separable, except for some noise. How much noise do we allow for?

SLIDE 22

Almost linearly separable data

[Figure: the same points with the decision boundary b + w1·x1 + w2·x2 = 0 drawn; a few noisy points fall on the wrong side.]

Training data is almost separable, except for some noise. How much noise do we allow for?

SLIDE 23

Linear classifiers: An expressive hypothesis class

  • Many functions are linear
  • Often a good guess for a hypothesis space
  • Some functions are not linear

– The XOR function
– Non-trivial Boolean functions

  • But there are ways of making them linear in a higher-dimensional feature space

SLIDE 24

Why is the bias term needed?

[Figure: points in the (x1, x2) plane with the decision boundary b + w1·x1 + w2·x2 = 0.]
SLIDE 25

Why is the bias term needed?

[Figure: the same points; a boundary forced through the origin fails to separate them.]

If b is zero, then we are restricting the learner only to hyperplanes that go through the origin. This may not be expressive enough.

SLIDE 26

Why is the bias term needed?

[Figure: the boundary w1·x1 + w2·x2 = 0 through the origin, which cannot separate the classes.]

If b is zero, then we are restricting the learner only to hyperplanes that go through the origin. This may not be expressive enough.

SLIDE 27

Exercises

1. Represent the simple disjunction as a linear classifier.
2. How would you apply the feature space expansion idea for the XOR function?
