
Linear Classifiers: Expressiveness (Machine Learning Lecture) - PowerPoint PPT Presentation



  1. Linear Classifiers: Expressiveness (Machine Learning)

  2. Lecture outline
     • Linear models: Introduction
     • What functions do linear classifiers express?

  3. Where are we?
     • Linear models: Introduction
     • What functions do linear classifiers express?
       – Conjunctions and disjunctions
       – m-of-n functions
       – Not all functions are linearly separable
       – Feature space transformations
       – Exercises

  4. Which Boolean functions can linear classifiers represent?
     • Linear classifiers are an expressive hypothesis class
     • Many Boolean functions are linearly separable
       – Not all, though
       – Recall: In comparison, decision trees can represent any Boolean function

  5. Conjunctions and disjunctions
     y = x1 ∧ x2 ∧ x3 is equivalent to "y = 1 whenever x1 + x2 + x3 ≥ 3"

       x1  x2  x3 | y
       -----------+---
        0   0   0 | 0
        0   0   1 | 0
        0   1   0 | 0
        0   1   1 | 0
        1   0   0 | 0
        1   0   1 | 0
        1   1   0 | 0
        1   1   1 | 1

  6. Conjunctions and disjunctions
     y = x1 ∧ x2 ∧ x3 is equivalent to "y = 1 whenever x1 + x2 + x3 ≥ 3"

       x1  x2  x3 | y | x1 + x2 + x3 - 3 | sign
       -----------+---+------------------+-----
        0   0   0 | 0 |        -3        |  0
        0   0   1 | 0 |        -2        |  0
        0   1   0 | 0 |        -2        |  0
        0   1   1 | 0 |        -1        |  0
        1   0   0 | 0 |        -2        |  0
        1   0   1 | 0 |        -1        |  0
        1   1   0 | 0 |        -1        |  0
        1   1   1 | 1 |         0        |  1
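
A quick way to check the equivalence on slide 6 is to evaluate the threshold rule on all eight inputs. The sketch below is an illustration written for this transcript (plain Python, not taken from the lecture); it compares x1 + x2 + x3 - 3 ≥ 0 against the AND truth table.

```python
from itertools import product

def linear_threshold(weights, bias, x):
    """Predict 1 if the weighted sum plus the bias is >= 0, else 0."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)

# The conjunction x1 AND x2 AND x3 as the rule "x1 + x2 + x3 >= 3",
# i.e. weights (1, 1, 1) and bias -3.
weights, bias = (1, 1, 1), -3

for x in product([0, 1], repeat=3):
    conjunction = int(all(x))
    assert linear_threshold(weights, bias, x) == conjunction
print("x1 + x2 + x3 - 3 >= 0 reproduces the AND truth table")
```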

  7. Conjunctions and disjunctions
     y = x1 ∧ x2 ∧ x3 is equivalent to "y = 1 whenever x1 + x2 + x3 ≥ 3"
     (Truth table as on slide 6.)
     Negations are okay too. In general, use (1 − x) in the linear threshold unit if x is negated.
     y = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3

  8. Conjunctions and disjunctions
     (Same setup and truth table as slides 6 and 7.)
     Negations are okay too. In general, use (1 − x) in the linear threshold unit if x is negated.
     y = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3
     Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

  9. Conjunctions and disjunctions
     (Same content as slide 8: the conjunction rule, the negation trick, and the exercise.)
     Questions?
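
The negation trick on slides 7-9 can be checked the same way: substituting (1 − x3) turns the rule into x1 + x2 − x3 ≥ 2, a unit with weights (1, 1, -1) and bias -2. A minimal sketch, not part of the slides:

```python
from itertools import product

def linear_threshold(weights, bias, x):
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)

# x1 AND x2 AND (NOT x3): substitute (1 - x3) into the threshold rule,
# x1 + x2 + (1 - x3) >= 3, i.e. weights (1, 1, -1) and bias -2.
weights, bias = (1, 1, -1), -2

for x1, x2, x3 in product([0, 1], repeat=3):
    target = int(x1 and x2 and not x3)
    assert linear_threshold(weights, bias, (x1, x2, x3)) == target
print("x1 + x2 + (1 - x3) >= 3 reproduces x1 AND x2 AND NOT x3")
```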

  10. m-of-n functions
      m-of-n rules:
      • There is a fixed set of n variables
      • y = true if, and only if, at least m of them are true
      • All other variables are ignored
      Suppose there are five Boolean variables: x1, x2, x3, x4, x5.
      What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"?

  11. m-of-n functions
      (Same setup as slide 10.)
      The rule "at least 2 of {x1, x2, x3}" corresponds to x1 + x2 + x3 ≥ 2.

  12. m-of-n functions
      (Same content as slide 11.)
      Questions?
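
For the m-of-n answer, the ignored variables simply get weight zero. A small verification sketch over all 32 inputs (illustrative code, not from the slides):

```python
from itertools import product

def linear_threshold(weights, bias, x):
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)

# "At least 2 of {x1, x2, x3}" over five variables: weight 1 on the three
# relevant variables, weight 0 on x4 and x5 (they are ignored), bias -2.
weights, bias = (1, 1, 1, 0, 0), -2

for x in product([0, 1], repeat=5):
    target = int(x[0] + x[1] + x[2] >= 2)
    assert linear_threshold(weights, bias, x) == target
print("2-of-{x1, x2, x3} is a linear threshold unit over all 32 inputs")
```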

  13. Not all functions are linearly separable
      Parity is not linearly separable (the XOR function).
      Can't draw a line to separate the two classes.
      [Figure: + and − points scattered in the (x1, x2) plane; no single line separates the two classes]
      Questions?

  14. Not all functions are linearly separable
      • XOR is not linear
        – y = x1 XOR x2 = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
        – Parity cannot be represented as a linear classifier
          • f(x) = 1 if the number of 1's is even
      • Many non-trivial Boolean functions
        – Example: y = (x1 ∧ x2) ∨ (x3 ∧ ¬x4)
        – The function is not linear in the four variables
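
One way to see that XOR has no separating line is simply to try. The sketch below is an illustration added for this transcript, not the lecture's argument: it searches a grid of weights and biases and finds none that classifies all four XOR points correctly (and no finer grid would help, since the four defining inequalities contradict each other).

```python
import itertools

# XOR on two Boolean inputs.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    """True if sign(b + w1*x1 + w2*x2) reproduces XOR on all four points."""
    return all((b + w1 * x1 + w2 * x2 >= 0) == bool(y) for (x1, x2), y in points)

# Search a coarse grid of weights and biases. Nothing works: the required
# inequalities b < 0, b + w1 >= 0, b + w2 >= 0, b + w1 + w2 < 0 are contradictory.
grid = [i / 2 for i in range(-10, 11)]
found = any(separates(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print("separating line found:", found)   # prints: separating line found: False
```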

  15. Even these functions can be made linear
      These points are not separable in one dimension by a line.
      (What is a one-dimensional line, by the way?)
      [Figure: + and − points lying on the x axis]
      The trick: Change the representation

  16. The blown up feature space
      The trick: Use feature conjunctions
      Transform points: Represent each point x in 2 dimensions by (x, x²)
      [Figure: the original points on the x axis]

  17. The blown up feature space
      The trick: Use feature conjunctions
      Transform points: Represent each point x in 2 dimensions by (x, x²)
      [Figure: the (x, x²) plane with both axes drawn]

  18. The blown up feature space
      The trick: Use feature conjunctions
      Transform points: Represent each point x in 2 dimensions by (x, x²)
      [Figure: the point x = −2 maps to (−2, 4) in the (x, x²) plane]

  19. The blown up feature space
      The trick: Use feature conjunctions
      Transform points: Represent each point x in 2 dimensions by (x, x²)
      [Figure: all transformed points in the (x, x²) plane]
      Now the data is linearly separable in this space!
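
The x → (x, x²) transformation of slides 15-19 can be played out on concrete numbers. The labels below are an assumption chosen to mimic the figure (positives near the origin, negatives far from it); the slides do not list exact values.

```python
# A 1-D labeling that no single threshold on x can produce (assumed values):
# points near the origin are positive, points far from it are negative.
data = [(-3, -1), (-2, -1), (-1, +1), (0, +1), (1, +1), (2, -1), (3, -1)]

# The trick from the slides: represent each point x by the pair (x, x**2).
transformed = [((x, x * x), y) for x, y in data]

def predict(x1, x2):
    # A separating line in the new space: 2.5 - x2 = 0, i.e. the horizontal
    # line x**2 = 2.5 (weights (0, -1), bias 2.5).
    return 1 if 2.5 - x2 >= 0 else -1

assert all(predict(*phi) == y for phi, y in transformed)
print("linearly separable after mapping x -> (x, x**2)")
```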

  20. Exercise
      How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space?
      To answer this question, you need to think about a function that maps examples from two-dimensional space to a higher-dimensional space.
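
One possible answer sketch for this exercise, in the spirit of the "feature conjunctions" hint on the previous slides: add the product feature x1·x2. The particular mapping and weights are an illustration, not necessarily the solution the lecture intends.

```python
# XOR becomes linearly separable after mapping (x1, x2) -> (x1, x2, x1*x2):
# the rule x1 + x2 - 2*x1*x2 >= 1 computes XOR.
xor_points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_threshold(x1, x2):
    x3 = x1 * x2                      # the new conjunction feature
    return int(x1 + x2 - 2 * x3 - 1 >= 0)

assert all(xor_threshold(*x) == y for x, y in xor_points)
print("XOR is linearly separable in the (x1, x2, x1*x2) space")
```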

  21. Almost linearly separable data
      sgn(b + w1 x1 + w2 x2)
      Training data is almost separable, except for some noise.
      How much noise do we allow for?
      [Figure: mostly separable + and − points in the (x1, x2) plane, with a few points on the wrong side]

  22. Almost linearly separable data
      sgn(b + w1 x1 + w2 x2)
      Training data is almost separable, except for some noise.
      How much noise do we allow for?
      [Figure: the same data with the line b + w1 x1 + w2 x2 = 0 drawn through it]
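
A small numerical companion to slides 21-22: the data and weights below are made-up illustrations of "almost separable" training data, where a reasonable line of the form sgn(b + w1 x1 + w2 x2) still misclassifies the one noisy point.

```python
def sgn(z):
    return 1 if z >= 0 else -1

def predict(w, b, x):
    """The linear classifier sgn(b + w1*x1 + w2*x2) from the slide."""
    return sgn(b + w[0] * x[0] + w[1] * x[1])

# Toy "almost separable" training set (assumed values, not the lecture's
# figure): labels follow the line x1 + x2 - 1 = 0 except for one noisy point.
train = [((2, 2), +1), ((1, 3), +1), ((3, 0), +1),
         ((-1, 0), -1), ((0, -2), -1), ((-2, 1), -1),
         ((2, 1), -1)]   # noisy point: lies on the + side but is labeled -

w, b = (1.0, 1.0), -1.0
errors = sum(predict(w, b, x) != y for x, y in train)
print(f"this line makes {errors} mistake(s) on the noisy data")  # prints 1
```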

  23. Linear classifiers: An expressive hypothesis class
      • Many functions are linear
      • Often a good guess for a hypothesis space
      • Some functions are not linear
        – The XOR function
        – Non-trivial Boolean functions
      • But there are ways of making them linear in a higher-dimensional feature space

  24. Why is the bias term needed?
      [Figure: separable + and − points in the (x1, x2) plane with the separating line b + w1 x1 + w2 x2 = 0]

  25. Why is the bias term needed?
      If b is zero, then we are restricting the learner to only those hyperplanes that go through the origin.
      That may not be expressive enough.
      [Figure: the same data; the classes cannot be separated by a line through the origin]

  26. Why is the bias term needed?
      If b is zero, then we are restricting the learner to only those hyperplanes that go through the origin: w1 x1 + w2 x2 = 0.
      That may not be expressive enough.
      [Figure: the same data with a line w1 x1 + w2 x2 = 0 through the origin, which misclassifies points]
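
A one-dimensional illustration of why the bias matters (the numbers are assumptions, not from the slides): with b = 0 the boundary is pinned to the origin, so two points on the same side of zero can never receive different labels.

```python
def predict(w, b, x):
    return 1 if b + w * x >= 0 else -1

# Assumed data: x = 1 is negative and x = 3 is positive. Both points sit on
# the same side of the origin, so with b = 0 the prediction sign(w*x) is the
# same for both, no matter which nonzero w we pick.
for w in [-2.0, -0.5, 0.5, 2.0]:
    assert predict(w, 0.0, 1) == predict(w, 0.0, 3)

# With a bias the boundary can sit between the two points, e.g. x - 2 = 0.
assert predict(1.0, -2.0, 1) == -1 and predict(1.0, -2.0, 3) == +1
print("a nonzero bias lets the boundary move off the origin")
```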

  27. Exercises
      1. Represent the simple disjunction as a linear classifier.
      2. How would you apply the feature space expansion idea for the XOR function?

