Linear Classifiers: Expressiveness (Machine Learning lecture, PowerPoint presentation)



SLIDE 1

Machine Learning

Linear Classifiers: Expressiveness

SLIDE 2

Lecture outline

  • Linear models: Introduction
  • What functions do linear classifiers express?

SLIDE 3

Where are we?

  • Linear models: Introduction
  • What functions do linear classifiers express?

– Conjunctions and disjunctions
– m-of-n functions
– Not all functions are linearly separable
– Feature space transformations
– Exercises

SLIDE 4

Which Boolean functions can linear classifiers represent?

  • Linear classifiers are an expressive hypothesis class
  • Many Boolean functions are linearly separable

– Not all, though
– Recall: In comparison, decision trees can represent any Boolean function

SLIDE 5

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z
 0   0   0  |  0
 0   0   1  |  0
 0   1   0  |  0
 0   1   1  |  0
 1   0   0  |  0
 1   0   1  |  0
 1   1   0  |  0
 1   1   1  |  1
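The equivalence can be checked mechanically. A minimal Python sketch (the helper names are mine, not from the lecture) verifying that the threshold rule x1 + x2 + x3 ≥ 3 reproduces the conjunction on all eight Boolean inputs:

```python
# Sketch: the conjunction z = x1 AND x2 AND x3 equals the linear
# threshold rule "1 if x1 + x2 + x3 >= 3" on every Boolean input.
from itertools import product

def conjunction(x1, x2, x3):
    return int(x1 and x2 and x3)

def threshold_unit(x1, x2, x3):
    return int(x1 + x2 + x3 - 3 >= 0)

for x in product([0, 1], repeat=3):
    assert conjunction(*x) == threshold_unit(*x)
```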

SLIDE 6

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

SLIDE 7

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.
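The negation rule can be verified the same way; a small sketch (names are illustrative) checking that x1 + x2 + (1 − x3) ≥ 3 computes x1 ∧ x2 ∧ ¬x3:

```python
# Sketch: a negated variable is handled by substituting (1 - x).
# Here z = x1 AND x2 AND (NOT x3) vs. the rule x1 + x2 + (1 - x3) >= 3.
from itertools import product

def target(x1, x2, x3):
    return int(x1 == 1 and x2 == 1 and x3 == 0)

def threshold_unit(x1, x2, x3):
    return int(x1 + x2 + (1 - x3) >= 3)

for x in product([0, 1], repeat=3):
    assert target(*x) == threshold_unit(*x)
```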

SLIDE 8

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.

SLIDE 9

Conjunctions and disjunctions

z = x1 ∧ x2 ∧ x3 is equivalent to "z = 1 whenever x1 + x2 + x3 ≥ 3"

x1  x2  x3  |  z  |  x1 + x2 + x3 − 3  |  sign
 0   0   0  |  0  |        −3          |   0
 0   0   1  |  0  |        −2          |   0
 0   1   0  |  0  |        −2          |   0
 0   1   1  |  0  |        −1          |   0
 1   0   0  |  0  |        −2          |   0
 1   0   1  |  0  |        −1          |   0
 1   1   0  |  0  |        −1          |   0
 1   1   1  |  1  |         0          |   1

Exercise: What would the linear threshold function be if the conjunctions here were replaced with disjunctions?

Negations are okay too. In general, if a variable is negated, use (1 − x) in place of x in the linear threshold unit: z = x1 ∧ x2 ∧ ¬x3 corresponds to x1 + x2 + (1 − x3) ≥ 3.

Questions?

SLIDE 10

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"?

SLIDE 11

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"? Answer: x1 + x2 + x3 ≥ 2.
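The answer can be checked over all 32 inputs; a small sketch (function names are mine) where the ignored variables simply get weight 0:

```python
# Sketch: "at least 2 of {x1, x2, x3}" over five Boolean variables.
# Ignored variables get weight 0 in the linear threshold unit.
from itertools import product

def m_of_n(x1, x2, x3, x4, x5):
    return int(x1 + x2 + x3 >= 2)          # the rule itself

def threshold_unit(x1, x2, x3, x4, x5):
    w = (1, 1, 1, 0, 0)                    # weights; bias b = -2
    s = sum(wi * xi for wi, xi in zip(w, (x1, x2, x3, x4, x5)))
    return int(s - 2 >= 0)

for x in product([0, 1], repeat=5):
    assert m_of_n(*x) == threshold_unit(*x)
```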

SLIDE 12

m-of-n functions

m-of-n rules

  • There is a fixed set of n variables
  • y = true if, and only if, at least m of them are true
  • All other variables are ignored

Suppose there are five Boolean variables: x1, x2, x3, x4, x5. What is a linear threshold unit that is equivalent to the classification rule "at least 2 of {x1, x2, x3}"? Answer: x1 + x2 + x3 ≥ 2.


Questions?

SLIDE 13

Parity is not linearly separable

(The XOR function)

[Figure: points in the (x1, x2) plane; the + and − classes are interleaved, so no single line separates them.]

Can't draw a line to separate the two classes. Questions?
SLIDE 14

Not all functions are linearly separable

  • XOR is not linear

– z = x1 XOR x2 = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
– Parity cannot be represented as a linear classifier

  • f(x) = 1 if the number of 1's in x is even
  • Many other non-trivial Boolean functions are not linear either

– Example: z = (x1 ∧ x2) ∨ (x3 ∧ ¬x4)
– The function is not linear in the four variables

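Non-separability of XOR can be probed empirically. The sketch below (my construction, not from the slides; a finite grid is only a sanity check, not a proof) searches a coarse grid of weights and finds no linear threshold unit that fits XOR:

```python
# Sanity check (not a proof): no linear threshold sign(b + w1*x1 + w2*x2)
# on a coarse grid of weights reproduces XOR on {0,1}^2. The grid and
# the ">= 0 means positive class" convention are my choices.
from itertools import product

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def fits_xor(w1, w2, b):
    return all(((b + w1 * x1 + w2 * x2) >= 0) == bool(y)
               for (x1, x2), y in xor_data)

grid = [i / 2 for i in range(-10, 11)]     # values -5.0, -4.5, ..., 5.0
found = any(fits_xor(w1, w2, b)
            for w1, w2, b in product(grid, repeat=3))
assert not found   # no separating line exists in this grid
```

The classical argument makes this exact: fitting XOR would require b < 0, b + w1 ≥ 0, b + w2 ≥ 0, and b + w1 + w2 < 0, which is contradictory.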

SLIDE 15

Even these functions can be made linear

The trick: Change the representation


These points are not separable in one dimension by a line. (What is a one-dimensional "line", by the way? A single threshold point on the x axis.)

[Figure: labeled points on the x axis.]

SLIDE 16

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: the points shown on the x axis.]

SLIDE 17

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: axes x (horizontal) and x² (vertical).]

SLIDE 18

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²)

[Figure: the point x = −2 maps to (−2, 4) in the (x, x²) plane.]

SLIDE 19

The blown up feature space

The trick: Use feature conjunctions

Transform points: Represent each point x in 2 dimensions by (x, x²). Now the data is linearly separable in this space!

[Figure: the transformed points in the (x, x²) plane, with a separating line.]
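The transformation can be sketched concretely. The data below is made up (the slide's actual points are not recoverable): points with |x| > 1 are positive and the rest negative, which no single threshold on x can separate, yet the line x² − 1 = 0 separates them after the map x → (x, x²):

```python
# Illustration with assumed 1-D data: positives have |x| > 1.
# Not separable by one threshold on x, but linearly separable in (x, x^2).
points = [(-2.0, 1), (-1.5, 1), (1.2, 1), (2.0, 1),
          (-0.5, 0), (0.0, 0), (0.3, 0), (0.9, 0)]

def transform(x):
    return (x, x * x)                      # the new 2-D representation

def classify(x):
    z1, z2 = transform(x)
    return int(0.0 * z1 + 1.0 * z2 - 1.0 >= 0)   # w = (0, 1), b = -1

assert all(classify(x) == y for x, y in points)
```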

SLIDE 20

Exercise

How would you use the feature transformation idea to make XOR in two dimensions linearly separable in a new space?

To answer this question, you need to think about a function that maps examples from the two-dimensional space to a higher-dimensional space.

SLIDE 21

Almost linearly separable data

[Figure: points in the (x1, x2) plane, mostly separated by the classifier sgn(b + w1·x1 + w2·x2), with a few mislabeled points on the wrong side.]

Training data is almost separable, except for some noise. How much noise do we allow for?

SLIDE 22

Almost linearly separable data

[Figure: the same points with the decision boundary b + w1·x1 + w2·x2 = 0 drawn; a few noisy points fall on the wrong side.]

Training data is almost separable, except for some noise. How much noise do we allow for?

SLIDE 23

Linear classifiers: An expressive hypothesis class

  • Many functions are linear
  • Often a good guess for a hypothesis space
  • Some functions are not linear

– The XOR function
– Non-trivial Boolean functions

  • But there are ways of making them linear in a higher-dimensional feature space

SLIDE 24

Why is the bias term needed?

[Figure: points in the (x1, x2) plane with the decision boundary b + w1·x1 + w2·x2 = 0.]
SLIDE 25

Why is the bias term needed?

[Figure: the same points; a boundary forced through the origin fails to separate them.]

If b is zero, then we are restricting the learner only to hyperplanes that go through the origin. This may not be expressive enough.

SLIDE 26

Why is the bias term needed?

[Figure: the boundary w1·x1 + w2·x2 = 0 through the origin, which cannot separate the classes.]

If b is zero, then we are restricting the learner only to hyperplanes that go through the origin. This may not be expressive enough.

SLIDE 27

Exercises

1. Represent the simple disjunction as a linear classifier.
2. How would you apply the feature space expansion idea for the XOR function?
