The Perceptron Algorithm



  1. The Perceptron Algorithm (Machine Learning). Some slides based on lectures from Dan Roth, Avrim Blum, and others.

  2. Outline
     • The Perceptron Algorithm
     • Variants of Perceptron
     • Perceptron Mistake Bound

  3. Where are we?
     • The Perceptron Algorithm
     • Variants of Perceptron
     • Perceptron Mistake Bound

  4. Recall: Linear Classifiers
     Inputs are d-dimensional vectors, denoted by x. Output is a label y ∈ {−1, 1}.
     Linear threshold units classify an example x using parameters w (a d-dimensional vector) and b (a real number) according to the following classification rule:
     Output = sgn(w^T x + b) = sgn(∑_i w_i x_i + b)
     w^T x + b ≥ 0 ⇒ y = +1
     w^T x + b < 0 ⇒ y = −1
     b is called the bias term.
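The classification rule above can be sketched in a few lines of Python. The weights, bias, and input points below are made-up numbers for illustration, not values from the slides:

```python
import numpy as np

def predict(w, b, x):
    """Linear threshold unit: return +1 if w.x + b >= 0, else -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical 2-D parameters and inputs:
w = np.array([1.0, -2.0])
b = 0.5
print(predict(w, b, np.array([3.0, 1.0])))   # w.x + b = 3 - 2 + 0.5 = 1.5, so +1
print(predict(w, b, np.array([0.0, 1.0])))   # w.x + b = 0 - 2 + 0.5 = -1.5, so -1
```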

  5. Recall: Linear Classifiers
     (Same rule as the previous slide, shown as a network.)
     [Figure: a linear threshold unit with inputs x_1 … x_8 plus a constant input 1, weights w_1 … w_8 and bias b, feeding a sum ∑ and then sgn.]
     b is called the bias term.

  6. The geometry of a linear classifier
     sgn(b + w_1 x_1 + w_2 x_2). We only care about the sign, not the magnitude.
     [Figure: a 2-D plot with axes x_1 and x_2; the line b + w_1 x_1 + w_2 x_2 = 0 separates the + points from the − points, with the weight vector [w_1 w_2] normal to it.]
     In higher dimensions, a linear classifier represents a hyperplane that separates the space into two half-spaces.

  7. The Perceptron

  8. The Perceptron algorithm
     • Rosenblatt 1958 (though there were some hints of a similar idea earlier, e.g., Agmon 1954)
     • The goal is to find a separating hyperplane. For separable data, it is guaranteed to find one.
     • An online algorithm: processes one example at a time
     • Several variants exist. We will see these briefly towards the end.

  9. The Perceptron algorithm
     Input: A sequence of training examples (x_1, y_1), (x_2, y_2), ⋯ where all x_i ∈ ℝ^d, y_i ∈ {−1, 1}
     1. Initialize w_0 = 0 ∈ ℝ^d
     2. For each training example (x_i, y_i):
        1. Predict ŷ = sgn(w_t^T x_i)
        2. If ŷ ≠ y_i: update w_{t+1} ← w_t + r (y_i x_i)
     3. Return the final weight vector
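The loop on this slide can be sketched in Python. This is an illustrative reimplementation, not code from the lecture: the toy data is assumed, the bias is omitted as on this slide, and multiple passes are taken over the data (the slide's online version makes a single pass):

```python
import numpy as np

def perceptron(examples, r=1.0, epochs=10):
    """Mistake-driven perceptron (no bias term, as on this slide).

    examples: list of (x, y) pairs with x a numpy array and y in {-1, +1}.
    r: learning rate. Returns the final weight vector.
    """
    d = len(examples[0][0])
    w = np.zeros(d)                              # 1. initialize w = 0
    for _ in range(epochs):                      # repeated passes (assumed; one pass = online)
        for x, y in examples:
            y_hat = 1 if np.dot(w, x) >= 0 else -1   # 2.1 predict
            if y_hat != y:                           # 2.2 update only on a mistake
                w = w + r * y * x
    return w                                     # 3. return final weights

# Toy linearly separable data (made up for this sketch):
data = [(np.array([2.0, 1.0]), 1), (np.array([-1.0, -2.0]), -1)]
w = perceptron(data)
assert all(np.dot(w, x) * y > 0 for x, y in data)   # both examples now classified correctly
```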

  10. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      Remember: Prediction = sgn(w^T x). There is typically a bias term also (w^T x + b), but the bias may be treated as a constant feature and folded into w.
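The bias-folding trick mentioned here can be sketched as follows, with hypothetical numbers: appending a constant 1 feature to every input makes b just one more weight, so sgn(w^T x + b) becomes sgn(w'^T x') with w' = [w, b] and x' = [x, 1].

```python
import numpy as np

def fold_bias(x):
    """Append a constant 1 feature so the bias b becomes one more weight."""
    return np.append(x, 1.0)

# Hypothetical parameters and input:
w, b = np.array([1.0, -2.0]), 0.5
x = np.array([3.0, 1.0])
w_folded = np.append(w, b)               # w' = [w, b]

# The two forms compute the same score:
assert np.isclose(np.dot(w, x) + b, np.dot(w_folded, fold_bias(x)))
```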

  11. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      Footnote: For some algorithms it is mathematically easier to represent False as −1, and at other times as 0. For the Perceptron algorithm, treat −1 as false and +1 as true.

  12. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      Mistake on positive: w_{t+1} ← w_t + r x_i
      Mistake on negative: w_{t+1} ← w_t − r x_i

  13. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      r is the learning rate, a small positive number less than 1.

  14. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      Update only on error: a mistake-driven algorithm.

  15. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      A mistake can be written as y_i w_t^T x_i ≤ 0.
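The unified mistake test y w^T x ≤ 0 collapses the two update cases into the single rule w ← w + r y x: a positive mistake (y = +1) adds r x, a negative mistake (y = −1) subtracts r x. A quick sketch with made-up numbers:

```python
import numpy as np

def update_on_mistake(w, x, y, r=1.0):
    """Apply the perceptron update iff y * (w.x) <= 0.

    Following the slide's convention, a score of exactly zero counts as a mistake.
    """
    if y * np.dot(w, x) <= 0:
        w = w + r * y * x
    return w

w = np.zeros(2)
w = update_on_mistake(w, np.array([2.0, 1.0]), +1)   # positive mistake: w becomes [2, 1]
w = update_on_mistake(w, np.array([1.0, 3.0]), -1)   # negative mistake: w becomes [1, -2]
```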

  16. The Perceptron algorithm
      (Same algorithm as the previous slide.)
      This is the simplest version. We will see more robust versions shortly.

  17. Intuition behind the update
      Mistake on positive: w_{t+1} ← w_t + r x
      Mistake on negative: w_{t+1} ← w_t − r x
      Suppose we have made a mistake on a positive example. That is, y = +1 and w_t^T x ≤ 0.
      Call the new weight vector w_{t+1} = w_t + x (say r = 1).
      The new dot product is w_{t+1}^T x = (w_t + x)^T x = w_t^T x + x^T x ≥ w_t^T x.
      For a positive example, the Perceptron update will increase the score assigned to the same input. Similar reasoning holds for negative examples.
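The claim on this slide, that the update raises the score w^T x on the mistaken example by exactly x^T x, can be checked numerically with assumed toy values:

```python
import numpy as np

# A mistake on a positive example: y = +1 but w_old.x <= 0 (toy numbers).
w_old = np.array([-1.0, 2.0])
x = np.array([3.0, 1.0])
assert np.dot(w_old, x) <= 0           # score is -1, a mistake on y = +1

w_new = w_old + x                      # the update with r = 1, y = +1

# New score = old score + x.x, so the score strictly increases (x is nonzero):
assert np.dot(w_new, x) == np.dot(w_old, x) + np.dot(x, x)
assert np.dot(w_new, x) > np.dot(w_old, x)
```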

  18.–24. Geometry of the perceptron update
      Mistake on positive: w_{t+1} ← w_t + r x
      Mistake on negative: w_{t+1} ← w_t − r x
      [Figure sequence: the old weight vector w_old and its decision boundary are shown ("Predict"). A positive example (x, +1) falls on the wrong side, a mistake on a positive example. The update w ← w + y x adds x to the weight vector ("Update"), producing w_new, whose boundary now classifies (x, +1) correctly ("After").]
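The geometric effect described on these slides, rotating the weight vector toward a mistaken positive example, can be checked for one toy case (numbers assumed) by comparing cosine similarities before and after the update:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy mistake on a positive example: w_old.x < 0 but y = +1.
w_old = np.array([-1.0, 2.0])
x = np.array([3.0, 1.0])
w_new = w_old + x                      # update for (x, +1) with r = 1

# In this example, the angle between the weight vector and x shrinks:
assert cos_sim(w_new, x) > cos_sim(w_old, x)
```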

  25.–26. Geometry of the perceptron update
      [Figure sequence: the old weight vector w_old predicts positive for a negative example (x, −1), a mistake on a negative example. The update subtracts x from the weight vector.]
