Machine Learning
The Perceptron Algorithm
Some slides based on lectures from Dan Roth, Avrim Blum and others
Outline:
- The Perceptron Algorithm
- Variants of Perceptron
- Perceptron Mistake Bound
[Figure: a single perceptron unit. Inputs x_1, …, x_8 are multiplied by weights w_1, …, w_8, a constant input 1 carries the bias weight b, and the sign function sgn is applied to the weighted sum to produce the prediction.]
In higher dimensions, a linear classifier represents a hyperplane that separates the space into two half-spaces
We only care about the sign of the output, not the magnitude.
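As a minimal sketch of prediction (assuming NumPy; the function name predict is mine, not from the slides):

```python
import numpy as np

def predict(w, x):
    """Classify x by which side of the hyperplane {z : w.z = 0} it falls on.

    The convention for the boundary case w.x == 0 is arbitrary; here it
    counts as a positive prediction.
    """
    return 1 if np.dot(w, x) >= 0 else -1
```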
Remember: Prediction = sgn(wᵀx). There is typically a bias term as well (wᵀx + b), but the bias may be treated as a constant feature and folded into w.
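For instance (a NumPy sketch with made-up numbers), folding the bias in just means appending a constant feature 1 to each example and appending b to the weight vector:

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # original features
w = np.array([0.1, 0.4, -0.2])   # original weights
b = 0.7                          # bias term

x_aug = np.append(x, 1.0)        # add the constant feature
w_aug = np.append(w, b)          # fold the bias into w

# Both forms compute the same score, so sgn agrees too
assert np.isclose(np.dot(w, x) + b, np.dot(w_aug, x_aug))
```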
Footnote: for some algorithms it is mathematically easier to represent False as -1, and at other times as 0. For the Perceptron algorithm, treat -1 as false and +1 as true.
Mistake on a positive example: w_{t+1} ← w_t + r x_i
Mistake on a negative example: w_{t+1} ← w_t − r x_i

r is the learning rate, a small positive number less than 1. The weights are updated only on an error: the Perceptron is a mistake-driven algorithm. A mistake of either kind can be written compactly as y_i w_tᵀ x_i ≤ 0. This is the simplest version of the algorithm; we will see more robust versions shortly.

Why does the update help? Suppose we make a mistake on a positive example, so w_tᵀ x ≤ 0. After the update, w_{t+1}ᵀ x = (w_t + r x)ᵀ x = w_tᵀ x + r xᵀ x ≥ w_tᵀ x, since r xᵀ x ≥ 0: the score moves toward the correct (positive) side.
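A quick numeric check of this claim, with numbers chosen purely for illustration and r = 1:

```latex
\mathbf{w}_t = (1,\,-1), \qquad \mathbf{x} = (1,\,2), \qquad y = +1
\\
\mathbf{w}_t^\top \mathbf{x} = 1 - 2 = -1 \le 0 \quad \text{(a mistake on a positive example)}
\\
\mathbf{w}_{t+1} = \mathbf{w}_t + r\,\mathbf{x} = (2,\,1)
\\
\mathbf{w}_{t+1}^\top \mathbf{x} = 2 + 2 = 4 \;>\; \mathbf{w}_t^\top \mathbf{x}
```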
[Figure, built up over several slides: geometric intuition for a mistake on a positive example. We predict with w_old and misclassify the example (x, +1). The update w ← w + y x adds the vector y x to w_old, producing w_new, which points closer to x and now classifies it correctly.]
[Figure, built up over several slides: the same intuition for a mistake on a negative example. We predict with w_old and misclassify the example (x, -1). The update w ← w + y x (here y = -1, so x is subtracted) produces w_new, which points away from x and now classifies it correctly.]
T, the number of passes over the data, is a hyper-parameter of the algorithm. The condition y_i wᵀ x_i ≤ 0 is another way of writing that the prediction on example i is an error.
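Putting the pieces together, here is a minimal sketch of the full algorithm in NumPy (function and variable names are my own, not from the slides). Note how the single update w + r * y[i] * X[i] covers both the mistake-on-positive and mistake-on-negative rules, because y[i] is +1 or -1:

```python
import numpy as np

def perceptron(X, y, T=10, r=1.0):
    """Mistake-driven Perceptron training.

    X: (n, d) array of examples, with the bias folded in as a constant feature.
    y: (n,) array of labels in {-1, +1}.
    T: number of passes over the data (a hyper-parameter).
    r: learning rate.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        for i in range(n):
            # y_i * w^T x_i <= 0 is another way of writing "the prediction is an error"
            if y[i] * np.dot(w, X[i]) <= 0:
                w = w + r * y[i] * X[i]
    return w
```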
Voted perceptron:
- Remember every weight vector in your sequence of updates.
- At final prediction time, each weight vector gets to vote on the label. The number of votes it gets is the number of iterations it survived before being updated.
- Comes with strong theoretical guarantees about generalization, but is impractical because of storage issues.

Averaged perceptron:
- Instead of using all the weight vectors, use the average weight vector (i.e., longer-surviving weight vectors get more say).
- A more practical alternative, and widely used.
This is the simplest version of the averaged perceptron. There are some easy programming tricks to make sure that a is also updated.
If you want to use the Perceptron algorithm, use averaging.
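A minimal sketch of the averaged version (again, names are mine). This is the naive form, where the running sum a is updated after every example; the programming tricks mentioned above avoid touching a at each step, but the naive form is the easiest to read:

```python
import numpy as np

def averaged_perceptron(X, y, T=10, r=1.0):
    """Averaged Perceptron: return the average of every intermediate weight
    vector, so weight vectors that survive more steps get more say."""
    n, d = X.shape
    w = np.zeros(d)
    a = np.zeros(d)                      # running sum of all weight vectors
    for _ in range(T):
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:
                w = w + r * y[i] * X[i]
            a += w                       # naive: accumulate after every example
    return a / (T * n)                   # averaged weight vector
```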
The New Yorker, December 6, 1958, p. 44; The New York Times, July 8, 1958.
The IBM 704 computer