
ECE 6254 - Spring 2020 - Lecture 9 v1.1 - revised February 4, 2020

Perceptron Learning Algorithm

Matthieu R. Bloch

1 A bit of geometry

Definition 1.1. A dataset {(xi, yi)}Ni=1 with yi ∈ {±1} is linearly separable if there exist w ∈ Rd and b ∈ R such that

∀i ∈ {1, …, N}, yi = sign(w⊺xi + b).

By definition, sign(x) = +1 if x > 0 and −1 otherwise. The affine set {x : w⊺x + b = 0} is then called a separating hyperplane. As illustrated in Fig. 1, it is important to note that H ≜ {x : w⊺x + b = 0} is not a vector space because of the presence of the offset b. It is an affine space, meaning that it can be described as H = x0 + V, where x0 ∈ H and V is a vector space. Make sure that this is clear and check it for yourself.

Figure 1: Illustration of a linearly separable dataset (axes x1 and x2; the separating hyperplane H = {x : w⊺x + b = 0} crosses the x2-axis at −b/w2).
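To make Definition 1.1 concrete, here is a minimal Python sketch; the dataset and the pair (w, b) are made up for illustration and do not appear in the notes.

```python
import numpy as np

# Toy dataset in R^2: points labeled by which side of the line x1 + x2 = 1 they fall on
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
y = np.array([-1, 1, 1, 1])

# Candidate separating hyperplane {x : w^T x + b = 0}
w = np.array([1.0, 1.0])
b = -1.0

# Definition 1.1: (w, b) separates the data iff y_i = sign(w^T x_i + b) for all i
print(np.all(np.sign(X @ w + b) == y))  # True
```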


Lemma 1.2. Consider the hyperplane H ≜ {x : w⊺x + b = 0}. The vector w is orthogonal to all vectors parallel to the hyperplane. For z ∈ Rd, the distance of z to the hyperplane is

d(z, H) = |w⊺z + b| / ∥w∥2.

Proof. Consider x, x′ ∈ H. Then, by definition, w⊺x + b = 0 = w⊺x′ + b, so that w⊺(x − x′) = 0. Hence, w is orthogonal to all vectors parallel to H. Consider now any point z ∈ Rd and a point x0 ∈ H. The distance of z to H is the distance between z and its orthogonal projection onto H, which we can compute as

d(z, H) = |w⊺(z − x0)| / ∥w∥2.

Since x0 ∈ H, we have w⊺x0 = −b, and therefore

|w⊺(z − x0)| = |w⊺z + b|.   (1)
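As a quick numerical check of Lemma 1.2; the values of w, b, and z below are illustrative, not from the notes.

```python
import numpy as np

# Hyperplane H = {x : w^T x + b = 0} in R^2, here x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0

def distance_to_hyperplane(z, w, b):
    """Distance of z to H, per Lemma 1.2: |w^T z + b| / ||w||_2."""
    return abs(w @ z + b) / np.linalg.norm(w)

z = np.array([1.0, 1.0])                 # here w^T z + b = 1
print(distance_to_hyperplane(z, w, b))   # 1/sqrt(2) ≈ 0.7071
```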



2 The Perceptron Learning Algorithm

The Perceptron Learning Algorithm (PLA) was proposed by Rosenblatt to identify a separating hyperplane in a linearly separable dataset {(xi, yi)}Ni=1, if one exists. We assume that every vector x ∈ Rd+1 has its first component fixed to x0 = 1, so that we can use the shorthand θ⊺x = 0 to describe an affine hyperplane.
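A two-line illustration of this augmentation, assuming NumPy; the identification θ = (b, w) is one natural convention, not spelled out in the notes.

```python
import numpy as np

x = np.array([2.0, 0.0])                 # original point in R^d
w, b = np.array([1.0, 1.0]), -1.0

theta = np.concatenate(([b], w))         # theta = (b, w) in R^{d+1}
x_aug = np.concatenate(([1.0], x))       # augmented point with x0 = 1
assert theta @ x_aug == w @ x + b        # theta^T x recovers w^T x + b
```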

The principle of the algorithm is the following.

1. Start from a guess θ(0).
2. For j ⩾ 1, iterate over the data points (in any order) and update

θ(j+1) = { θ(j) + yixi   if yi ≠ sign(θ(j)⊺xi)
         { θ(j)          else.   (2)
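A minimal NumPy sketch of this procedure; the function name pla and the stopping criterion are our own, since the notes do not specify when to stop.

```python
import numpy as np

def pla(X, y, max_epochs=1000):
    """Perceptron Learning Algorithm on augmented points (x0 = 1).

    X: (N, d+1) array whose rows are the augmented points x_i.
    y: (N,) array of labels in {-1, +1}.
    Returns theta with y_i = sign(theta^T x_i) for all i, if found.
    """
    theta = np.zeros(X.shape[1])             # initial guess theta^(0)
    for _ in range(max_epochs):
        updated = False
        for x_i, y_i in zip(X, y):
            if y_i != np.sign(theta @ x_i):  # misclassified point
                theta = theta + y_i * x_i    # update rule (2)
                updated = True
        if not updated:                      # no errors: separator found
            return theta
    return theta
```

Note that np.sign(0) = 0, so points lying exactly on the current hyperplane always trigger an update here; this is consistent with the subgradient choice made in (4)-(5) below.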

Geometric view of PLA. The effect of the PLA update is illustrated in Fig. 2. Note that the update to θ(j+1) changes the overall hyperplane, and not just the associated vector space. This is best seen in Fig. 2, where the offset changes and not just the slope of the separator.


Figure 2: PLA update.

Gradient descent view of PLA. Consider a loss function called the “perceptron loss” defined as

ℓ(θ) ≜ ∑i=1,…,N max(0, −yiθ⊺xi).   (3)

Intuitively, the loss penalizes misclassified points (according to θ) with a penalty proportional to how badly they are misclassified. Setting ℓi(θ) ≜ max(0, −yiθ⊺xi), we have

∇ℓi(θ) = { 0                        if yiθ⊺xi > 0
         { −yixi                    if yiθ⊺xi < 0
         { −αyixi with α ∈ [0, 1]   if θ⊺xi = 0.   (4)

The case of equality θ⊺xi = 0 corresponds to the point where the loss function ℓi(θ) is not differentiable. In such a case, we have to use a subgradient of ℓi at θ, which is any vector v such that, for all θ′, ℓi(θ′) − ℓi(θ) ⩾ v⊺(θ′ − θ). A subgradient is not unique and the set of subgradients is usually denoted ∂ℓi(θ).
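A minimal sketch of the loss (3) and of one element of the subdifferential (4), assuming NumPy and rows of X holding the augmented points xi; the function names are illustrative.

```python
import numpy as np

def perceptron_loss(theta, X, y):
    """Perceptron loss (3): sum_i max(0, -y_i theta^T x_i)."""
    margins = y * (X @ theta)              # y_i theta^T x_i, one entry per point
    return np.sum(np.maximum(0.0, -margins))

def perceptron_subgradient(theta, x_i, y_i):
    """One element of the subdifferential in (4).

    On the boundary theta^T x_i = 0 we pick alpha = 1, i.e. -y_i x_i,
    which is the choice that matches the PLA update (2).
    """
    if y_i * (x_i @ theta) > 0:            # correctly classified
        return np.zeros_like(theta)
    return -y_i * x_i                      # misclassified or on the boundary
```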


Let us now apply a stochastic gradient descent algorithm with a step size of 1 to the loss function. We obtain the following.

1. Start from a guess θ(0).
2. For j ⩾ 1, iterate over the data points (in any order) and update

θ(j+1) = θ(j) − ∇ℓi(θ(j)) = { θ(j) + yixi                      if −yiθ(j)⊺xi > 0
                            { θ(j)                             if −yiθ(j)⊺xi < 0
                            { θ(j) − v with v ∈ ∂ℓi(θ(j))      if θ(j)⊺xi = 0.   (5)

Note that (5) is almost identical to (2). The PLA update rule is essentially a stochastic gradient descent that treats the case of subgradients with its own rule.
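Reusing perceptron_subgradient from the sketch above, the SGD view (5) with step size 1 can be written as follows; again a sketch, with an illustrative epoch cap rather than a principled stopping rule.

```python
def pla_sgd(X, y, n_epochs=100):
    """SGD with step size 1 on the perceptron loss, as in (5).

    With the boundary subgradient chosen as -y_i x_i, each step
    theta <- theta - subgradient coincides with the PLA update (2).
    """
    theta = np.zeros(X.shape[1])           # initial guess theta^(0)
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            theta = theta - perceptron_subgradient(theta, x_i, y_i)
    return theta
```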

Theorem 2.1. Consider a linearly separable dataset {(xi, yi)}Ni=1. The number of updates made by the PLA because of classification errors is bounded, and the PLA eventually identifies a separating hyperplane.

3 To go further

A simple and accessible review of gradient descent techniques can be found in [1]. The original proposal of the PLA is in [2]. Introductory treatments of the PLA are found in [3, Section 4.5] and [4, Section 8.5.4].

References

[1] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint, Jun. 2017.

[2] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.

[3] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, ser. Springer Series in Statistics. Springer, 2009.

[4] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.