

SLIDE 1

Perceptrons

“From the heights of error, To the valleys of Truth”

Piyush Kumar
Advanced Computational Geometry

SLIDE 2

Reading Material

Duda/Hart/Stork: 5.4/5.5/9.6.8
Any neural network book (Haykin, Anderson…)
Look at papers of related people:

  • Santosh Vempala
  • A. Blum
  • J. Dunagan
  • F. Rosenblatt
  • T. Bylander
SLIDE 3

Introduction

Supervised Learning

Input pattern → output pattern; compare with the desired output and correct if necessary.

SLIDE 4

Linear discriminant functions

Definition

It is a function that is a linear combination of the components of $x$:

$$g(x) = w^t x + w_0 \qquad (1)$$

where $w$ is the weight vector and $w_0$ the bias.

  • A two-category classifier with a discriminant function of the form (1) uses the following rule: decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) < 0$, i.e., decide $\omega_1$ if $w^t x > -w_0$ and $\omega_2$ otherwise.
  • If $g(x) = 0$, $x$ is assigned to either class.
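A minimal MATLAB sketch of rule (1); the function and variable names are mine, not from the slides:

    % Decision rule (1): a sketch, assuming w and x are column vectors.
    function c = classify(w, w0, x)
        g = w' * x + w0;   % g(x) = w^t x + w0
        if g > 0
            c = 1;         % decide omega_1
        else
            c = 2;         % decide omega_2 (g(x) = 0 may go to either class)
        end
    end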

SLIDE 5

LDFs

The equation $g(x) = 0$ defines the decision surface that separates points assigned to the category $\omega_1$ from points assigned to the category $\omega_2$.

When $g(x)$ is linear, the decision surface is a hyperplane.

SLIDE 6

Classification using LDFs

Two main approaches:

Fisher's Linear Discriminant
Project data onto a line with 'good' discrimination; then classify on the real line.

Linear Discrimination in d dimensions
Classify data using suitable hyperplanes. (We'll use perceptrons to construct these.)

SLIDE 7

Perceptron: The first NN

Proposed by Frank Rosenblatt in 1957.

Neural net researchers accuse Rosenblatt of promising 'too much' ☺

Numerous variants; we'll cover the one that's most geometric to explain ☺

One of the simplest neural networks.

SLIDE 8

Perceptrons: A Picture

[Figure: a single perceptron unit. Inputs $x_0 = -1, x_1, x_2, x_3, \ldots, x_n$ enter through weights $w_0, w_1, w_2, w_3, \ldots, w_n$; the thresholded sum is the output $y$, which is compared with the desired label and the weights are corrected.]

$$y = \begin{cases} +1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$

SLIDE 9

Where is the geometry?

[Figure: points of Class 1 (+1) and Class 2 (−1) separated by a hyperplane. Is this hyperplane unique?]

SLIDE 10

Assumption

Let's assume for this talk that the red and green points in 'feature space' are separable using a hyperplane.

(Two-category, linearly separable case.)

SLIDE 11

What's the problem?

Why not just take the convex hull of one of the sets and find one of the 'right' facets?

Because it's too much work to do in d dimensions.

What else can we do?

Linear programming == Perceptrons
Quadratic programming == SVMs

SLIDE 12

Perceptrons

Aka learning half-spaces.

Can be solved in polynomial time using interior-point (IP) algorithms.

Can also be solved using a simple and elegant greedy algorithm (which I present today).

SLIDE 13

In Math notation

N samples: $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x \in \mathbb{R}^d$ and $y = \pm 1$ are labels for the data.

Can we find a hyperplane $w \cdot x = 0$ that separates the two classes (labeled by $y$)? I.e.

$w \cdot x_j > 0$ for all $j$ such that $y_j = +1$
$w \cdot x_j < 0$ for all $j$ such that $y_j = -1$
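In MATLAB, this separation test is one line (the helper name is mine): $w$ separates the data exactly when every $y_j (w \cdot x_j)$ is positive.

    % A sketch, my names: X is n-by-d (one point per row), y is n-by-1
    % with entries +1/-1, w is a 1-by-d row vector.
    function ok = separates(w, X, y)
        ok = all(y .* (X * w') > 0);   % y_j * (w . x_j) > 0 for every j
    end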

SLIDE 14

Further Assumption 1

Let's assume that the hyperplane that we are looking for passes through the origin.

(Which we will relax later!)

SLIDE 15

Further Assumption 2

Let's assume that we are looking for a halfspace that contains a set of points.

(Relax now!! ☺)

SLIDE 16

Let's relax FA 1 now

"Homogenize" the coordinates by adding a new coordinate to the input. Think of it as moving the whole set of red and blue points one dimension higher: from 2D to 3D, it is just the x-y plane shifted to z = 1. This takes care of the "bias", i.e., our assumption that the halfspace can pass through the origin. (A code sketch covering FA 1-FA 3 follows Further Assumption 3 below.)

SLIDE 17

Further Assumption 3

Assume all points lie on a unit sphere! If they do not after applying the transformations for FA 1 and FA 2, make them so: scaling a point does not change which side of a hyperplane through the origin it lies on.

(Relax now! ☺)
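A minimal MATLAB sketch of all three transformations together; the helper name and layout are mine (the elementwise expansion needs MATLAB R2016b or later):

    % FA 1-FA 3 in one step: X is n-by-d, y is n-by-1 with entries +1/-1.
    function Z = preprocess(X, y)
        n = size(X, 1);
        Z = [X, ones(n, 1)];            % FA 1: homogenize (append a coordinate)
        Z = Z .* y;                     % FA 2: flip class -1, so we want w.z > 0 for all z
        Z = Z ./ sqrt(sum(Z.^2, 2));    % FA 3: scale every point onto the unit sphere
    end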

SLIDE 18

Restatement 1

Given: a set of points on a sphere in d dimensions, such that all of them lie in a halfspace.
Output: find one such halfspace.

Note: this is the LP feasibility problem, and if you can solve LP feasibility, you can solve any general LP!!

(Take Estie's class if you want to know why. ☺)

SLIDE 19

Restatement 2

Given a convex body (in V-form), find a halfspace passing through the origin that contains it.

SLIDE 20

Support Vector Machines

A small break from perceptrons

SLIDE 21

Support Vector Machines

  • Linear learning machines, like perceptrons.
  • Map non-linearly to a higher dimension to overcome the linearity constraint.
  • Select between hyperplanes: use the margin as a test. (This is what perceptrons don't do.)

From learning theory, maximum margin is good.

SLIDE 22

SVMs

[Figure: the margin, i.e., the gap between the separating hyperplane and the nearest points of each class.]

SLIDE 23

Another Reformulation

Unlike perceptrons, SVMs have a unique solution, but they are harder to solve. (QP)

SLIDE 24

Support Vector Machines

There are very simple algorithms to solve SVMs (as simple as perceptrons). (If there is enough demand, I can try to cover one. And if my job hunting lets me ;))

SLIDE 25

Back to perceptrons

SLIDE 26

Perceptrons

So how do we solve the LP?

  • Simplex
  • Ellipsoid
  • Interior-point methods
  • Perceptrons == gradient descent

So we could solve the classification problem using any LP method.

SLIDE 27

Why learn Perceptrons?

You can write an LP solver in 5 minutes! A very slight modification gives you a polynomial-time guarantee (using smoothed analysis)!

SLIDE 28

Why learn Perceptrons?

Multiple perceptrons clubbed together can be used to learn almost anything in practice. (This is the idea behind multi-layer neural networks.)

Perceptrons have a finite capacity and so cannot represent all classifications; the amount of training data required needs to be larger than the capacity. We'll talk about capacity when we introduce VC-dimension.

From learning theory, limited capacity is good.

SLIDE 29

Another twist: Linearization

If the data is separable with, say, a sphere, how would you use a perceptron to separate it? (Ellipsoids?)

SLIDE 30

Linearization

Lift the points to a paraboloid in one higher dimension. For instance, if the data is in 2D: $(x, y) \to (x, y, x^2 + y^2)$.

(Delaunay!??)
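A one-line MATLAB sketch of this lifting map (the function name is mine). After lifting, a circle in the plane becomes the cut of the paraboloid by a hyperplane, so a spherical separator turns into a linear one:

    % Lift 2D points onto the paraboloid z = x^2 + y^2.
    function L = lift(X)            % X: n-by-2, one (x, y) point per row
        L = [X, sum(X.^2, 2)];      % (x, y) -> (x, y, x^2 + y^2)
    end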

SLIDE 31

The kernel Matrix

Another trick that the ML community uses for linearization is to use a function that redefines distances between points.

Example: $K(x, z) = e^{-\|x - z\|^2 / 2\sigma^2}$

There are even papers on how to learn kernels from data!
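The example above is the Gaussian (RBF) kernel; a direct MATLAB sketch, where sigma is the bandwidth parameter:

    % Gaussian kernel between two points x and z (sigma > 0).
    function k = gaussian_kernel(x, z, sigma)
        k = exp(-norm(x - z)^2 / (2 * sigma^2));
    end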

SLIDE 32

Perceptron Smoothed Complexity

Let $L$ be a linear program and let $L'$ be the same linear program under a Gaussian perturbation of variance $\sigma^2$, where $\sigma^2 \le 1/2d$. For any $\delta$, with probability at least $1 - \delta$, either

  • the perceptron finds a feasible solution in poly($d$, $m$, $1/\sigma$, $1/\delta$), or
  • $L'$ is infeasible or unbounded.

SLIDE 33

The Algorithm

In one line

SLIDE 34

The 1 Line LP Solver!

Start with a random vector $w$, and if a point $x_k$ is misclassified do:

$$w_{k+1} = w_k + x_k$$

(until done). One of the most beautiful LP solvers I've ever come across…

SLIDE 35

A better description

    Initialize w = 0, i = 0
    do
        i = (i + 1) mod n
        if x_i is misclassified by w then w = w + x_i
    until all patterns are classified
    Return w

SLIDE 36

An even better description

That's the entire code! Written in 10 mins.

    function w = perceptron(r, b)
    r = [r ones(size(r, 1), 1)];        % Homogenize
    b = -[b ones(size(b, 1), 1)];       % Homogenize and flip
    data = [r; b];                      % Make one point set
    s = size(data);                     % Size of data
    w = zeros(1, s(1, 2));              % Initialize zero vector
    is_error = true;
    while is_error
        is_error = false;
        for k = 1:s(1, 1)
            if dot(w, data(k, :)) <= 0  % Misclassified (or on the boundary)?
                w = w + data(k, :);     % Perceptron update
                is_error = true;
            end
        end
    end

And it can solve any LP!
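A hypothetical usage sketch; the points are mine, chosen to be linearly separable:

    % Two separable 2D point sets. On return, every red point x satisfies
    % dot(w, [x 1]) > 0 and every blue point satisfies dot(w, [x 1]) < 0.
    r = [1 1; 2 1; 1 2];        % red points
    b = [-1 -1; -2 -1; -1 -2];  % blue points
    w = perceptron(r, b)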

SLIDE 37

An output

SLIDE 38

In other words

At each step, the algorithm picks any misclassified vector $x$ (one on the wrong side of the halfspace) and brings the normal vector $w$ into closer agreement with that point.

SLIDE 39

The math behind…

Still: why the hell does it work?

Back to the most advanced presentation tools available on earth: the blackboard ☺ Wait (lemme try the whiteboard).

The Convergence Proof
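For reference, a standard sketch of the convergence bound (Novikoff's theorem) under the assumptions above; the margin $\gamma$ and the unit vector $w^*$ are my notation, not the slides'. Assume every preprocessed point satisfies $\|x_k\| \le 1$ (FA 3) and some unit vector $w^*$ has $w^* \cdot x_k \ge \gamma > 0$ for all $k$ (separability). Start from $w_0 = 0$ and update $w_{t+1} = w_t + x_k$ only when $w_t \cdot x_k \le 0$. Then after $t$ updates:

$$w_t \cdot w^* \ge t\gamma \quad \text{and} \quad \|w_t\|^2 \le \|w_{t-1}\|^2 + 2\, w_{t-1} \cdot x_k + \|x_k\|^2 \le \|w_{t-1}\|^2 + 1 \le t,$$

so $t\gamma \le w_t \cdot w^* \le \|w_t\| \le \sqrt{t}$, and the algorithm stops after at most $1/\gamma^2$ updates.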

SLIDE 40

Proof

SLIDE 41

Proof

SLIDE 42

Proof

SLIDE 43

Proof

SLIDE 44

Proof

SLIDE 45

Proof

SLIDE 46

That’s all folks ☺