
Learning From Data
Lecture 2: The Perceptron
• The Learning Setup
• A Simple Learning Algorithm: PLA
• Other Views of Learning
• Is Learning Feasible: A Puzzle
M. Magdon-Ismail, CSCI 4100/6100

recap: The Plan
1. What is Learning?
2. Can We do it?
3. How to do it?
4. How to do it well?
5. General principles?
6. Advanced techniques.
7. Other Learning Paradigms.
(The slide groups these topics under the labels concepts, theory and practice.)
Our language will be mathematics . . . our sword will be computer algorithms.
Creator: Malik Magdon-Ismail

recap: The Key Players
• Salary, debt, years in residence, . . . : input $\mathbf{x} \in \mathbb{R}^d = \mathcal{X}$.
• Approve credit or not: output $y \in \{-1, +1\} = \mathcal{Y}$.
• True relationship between $\mathbf{x}$ and $y$: target function $f : \mathcal{X} \mapsto \mathcal{Y}$. (The target $f$ is unknown.)
• Data on customers: data set $\mathcal{D} = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, where $y_n = f(\mathbf{x}_n)$.
$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.
We learn the function $f$ from the data $\mathcal{D}$.

recap: Summary of the Learning Setup
• UNKNOWN TARGET FUNCTION $f : \mathcal{X} \mapsto \mathcal{Y}$ (ideal credit approval formula), with $y_n = f(\mathbf{x}_n)$
• TRAINING EXAMPLES $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
• HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)
• LEARNING ALGORITHM $\mathcal{A}$
• FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)

A Simple Learning Model
• Input vector $\mathbf{x} = [x_1, \ldots, x_d]^t$.
• Give importance weights to the different inputs and compute a “Credit Score”:
  “Credit Score” $= \sum_{i=1}^{d} w_i x_i$.
• Approve credit if the “Credit Score” is acceptable:
  Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$ (“Credit Score” is good),
  Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$ (“Credit Score” is bad).
• How to choose the importance weights $w_i$:
  input $x_i$ is important ⟹ large weight $|w_i|$
  input $x_i$ beneficial for credit ⟹ positive weight $w_i > 0$
  input $x_i$ detrimental for credit ⟹ negative weight $w_i < 0$
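The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the weights, the applicant values, the threshold, and the function names `credit_score` and `decide` are hypothetical, and the slide does not say what happens when the score equals the threshold exactly, so treating that case as a denial is an assumption.

```python
def credit_score(weights, inputs):
    """Weighted sum of the inputs: sum over i of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def decide(weights, inputs, threshold):
    """Approve if the score exceeds the threshold, otherwise deny.

    Assumption: a score exactly equal to the threshold is treated as a
    denial; the slide leaves that case unspecified.
    """
    return "approve" if credit_score(weights, inputs) > threshold else "deny"


# Hypothetical applicant: [salary, debt, years in residence], arbitrary units.
weights = [0.5, -1.0, 0.25]    # salary helps, debt hurts, residence helps a little
applicant = [6.0, 1.5, 4.0]

score = credit_score(weights, applicant)       # 3.0 - 1.5 + 1.0
print(score, decide(weights, applicant, threshold=2.0))
```

The negative weight on debt reflects the slide's rule that a detrimental input gets $w_i < 0$.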

A Simple Learning Model
Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$,
Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$,
can be written formally as
$h(\mathbf{x}) = \operatorname{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) + w_0\right)$.
The “bias weight” $w_0$ corresponds to the threshold. (How?)
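One way to answer the “How?” is to move the threshold to the left-hand side of the inequality. The short derivation below is a sketch of that step, using the same symbols as the slide; it shows that the bias weight is the negative of the threshold.

```latex
\begin{align*}
\sum_{i=1}^{d} w_i x_i > \text{threshold}
  &\iff \Big(\sum_{i=1}^{d} w_i x_i\Big) - \text{threshold} > 0 \\
  &\iff \Big(\sum_{i=1}^{d} w_i x_i\Big) + w_0 > 0,
        \qquad w_0 = -\text{threshold} \\
  &\iff h(\mathbf{x}) = +1 .
\end{align*}
```

The “deny” case follows the same way with the inequality reversed, giving $h(\mathbf{x}) = -1$.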

The Perceptron Hypothesis Set
We have defined a hypothesis set $\mathcal{H}$:
$\mathcal{H} = \{ h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^t \mathbf{x}) \}$ ← uncountably infinite $\mathcal{H}$
where
$\mathbf{w} = [w_0, w_1, \ldots, w_d]^t \in \mathbb{R}^{d+1}$ and $\mathbf{x} = [1, x_1, \ldots, x_d]^t \in \{1\} \times \mathbb{R}^d$.
This hypothesis set is called the perceptron or linear separator.
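A minimal Python sketch of one hypothesis from this set follows. The function names and the example numbers are hypothetical. The code prepends the constant coordinate 1 to the raw input so that $w_0$ acts as the bias weight, and it returns 0 when $\mathbf{w}^t \mathbf{x} = 0$, which is a convention chosen here rather than something the slide fixes.

```python
def sign(s):
    """Return +1, -1, or 0 (assumed convention for s == 0)."""
    return (s > 0) - (s < 0)


def h(w, x):
    """Perceptron hypothesis h(x) = sign(w^t x).

    w has d + 1 entries [w0, w1, ..., wd]; x holds the d raw inputs.
    The constant coordinate x0 = 1 is prepended here.
    """
    x_aug = [1.0] + list(x)
    return sign(sum(wi * xi for wi, xi in zip(w, x_aug)))


# Hypothetical weights: w0 = -2.0 plays the role of a threshold of 2.0.
w = [-2.0, 0.5, -1.0, 0.25]
print(h(w, [6.0, 1.5, 4.0]))   # weighted sum is -2.0 + 3.0 - 1.5 + 1.0
print(h(w, [1.0, 1.5, 4.0]))   # weighted sum is -2.0 + 0.5 - 1.5 + 1.0
```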

Geometry of The Perceptron
$h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^t \mathbf{x})$ (Problem 1.2 in LFD)
[Figure: two plots with axes Age and Income.]
Which one should we pick?

Use the Data to Pick a Line
[Figure: two plots with axes Age and Income.]
A perceptron fits the data by using a line to separate the +1 from the −1 data.
Fitting the data: how do we find a hyperplane that separates the data?
(“It’s obvious - just look at the data and draw the line,” is not a valid solution.)

How to Learn a Final Hypothesis g from H
We want to select $g \in \mathcal{H}$ so that $g \approx f$.
We certainly want $g \approx f$ on the data set $\mathcal{D}$. Ideally, $g(\mathbf{x}_n) = y_n$.
How do we find such a $g$ in the infinite hypothesis set $\mathcal{H}$, if it exists?
Idea! Start with some weight vector and try to improve it.
[Figure: plot with axes Age and Income.]
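To make “$g \approx f$ on the data set” concrete, one can count how many training examples a candidate weight vector gets wrong. The sketch below does that; the toy (age, income) points, the two candidate weight vectors, and the helper names are made up for illustration.

```python
def sign(s):
    """Return +1, -1, or 0 (assumed convention for s == 0)."""
    return (s > 0) - (s < 0)


def predict(w, x):
    """sign(w^t x) with the constant coordinate 1 prepended to x."""
    return sign(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


def misclassified(w, data):
    """All examples (x_n, y_n) for which the hypothesis disagrees with y_n."""
    return [(x, y) for x, y in data if predict(w, x) != y]


def in_sample_error(w, data):
    """Fraction of the data set that the hypothesis gets wrong."""
    return len(misclassified(w, data)) / len(data)


# Hypothetical toy data: ((age, income), label).
data = [((1.0, 3.0), +1), ((2.0, 4.0), +1), ((3.0, 1.0), -1), ((4.0, 2.0), -1)]

w_good = [0.0, -1.0, 1.0]   # predicts +1 when income > age
w_bad = [0.0, 1.0, 0.0]     # predicts +1 whenever age > 0

print(in_sample_error(w_good, data), in_sample_error(w_bad, data))
```

A weight vector with zero in-sample error satisfies the ideal condition $g(\mathbf{x}_n) = y_n$ for every example.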

The Perceptron Learning Algorithm (PLA)
A simple iterative method.
1: $\mathbf{w}(1) = \mathbf{0}$
2: for iteration $t = 1, 2, 3, \ldots$
3:   the weight vector is $\mathbf{w}(t)$.
4:   From $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$ pick any misclassified example.
5:   Call the misclassified example $(\mathbf{x}_*, y_*)$: $\operatorname{sign}(\mathbf{w}(t) \cdot \mathbf{x}_*) \neq y_*$.
6:   Update the weight: $\mathbf{w}(t+1) = \mathbf{w}(t) + y_* \mathbf{x}_*$.
7:   $t \leftarrow t + 1$
[Figure: the update from $\mathbf{w}(t)$ to $\mathbf{w}(t+1)$ by adding $y_* \mathbf{x}_*$, drawn for $y_* = +1$ and for $y_* = -1$.]
PLA implements our idea: start at some weights and try to improve.
“Incremental learning” on a single example at a time.
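The steps above can be sketched in Python as follows. Several details are assumptions rather than part of the slide: the function name `pla`, the `max_updates` cap, picking the first misclassified example (the slide says “any”), treating $\operatorname{sign}(0)$ as 0 so that every point counts as misclassified under the initial zero weights, and the toy data.

```python
def sign(s):
    """Return +1, -1, or 0 (assumed convention for s == 0)."""
    return (s > 0) - (s < 0)


def predict(w, x):
    """sign(w^t x) with the constant coordinate 1 prepended to x."""
    return sign(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


def pla(data, max_updates=1000):
    """Perceptron Learning Algorithm.

    data is a list of (x, y) pairs with y in {-1, +1}.
    Returns (w, number_of_updates, converged).
    """
    d = len(data[0][0])
    w = [0.0] * (d + 1)                      # step 1: w(1) = 0
    updates = 0
    while updates < max_updates:             # step 2: iterate
        wrong = [(x, y) for x, y in data if predict(w, x) != y]
        if not wrong:                        # nothing misclassified: done
            return w, updates, True
        x_star, y_star = wrong[0]            # steps 4-5: "any" -> first (assumption)
        x_aug = [1.0] + list(x_star)
        w = [wi + y_star * xi for wi, xi in zip(w, x_aug)]   # step 6
        updates += 1                         # step 7
    converged = all(predict(w, x) == y for x, y in data)
    return w, updates, converged


# Hypothetical, linearly separable toy data: ((age, income), label).
data = [((1.0, 3.0), +1), ((2.0, 4.0), +1), ((3.0, 1.0), -1), ((4.0, 2.0), -1)]
w, updates, converged = pla(data)
print(w, updates, converged)
```

Any other rule for choosing the misclassified example (random choice, for instance) fits the slide's description equally well; it can change the number of updates and the final weights.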

Does PLA Work?
Theorem. If the data can be fit by a linear separator, then after some finite number of steps, PLA will find one.
After how long?
What if the data cannot be fit by a perceptron?
[Figure: animation frames with axes Age and Income, labeled iteration 1, iteration 2 and iteration 6.]
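The theorem only covers data that a linear separator can fit. As a small illustration of the other case, the sketch below runs PLA on an XOR-style data set, which no line can separate, so some example is always misclassified and the loop stops only because of a cap on the number of updates. The cap, the function names and the data are assumptions made for this example, not the lecture's answer to the question.

```python
def sign(s):
    """Return +1, -1, or 0 (assumed convention for s == 0)."""
    return (s > 0) - (s < 0)


def predict(w, x):
    """sign(w^t x) with the constant coordinate 1 prepended to x."""
    return sign(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


def pla_capped(data, max_updates):
    """PLA with a hard cap on updates; returns (w, updates, converged)."""
    w = [0.0] * (len(data[0][0]) + 1)
    for updates in range(max_updates):
        wrong = [(x, y) for x, y in data if predict(w, x) != y]
        if not wrong:
            return w, updates, True
        x_star, y_star = wrong[0]            # first misclassified (assumption)
        w = [wi + y_star * xi for wi, xi in zip(w, [1.0] + list(x_star))]
    return w, max_updates, all(predict(w, x) == y for x, y in data)


# XOR-style labels: not separable by any line.
xor_data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), +1), ((1.0, 0.0), +1)]
w_xor, n_updates, ok = pla_capped(xor_data, max_updates=100)
print(n_updates, ok)
```

Without the cap the loop would run forever on this data, because the stopping condition (no misclassified example) can never be met.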

We can Fit the Data
• We can find an h that works from infinitely many (for the perceptron). (So computationally, things seem good.)
• Ultimately, remember that we want to predict. We don’t care about the data, we care about “outside the data”.
Can a limited data set reveal enough information to pin down an entire target function, so that we can predict outside the data?

Other Views of Learning
• Design: learning is from data, design is from specs and a model.
• Statistics, Function Approximation.
• Data Mining: find patterns in massive data (typically unsupervised).
• Three Learning Paradigms
  – Supervised: the data is $(\mathbf{x}_n, f(\mathbf{x}_n))$ - you are told the answer.
  – Reinforcement: you get feedback on potential answers you try: x → try something → get feedback.
  – Unsupervised: only given $\mathbf{x}_n$, learn to “organize” the data.

Supervised Learning - Classifying Coins
[Figure: two plots with axes Size and Mass; the coins are labeled 1, 5, 10 and 25.]

Unsupervised Learning - Categorizing Coins
[Figure: two plots with axes Size and Mass; the groups are labeled type 1, type 2, type 3 and type 4.]

Outside the Data Set - A Puzzle
Dogs ($f = -1$)
Trees ($f = +1$)
Tree or Dog? ($f = ?$)
