SLIDE 1

Machine Learning (CSE 446): Perceptron

Sham M Kakade

© 2018 University of Washington, cse446-staff@cs.washington.edu

SLIDE 2

Announcements

◮ HW due this week. See detailed instructions in the hw.
  ◮ One pdf file.
  ◮ Answers and figures grouped together for each problem (in order).
  ◮ Submit your code (only problem 4's code is needed for HW1).
◮ Quiz section this week: a little probability + more linear algebra.
◮ Updated late policy:
  ◮ You get 2 late days for the entire quarter, which will be automatically deducted per late day (or part thereof).
  ◮ After these days are used up, 33% is deducted per day.
◮ Today: the perceptron algorithm.

SLIDE 3

Review

SLIDE 4

The General Recipe (for supervised learning)

The cardinal rule of machine learning: Don’t touch your test data. If you follow that rule, this recipe will give you accurate information:

1. Split data into training, (maybe) development, and test sets.
2. For different hyperparameter settings:
   2.1 Train on the training data using those hyperparameter values.
   2.2 Evaluate loss on development data.
3. Choose the hyperparameter setting whose model achieved the lowest development data loss. Optionally, retrain on the training and development data together.
4. Now you have a hypothesis. Evaluate that model on test data.
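As a sketch, the recipe above can be expressed in code. Everything concrete below (the 1-D threshold "model", the candidate hyperparameter values, and the synthetic data) is invented purely for illustration:

```python
import random

# A toy "model" whose single hyperparameter is a threshold t;
# it predicts sign(x - t). "Training" is deliberately trivial here:
# the point is the recipe, not the model.
def train(train_set, t):
    return t

def loss(model, data):
    # Fraction of points misclassified by the threshold rule.
    errors = sum(1 for x, y in data if (1 if x > model else -1) != y)
    return errors / len(data)

random.seed(0)
points = [random.random() for _ in range(300)]
data = [(x, 1 if x > 0.5 else -1) for x in points]

# 1. Split data into training, development, and test sets.
train_set, dev_set, test_set = data[:200], data[200:250], data[250:]

# 2. For each hyperparameter setting: train, then evaluate dev loss.
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
dev_losses = {t: loss(train(train_set, t), dev_set) for t in candidates}

# 3. Choose the setting with the lowest development loss.
best_t = min(dev_losses, key=dev_losses.get)

# 4. Now you have a hypothesis: evaluate it on test data, exactly once.
test_loss = loss(train(train_set, best_t), test_set)
print(best_t, test_loss)
```

Note that the test set is touched exactly once, at the very end, which is the cardinal rule in action.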

SLIDE 5

Also...

◮ Supervised learning algorithms we covered:
  ◮ Decision trees: use few features, “selectively”.
  ◮ Nearest neighbor: uses all features, “blindly”.
◮ Unsupervised learning algorithms we covered:
  ◮ K-means: a clustering algorithm.
    ◮ We show it converges later in the class.
    ◮ This algorithm does not use labels.

SLIDE 6

Probability you should know

◮ expectations (and notation)
◮ mean, variance
◮ unbiased estimate
◮ joint distributions
◮ conditional distributions

SLIDE 7

Linear algebra you should know

◮ vectors
◮ inner products (and interpretation)
◮ Euclidean distance; Euclidean norm
◮ soon:
  ◮ matrices and matrix multiplication
  ◮ “outer” products
  ◮ a covariance matrix
  ◮ how to write a vector x in an “orthogonal” basis
  ◮ SVD/eigenvectors/eigenvalues...

SLIDE 8

Today

SLIDE 10

Is there a happy medium?

Decision trees (that aren’t too deep): use relatively few features to classify.
K-nearest neighbors: all features weighted equally.
Today: use all features, but weight them.

For today’s lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ R^d.

SLIDE 11

Inspiration from Neurons

Image from Wikimedia Commons.

Input signals come in through the dendrites; the output signal passes out through the axon.

SLIDE 12

Neuron-Inspired Classifier

[Diagram: each input x[1], x[2], x[3], …, x[d] is multiplied by its weight parameter w[1], w[2], w[3], …, w[d]; the products are summed together with a bias parameter b to form the “activation”, which determines whether the neuron fires, producing the output ŷ.]

SLIDE 14

A “parametric” Hypothesis

f(x) = sign(w · x + b)

remembering that:

w · x = ∑_{j=1}^{d} w[j] · x[j]

Learning requires us to set the weights w and the bias b.
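A minimal sketch of this hypothesis in Python (the convention that sign(0) = +1 is an assumption made here; conventions for sign(0) vary):

```python
def predict(w, b, x):
    """The parametric hypothesis f(x) = sign(w · x + b)."""
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if activation >= 0 else -1  # treat sign(0) as +1

# Example: w = [2, -1], b = 0.5, x = [1, 3]
# activation = 2*1 + (-1)*3 + 0.5 = -0.5, so f(x) = -1
print(predict([2.0, -1.0], 0.5, [1.0, 3.0]))
```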

SLIDE 15

Geometrically...

◮ What does the decision boundary look like?

SLIDE 16

“Online learning”

◮ Let’s think of an online algorithm, where we try to update w as we examine each training point (xi, yi), one at a time.
◮ How should we change w if we do not make an error on some point (x, y)?
◮ How should we change w if we make an error on some point (x, y)?

SLIDE 17

Perceptron Learning Algorithm

Data: D = {(xn, yn)}, n = 1, . . . , N; number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, . . . , E} do
    for n ∈ {1, . . . , N}, in random order do
        # predict
        ŷ = sign(w · xn + b);
        if ŷ ≠ yn then
            # update
            w ← w + yn · xn;
            b ← b + yn;
        end
    end
end
return w, b;
Algorithm 1: PerceptronTrain
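Algorithm 1 translates almost line-for-line into Python. This is a sketch: the seeded RNG (added for reproducibility), the sign(0) = +1 convention, and the tiny dataset are all assumptions made here, not part of the slide.

```python
import random

def perceptron_train(D, E, seed=0):
    """Sketch of PerceptronTrain: D is a list of (x, y) pairs with
    x a list of d floats and y in {-1, +1}; E is the epoch count."""
    rng = random.Random(seed)               # seeding added for reproducibility
    w, b = [0.0] * len(D[0][0]), 0.0        # initialize w = 0 and b = 0
    for _ in range(E):                      # for e in {1, ..., E}
        order = list(range(len(D)))
        rng.shuffle(order)                  # visit points in random order
        for n in order:
            x, y = D[n]
            activation = sum(wj * xj for wj, xj in zip(w, x)) + b
            y_hat = 1 if activation >= 0 else -1    # predict
            if y_hat != y:                          # update only on a mistake
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b = b + y
    return w, b

# Tiny linearly separable dataset, invented for the sketch:
D = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -0.5], -1), ([-2.0, -1.5], -1)]
w, b = perceptron_train(D, E=10)
correct = all(
    (1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1) == y
    for x, y in D
)
print(correct)
```

Note the key design point from the slide: the weights change only when the current prediction is wrong; correct points leave w and b untouched.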

SLIDE 18

Parameters and Hyperparameters

This is the first supervised algorithm we’ve seen that has parameters that are numerical values (w and b). The perceptron learning algorithm’s sole hyperparameter is E, the number of epochs (passes over the training data). How should we tune E using the development data?
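One plausible way to tune E, sketched below with invented toy data (the candidate epoch counts, the synthetic dataset, and the inlined trainer are all assumptions of the sketch): train once per candidate value and keep whichever gives the lowest development error.

```python
import random

def perceptron_train(D, E, seed=0):
    # Minimal perceptron trainer, inlined so the sketch is self-contained.
    rng = random.Random(seed)
    w, b = [0.0] * len(D[0][0]), 0.0
    for _ in range(E):
        for x, y in rng.sample(D, len(D)):  # random order each epoch
            if (1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b += y
    return w, b

def error_rate(w, b, data):
    wrong = sum(
        1 for x, y in data
        if (1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1) != y
    )
    return wrong / len(data)

# Invented toy data: the label is the sign of x[0] + x[1].
rng = random.Random(1)
def point():
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    return (x, 1 if x[0] + x[1] >= 0 else -1)

train_set = [point() for _ in range(100)]
dev_set = [point() for _ in range(50)]

# Train once per candidate E; keep the E with the lowest dev error.
candidates = [1, 2, 5, 10, 20]
dev_err = {E: error_rate(*perceptron_train(train_set, E), dev_set)
           for E in candidates}
best_E = min(dev_err, key=dev_err.get)
print(best_E, dev_err[best_E])
```

This is exactly the general recipe from earlier, with E as the sole hyperparameter and error rate as the development loss.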

SLIDE 19

Linear Decision Boundary

w · x + b = 0

activation = w · x + b

SLIDE 20

Interpretation of Weight Values

What does it mean when . . .

◮ w[1] = 100?
◮ w[2] = −1?
◮ w[3] = 0?

What about feature scaling?

SLIDE 21

What can we say about convergence?

◮ Can you say what has to occur for the algorithm to converge?
◮ Can we understand when it will never converge?

Stay tuned...
