SLIDE 1

The Perceptron

CMSC 422 MARINE CARPUAT

marine@cs.umd.edu

Credit: figures by Piyush Rai and Hal Daume III

SLIDE 2

This week

  • Project 1 posted

– Form teams!
– Due Wed March 2nd by 2:59pm

  • A new model/algorithm

– the perceptron
– and its variants: voted, averaged

  • Fundamental Machine Learning Concepts

– Online vs. batch learning
– Error-driven learning

SLIDE 3

Geometry concept: Hyperplane

  • Separates a D-dimensional space into two half-spaces

  • Defined by an outward-pointing normal vector 𝑤 ∈ ℝ^𝐷

– 𝑤 is orthogonal to any vector lying on the hyperplane

  • Hyperplane passes through the origin, unless we also define a bias term b

SLIDE 4

Binary classification via hyperplanes

  • Let’s assume that the decision boundary is a hyperplane

  • Then, training consists of finding a hyperplane, i.e., a weight vector 𝑤, that separates positive from negative examples

SLIDE 5

Binary classification via hyperplanes

  • At test time, we check which side of the hyperplane examples fall on:

𝑦 = sign(𝑤ᵀ𝑥 + 𝑏)

SLIDE 6

Function Approximation with Perceptron

Problem setting

  • Set of possible instances 𝑋

– Each instance 𝑥 ∈ 𝑋 is a feature vector 𝑥 = [𝑥₁, …, 𝑥_𝐷]

  • Unknown target function 𝑓: 𝑋 → 𝑌

– 𝑌 is binary valued: {-1, +1}

  • Set of function hypotheses 𝐻 = {ℎ | ℎ: 𝑋 → 𝑌}

– Each hypothesis ℎ is a hyperplane in D-dimensional space

Input

  • Training examples {(𝑥⁽¹⁾, 𝑦⁽¹⁾), …, (𝑥⁽𝑁⁾, 𝑦⁽𝑁⁾)} of unknown target function 𝑓

Output

  • Hypothesis ℎ ∈ 𝐻 that best approximates target function 𝑓
SLIDE 7

Perceptron: Prediction Algorithm
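
The algorithm box on this slide is an image and isn't in the transcript. A minimal sketch of the prediction rule it describes, in Python with NumPy (the function name and the tie-breaking convention sign(0) = +1 are my choices, not from the slides):

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Return which side of the hyperplane x falls on: sign(w.x + b)."""
    activation = np.dot(w, x) + b
    return 1 if activation >= 0 else -1
```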

SLIDE 8

Aside: biological inspiration

Analogy: the perceptron as a neuron

SLIDE 9

Perceptron Training Algorithm
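
The pseudocode on this slide is also an image. A hedged sketch of the standard algorithm it describes (the epoch budget `max_epochs` is my illustrative hyperparameter):

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Online, error-driven training: visit one example at a time and
    update (w, b) only when that example is misclassified."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # y[i] * (w.x + b) <= 0 means example i is on the wrong side
            if y[i] * (np.dot(X[i], w) + b) <= 0:
                w += y[i] * X[i]   # pull the hyperplane toward the error
                b += y[i]
                mistakes += 1
        if mistakes == 0:          # a separating hyperplane was found: stop
            break
    return w, b
```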

SLIDE 10

Properties of the Perceptron training algorithm

  • Online

– We look at one example at a time, and update the model as soon as we make an error
– As opposed to batch algorithms that update parameters after seeing the entire training set

  • Error-driven

– We only update parameters/model if we make an error

SLIDE 11

Perceptron update: geometric interpretation
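
The figure on this slide isn't reproduced in the transcript, but the algebra it illustrates is short. On a mistake, the update is 𝑤 ← 𝑤 + 𝑦𝑥, and since 𝑦² = 1:

𝑦(𝑤 + 𝑦𝑥)ᵀ𝑥 = 𝑦𝑤ᵀ𝑥 + ‖𝑥‖² > 𝑦𝑤ᵀ𝑥

So each update strictly increases the score 𝑦𝑤ᵀ𝑥 of the example that triggered it: geometrically, 𝑤 rotates toward misclassified positive examples and away from misclassified negative ones, though a single update need not fix the error outright.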

SLIDE 12

Practical considerations

  • The order of training examples matters!

– Random is better

  • Early stopping

– Good strategy to avoid overfitting

  • Simple modifications dramatically improve performance

– voting or averaging
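
A hedged sketch of how the first two practices combine (the dev split and the epoch budget are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def perceptron_train_practical(X, y, X_dev, y_dev, max_epochs=50):
    """Perceptron with per-epoch shuffling and early stopping:
    keep the weights that score best on a held-out dev set."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    best = (w.copy(), b, -1.0)                # (weights, bias, dev accuracy)
    rng = np.random.default_rng(0)
    for _ in range(max_epochs):
        for i in rng.permutation(n):          # order matters: reshuffle each epoch
            if y[i] * (np.dot(X[i], w) + b) <= 0:
                w += y[i] * X[i]
                b += y[i]
        preds = np.where(X_dev @ w + b >= 0, 1, -1)
        acc = np.mean(preds == y_dev)
        if acc > best[2]:                     # early stopping via best-on-dev
            best = (w.copy(), b, acc)
    return best[0], best[1]
```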

SLIDE 13

Predicting with

  • The voted perceptron
  • The averaged perceptron
  • Require keeping track of “survival time” of weight vectors

SLIDE 14

How would you modify this algorithm for voted perceptron?
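
One possible answer, as a hedged sketch in the style of Freund and Schapire's voted perceptron (all names are mine): snapshot the weight vector at every update, together with the number of examples it survived, and let the snapshots cast weighted votes at prediction time.

```python
import numpy as np

def voted_perceptron_train(X, y, max_epochs=10):
    """Keep every intermediate weight vector with its survival time c."""
    n, d = X.shape
    w, b, c = np.zeros(d), 0.0, 0
    snapshots = []                                   # list of (w, b, c)
    for _ in range(max_epochs):
        for i in range(n):
            if y[i] * (np.dot(X[i], w) + b) <= 0:
                snapshots.append((w.copy(), b, c))   # retire the current vector
                w, b, c = w + y[i] * X[i], b + y[i], 0
            else:
                c += 1                               # survived one more example
    snapshots.append((w.copy(), b, c))
    return snapshots

def voted_predict(snapshots, x):
    """Each snapshot votes with weight equal to its survival time."""
    total = sum(c * (1 if np.dot(w, x) + b >= 0 else -1)
                for w, b, c in snapshots)
    return 1 if total >= 0 else -1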

SLIDE 15

How would you modify this algorithm for averaged perceptron?
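
Again hedged: the same bookkeeping, but instead of voting, predict with the survival-time-weighted average of the snapshots (reusing `snapshots` from the voted sketch above):

```python
import numpy as np

def averaged_predict(snapshots, x):
    """Predict with the survival-time-weighted average weight vector."""
    w_avg = sum(c * w for w, b, c in snapshots)
    b_avg = sum(c * b for w, b, c in snapshots)
    return 1 if np.dot(w_avg, x) + b_avg >= 0 else -1
```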

SLIDE 16

Averaged perceptron decision rule

can be rewritten as
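
The two formulas on this slide are images and are missing from the transcript; the identity they express is standard. With weight snapshots 𝑤⁽ᵏ⁾ and survival times 𝑐⁽ᵏ⁾, the averaged decision rule

𝑦̂ = sign( Σₖ 𝑐⁽ᵏ⁾ (𝑤⁽ᵏ⁾)ᵀ𝑥 )

can, by linearity of the dot product, be rewritten as

𝑦̂ = sign( (Σₖ 𝑐⁽ᵏ⁾𝑤⁽ᵏ⁾)ᵀ𝑥 )

so the weighted sum of weight vectors can be precomputed once, and prediction then costs no more than a plain perceptron.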

SLIDE 17

Averaged Perceptron Training
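
The pseudocode on this slide is an image. A hedged sketch of the usual efficient formulation, which keeps a running sum of the weights instead of a list of snapshots:

```python
import numpy as np

def averaged_perceptron_train(X, y, max_epochs=10):
    """Averaged perceptron without storing snapshots: accumulate w and b
    after every example, which implicitly weights each version of w by
    how many examples it survived."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    w_sum, b_sum = np.zeros(d), 0.0
    for _ in range(max_epochs):
        for i in range(n):
            if y[i] * (np.dot(X[i], w) + b) <= 0:
                w += y[i] * X[i]
                b += y[i]
            w_sum += w                # running total, updated every example
            b_sum += b
    return w_sum, b_sum               # overall scale doesn't change the sign
```

Prediction then uses sign(w_sumᵀ𝑥 + b_sum); dividing by the total update count would not change the sign, so the unnormalized sums suffice.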

SLIDE 18

Can the perceptron always find a hyperplane to separate positive from negative examples?
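
A hedged aside on the answer: only when the training data are linearly separable; XOR is the classic counterexample. A quick check, reusing the `perceptron_train` sketch from earlier:

```python
import numpy as np

# XOR: no single hyperplane separates the two classes, so error-driven
# updates never stop; training exhausts its epoch budget still making mistakes.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])
w, b = perceptron_train(X, y, max_epochs=100)
preds = [1 if np.dot(w, x) + b >= 0 else -1 for x in X]
print(preds, y.tolist())   # at least one prediction will disagree with y
```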

SLIDE 19

This week

  • Project 1 posted

– Form teams!
– Due Wed March 2nd by 2:59pm

  • A new model/algorithm

– the perceptron
– and its variants: voted, averaged

  • Fundamental Machine Learning Concepts

– Online vs. batch learning
– Error-driven learning