Perceptrons, by Jonathan Mugan (jonathanwilliammugan@gmail.com)



SLIDE 1

Perceptrons

Jonathan Mugan

jonathanwilliammugan@gmail.com www.jonathanmugan.com @jmugan April 10, 2014 (Slides taken from Dan Klein)

SLIDE 2

Classification: Feature Vectors

Email: “Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ...”

f(email): # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ... → SPAM

f(digit image): PIXEL-7,12 : 1, PIXEL-7,13 : 0, ..., NUM_LOOPS : 1, ... → “2”

This slide deck courtesy of Dan Klein at UC Berkeley

SLIDE 3

Some (Simplified) Biology

  • Very loose inspiration: human neurons

SLIDE 4

Linear Classifiers

  • Inputs are feature values
  • Each feature has a weight
  • The sum of the weighted features is the activation
  • If the activation is:
      • Positive, output +1
      • Negative, output -1

[Diagram: inputs f1, f2, f3 are multiplied by weights w1, w2, w3, summed (Σ), and thresholded (>0?)]
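The diagram above is a few lines of code. This is a generic sketch; the feature names f1–f3 and the weight and feature values are made up for illustration:

```python
# Minimal linear classifier: the activation is the dot product of
# feature values and weights; its sign gives the predicted class.

def activation(features, weights):
    """Sum of weight * value over the features present."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def classify(features, weights):
    """+1 if the activation is positive, -1 otherwise."""
    return +1 if activation(features, weights) > 0 else -1

weights = {"f1": 2.0, "f2": -1.0, "f3": 0.5}
features = {"f1": 1.0, "f2": 3.0, "f3": 2.0}
print(activation(features, weights))  # 2*1 - 1*3 + 0.5*2 = 0.0
print(classify(features, weights))    # 0.0 is not > 0, so -1
```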

3

slide-5
SLIDE 5

Example: Spam

  • Imagine 3 features (spam is the “positive” class):
      • free (number of occurrences of “free”)
      • money (occurrences of “money”)
      • BIAS (intercept, always has value 1)

Weight vector w: BIAS : -3, free : 4, money : 2, ...
Feature vector f(“free money”): BIAS : 1, free : 1, money : 1, ...
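Working the example through: the activation is the dot product of the weight vector with the feature vector for “free money”:

```python
# Activation for the "free money" example:
# w . f = (-3)(1) + (4)(1) + (2)(1) = 3 > 0, so predict SPAM.

weights = {"BIAS": -3, "free": 4, "money": 2}
features = {"BIAS": 1, "free": 1, "money": 1}  # f("free money")

score = sum(weights[name] * value for name, value in features.items())
print(score)                           # 3
print("SPAM" if score > 0 else "HAM")  # SPAM
```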

SLIDE 6

Classification: Weights

  • Binary case: compare features to a weight vector
  • Learning: figure out the weight vector from examples

f (email 1): # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...
w: # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ...
f (email 2): # free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...

Dot product positive means the positive class

SLIDE 7

Binary Decision Rule

  • In the space of feature vectors:
      • Examples are points
      • Any weight vector is a hyperplane
      • One side corresponds to Y = +1
      • The other corresponds to Y = -1

w: BIAS : -3, free : 4, money : 2, ...

[Plot: examples in (free, money) space; the weight vector defines a hyperplane with +1 = SPAM on one side and -1 = HAM on the other]

SLIDE 8

Mistake-Driven Classification

  • For Naïve Bayes:
      • Parameters from data statistics
      • Parameters: causal interpretation
      • Training: one pass through the data
  • For the perceptron:
      • Parameters from reactions to mistakes
      • Parameters: discriminative interpretation
      • Training: go through the data until held-out accuracy maxes out

[Diagram: data split into Training Data, Held-Out Data, and Test Data]

SLIDE 9

Learning: Binary Perceptron

  • Start with weights = 0
  • For each training instance:
      • Classify with current weights
      • If correct (i.e., y = y*), no change!
      • If wrong: adjust the weight vector by adding or subtracting the feature vector (subtract if y* is -1)
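The steps above can be sketched directly. The toy dataset below is made up for illustration (the last feature plays the role of a bias):

```python
# Binary perceptron training, following the update rule on the slide:
# start at zero, and on a mistake add y* . f to the weights
# (add f when y* = +1, subtract f when y* = -1).

def predict(weights, features):
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else -1

def train(data, n_features, passes=10):
    weights = [0.0] * n_features        # start with weights = 0
    for _ in range(passes):
        for features, label in data:    # label y* is +1 or -1
            if predict(weights, features) != label:   # mistake
                for i, f in enumerate(features):
                    weights[i] += label * f           # add or subtract f
    return weights

# Toy separable data: (features, label); last feature is a bias of 1.
data = [([2, 0, 1], 1), ([0, 2, 1], -1), ([3, 1, 1], 1), ([1, 3, 1], -1)]
w = train(data, 3)
print(all(predict(w, f) == y for f, y in data))  # True: separable data is fit
```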

SLIDE 10

Multiclass Decision Rule

  • If we have more than two classes:
      • Have a weight vector for each class
      • Calculate an activation for each class
      • Highest activation wins

SLIDE 11

Multiclass Decision Rule

  • If we have multiple classes:
      • A weight vector for each class
      • Score (activation) of a class y is the dot product of that class’s weight vector with the feature vector
      • Prediction: the highest-scoring class wins

Binary = multiclass where the negative class has weight zero

SLIDE 12

Example

w (class 1): BIAS : -2, win : 4, game : 4, vote : 0, the : 0, ...
w (class 2): BIAS : 1, win : 2, game : 0, vote : 4, the : 0, ...
w (class 3): BIAS : 2, win : 0, game : 2, vote : 0, the : 0, ...

f(“win the vote”): BIAS : 1, win : 1, game : 0, vote : 1, the : 1, ...
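Applying the multiclass decision rule to this example: score each class with its own weight vector and predict the argmax. The class names below are placeholders (the labels on the original slide are not preserved in this capture):

```python
# Multiclass decision rule for the "win the vote" example.
weights = {
    "class1": {"BIAS": -2, "win": 4, "game": 4, "vote": 0, "the": 0},
    "class2": {"BIAS": 1, "win": 2, "game": 0, "vote": 4, "the": 0},
    "class3": {"BIAS": 2, "win": 0, "game": 2, "vote": 0, "the": 0},
}
f = {"BIAS": 1, "win": 1, "game": 0, "vote": 1, "the": 1}  # f("win the vote")

scores = {y: sum(w[k] * f[k] for k in f) for y, w in weights.items()}
print(scores)                       # {'class1': 2, 'class2': 7, 'class3': 2}
print(max(scores, key=scores.get))  # class2 has the highest activation
```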

SLIDE 13

Learning: Multiclass Perceptron

  • Start with all weights = 0
  • Pick up training examples one by one
      • Predict with current weights
      • If correct, no change!
      • If wrong: lower the score of the wrong answer, raise the score of the right answer
SLIDE 14

Example: Multiclass Perceptron

w (class 1): BIAS : 1, win : 0, game : 0, vote : 0, the : 0, ...
w (class 2): BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...
w (class 3): BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ...

Training examples: “win the vote”, “win the election”, “win the game”

SLIDE 15

Examples: Perceptron

  • Separable Case

SLIDE 16

Examples: Perceptron

  • Separable Case

SLIDE 17

Properties of Perceptrons

  • Separability: some parameter settings get the training set perfectly correct
  • Convergence: if the training set is separable, the perceptron will eventually converge (binary case)
  • Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability

[Figures: a separable point set and a non-separable point set]
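The bound the slide alludes to can be made precise. The slide itself gives no formula; the following is the classical Novikoff mistake bound, added here for reference:

```latex
% Perceptron mistake bound (Novikoff). Assume every feature vector
% satisfies \|f\| \le R and some unit-length weight vector w^*
% separates the data with margin \gamma, i.e.
% y^*\,(w^* \cdot f) \ge \gamma for every training example.
% Then the number of mistakes the binary perceptron makes is at most
\text{mistakes} \;\le\; \frac{R^2}{\gamma^2}
```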

SLIDE 18

Examples: Perceptron

  • Non-Separable Case

SLIDE 19

Examples: Perceptron

  • Non-Separable Case

SLIDE 20

Problems with the Perceptron

  • Noise: if the data isn’t separable, weights might thrash
      • Averaging weight vectors over time can help (averaged perceptron)
  • Mediocre generalization: finds a “barely” separating solution
  • Overtraining: test / held-out accuracy usually rises, then falls
      • Overtraining is a kind of overfitting
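The averaged perceptron mentioned above can be sketched as a small change to the binary training loop: train exactly as before, but return the average of the weight vector over all steps, which smooths out the thrashing. The toy data is made up for illustration:

```python
# Averaged perceptron sketch: same mistake-driven updates as the
# binary perceptron, but the returned weights are the running average
# of the weight vector after every training example.

def predict(weights, features):
    return 1 if sum(w * f for w, f in zip(weights, features)) > 0 else -1

def train_averaged(data, n_features, passes=10):
    w = [0.0] * n_features
    total = [0.0] * n_features   # running sum of w after every example
    steps = 0
    for _ in range(passes):
        for features, label in data:
            if predict(w, features) != label:      # mistake-driven update
                for i, f in enumerate(features):
                    w[i] += label * f
            for i in range(n_features):
                total[i] += w[i]
            steps += 1
    return [t / steps for t in total]  # averaged weights

data = [([2, 0, 1], 1), ([0, 2, 1], -1), ([3, 1, 1], 1), ([1, 3, 1], -1)]
w_avg = train_averaged(data, 3)
print(all(predict(w_avg, f) == y for f, y in data))  # True on this toy set
```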