Learning From Data, Lecture 2: The Perceptron

SLIDE 1

Learning From Data
Lecture 2: The Perceptron

  • The Learning Setup
  • A Simple Learning Algorithm: PLA
  • Other Views of Learning
  • Is Learning Feasible: A Puzzle

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

recap: The Plan

  1. What is Learning?
  2. Can We do it?
  3. How to do it?
  4. How to do it well?
  5. General principles?
  6. Advanced techniques.
  7. Other Learning Paradigms.

concepts, theory, practice

Our language will be mathematics . . .

. . . our sword will be computer algorithms.

© AML  Creator: Malik Magdon-Ismail

SLIDE 3

recap: The Key Players

  • Salary, debt, years in residence, . . . : input $x \in \mathbb{R}^d = \mathcal{X}$.
  • Approve credit or not: output $y \in \{-1, +1\} = \mathcal{Y}$.
  • True relationship between $x$ and $y$: target function $f : \mathcal{X} \to \mathcal{Y}$. (The target $f$ is unknown.)
  • Data on customers: data set $\mathcal{D} = (x_1, y_1), \ldots, (x_N, y_N)$. ($y_n = f(x_n)$.)

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.

We learn the function $f$ from the data $\mathcal{D}$.
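As a rough illustration of these objects, the data set can be held as a list of (x, y) pairs. The customer numbers and the rule inside `f` below are invented for the example; in a real problem the target is never available as code.

```python
# Hypothetical customer records: x = (salary in $1000s, debt in $1000s, years in residence).
# The values and the rule inside `f` are assumptions made up for this sketch.

def f(x):
    """Stand-in for the unknown target f: X -> Y, returning +1 (approve) or -1 (deny)."""
    salary, debt, years = x
    return +1 if salary - 2 * debt + 5 * years > 40 else -1

inputs = [
    (60.0, 10.0, 3.0),
    (30.0, 20.0, 1.0),
    (45.0, 5.0, 0.5),
    (80.0, 30.0, 10.0),
]

# The data set D = (x_1, y_1), ..., (x_N, y_N) with y_n = f(x_n).
D = [(x, f(x)) for x in inputs]

for x, y in D:
    print(x, "->", "approve" if y == +1 else "deny")
```

The learner only ever sees `D`; `f` appears here solely to generate the labels.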

SLIDE 4

recap: Summary of the Learning Setup

  • UNKNOWN TARGET FUNCTION $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula), with $y_n = f(x_n)$
  • TRAINING EXAMPLES $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ (historical records of credit customers)
  • HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)
  • LEARNING ALGORITHM $\mathcal{A}$
  • FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)

SLIDE 5

A Simple Learning Model

  • Input vector $x = [x_1, \ldots, x_d]^t$.
  • Give importance weights to the different inputs and compute a “Credit Score”:

    $$\text{“Credit Score”} = \sum_{i=1}^{d} w_i x_i.$$

  • Approve credit if the “Credit Score” is acceptable:

    Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$ (“Credit Score” is good).
    Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$ (“Credit Score” is bad).

  • How to choose the importance weights $w_i$:

    input $x_i$ is important ⟹ large weight $|w_i|$
    input $x_i$ beneficial for credit ⟹ positive weight $w_i > 0$
    input $x_i$ detrimental for credit ⟹ negative weight $w_i < 0$
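The scoring rule above can be sketched in a few lines. The weights, threshold and customer values are hypothetical, and treating a score exactly equal to the threshold as a denial is an arbitrary choice, since the slide leaves that case open.

```python
def credit_score(w, x):
    """Weighted sum of the inputs: sum over i of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decide(w, x, threshold):
    """Return +1 (approve) if the score exceeds the threshold, else -1 (deny).

    A score exactly equal to the threshold is treated as deny here (an assumption).
    """
    return +1 if credit_score(w, x) > threshold else -1

# Hypothetical weights for (salary, debt, years in residence):
# salary and residence help (positive weights), debt hurts (negative weight).
w = (0.5, -1.0, 2.0)
threshold = 20.0

x_good = (60.0, 10.0, 3.0)   # score = 30 - 10 + 6 = 26
x_bad = (30.0, 20.0, 1.0)    # score = 15 - 20 + 2 = -3

print(credit_score(w, x_good), decide(w, x_good, threshold))
print(credit_score(w, x_bad), decide(w, x_bad, threshold))
```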

SLIDE 6

A Simple Learning Model

Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$,
Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$,

can be written formally as

$$h(x) = \operatorname{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) + w_0\right).$$

  • The “bias weight” $w_0$ corresponds to the threshold. (How?)
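One way to answer the “How?”: the condition $\sum_{i=1}^{d} w_i x_i > \text{threshold}$ is the same as $\sum_{i=1}^{d} w_i x_i - \text{threshold} > 0$, so choosing $w_0 = -\text{threshold}$ turns the comparison against a threshold into a comparison against zero. The sketch below checks this on a few hypothetical customers; the weights and values are made up.

```python
def score(w, x):
    """Weighted sum of the inputs, without the bias term."""
    return sum(wi * xi for wi, xi in zip(w, x))

def h_threshold(w, x, threshold):
    """Decision written with an explicit threshold."""
    return +1 if score(w, x) > threshold else -1

def h_bias(w0, w, x):
    """Same decision written with a bias weight w0 and a comparison against 0."""
    return +1 if score(w, x) + w0 > 0 else -1

w = (0.5, -1.0, 2.0)   # hypothetical weights
threshold = 20.0
w0 = -threshold        # the bias weight is the negated threshold

customers = [(60.0, 10.0, 3.0), (30.0, 20.0, 1.0), (50.0, 5.0, 0.0)]
agree = all(h_threshold(w, x, threshold) == h_bias(w0, w, x) for x in customers)
print("both forms agree:", agree)
```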

SLIDE 7

The Perceptron Hypothesis Set

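We have defined a hypothesis set $\mathcal{H}$:

$$\mathcal{H} = \{h(x) = \operatorname{sign}(w^t x)\} \quad \leftarrow \text{uncountably infinite } \mathcal{H}$$

$$w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix} \in \mathbb{R}^{d+1}, \qquad x = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} \in \{1\} \times \mathbb{R}^{d}.$$

This hypothesis set is called the perceptron or linear separator.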
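A minimal sketch of one hypothesis from this set, assuming plain Python tuples for the vectors. The helper names `augment` and `h` and the numeric values are illustrative only; mapping a score of exactly zero to −1 is an arbitrary convention.

```python
def augment(x):
    """Prepend the constant coordinate 1 so that w_0 acts as the bias weight."""
    return (1.0,) + tuple(x)

def h(w, x):
    """Perceptron hypothesis h(x) = sign(w^t x), with x augmented to (1, x_1, ..., x_d)."""
    s = sum(wi * xi for wi, xi in zip(w, augment(x)))
    return +1 if s > 0 else -1   # a score of exactly 0 is mapped to -1 here

# Hypothetical weights (w_0, w_1, w_2, w_3); w_0 = -20 plays the role of the threshold.
w = (-20.0, 0.5, -1.0, 2.0)

print(h(w, (60.0, 10.0, 3.0)))   # w^t x = -20 + 30 - 10 + 6 = 6
print(h(w, (30.0, 20.0, 1.0)))   # w^t x = -20 + 15 - 20 + 2 = -23
```

Each choice of `w` gives a different hypothesis, which is why the set is uncountably infinite.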

SLIDE 8

Geometry of The Perceptron

$h(x) = \operatorname{sign}(w^t x)$

(Problem 1.2 in LFD)

[Figure: two panels, each with axes Age and Income.]

Which one should we pick?

SLIDE 9

Use the Data to Pick a Line

[Figure: two panels, each with axes Age and Income.]

A perceptron fits the data by using a line to separate the +1 from −1 data.

Fitting the data: How to find a hyperplane that separates the data?

(“It’s obvious - just look at the data and draw the line,” is not a valid solution.)

SLIDE 10

How to Learn a Final Hypothesis g from H

We want to select $g \in \mathcal{H}$ so that $g \approx f$.

We certainly want $g \approx f$ on the data set $\mathcal{D}$. Ideally, $g(x_n) = y_n$.

How do we find such a $g$ in the infinite hypothesis set $\mathcal{H}$, if it exists?

Idea! Start with some weight vector and try to improve it.

[Figure: axes Age and Income.]

SLIDE 11

The Perceptron Learning Algorithm (PLA)

A simple iterative method.

1: $w(1) = 0$
2: for iteration $t = 1, 2, 3, \ldots$
3:     the weight vector is $w(t)$.
4:     From $(x_1, y_1), \ldots, (x_N, y_N)$ pick any misclassified example.
5:     Call the misclassified example $(x_*, y_*)$:
           $\operatorname{sign}(w(t) \cdot x_*) \neq y_*$.
6:     Update the weight:
           $w(t + 1) = w(t) + y_* x_*$.
7:     $t \leftarrow t + 1$

[Figure: two panels showing the update from $w(t)$ to $w(t + 1)$ by adding $y_* x_*$, one for $y_* = +1$ and one for $y_* = -1$.]

PLA implements our idea: start at some weights and try to improve.

“Incremental learning” on a single example at a time.
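The steps of PLA can be sketched as below. This is a minimal version under stated assumptions: inputs are already augmented with a leading 1, “pick any misclassified example” is implemented as “pick the first one”, a score of exactly zero counts as misclassified, and the `max_updates` cap is a safety guard that is not part of the algorithm on the slide. The toy data set is made up.

```python
def sign(s):
    """Return +1, -1, or 0; a score of 0 never equals a label, so it counts as an error."""
    return (s > 0) - (s < 0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pla(data, max_updates=10_000):
    """Perceptron Learning Algorithm on a list of (x, y) pairs.

    x is a tuple with a leading 1, y is +1 or -1.
    Returns (w, number of updates performed).
    """
    w = [0.0] * len(data[0][0])                      # step 1: w(1) = 0
    for updates in range(max_updates):
        misclassified = [(x, y) for x, y in data if sign(dot(w, x)) != y]
        if not misclassified:                        # every point is fit: stop
            return w, updates
        x_star, y_star = misclassified[0]            # steps 4-5: pick a misclassified example
        w = [wi + y_star * xi for wi, xi in zip(w, x_star)]   # step 6: w <- w + y* x*
    return w, max_updates                            # cap reached without a check of the final w

# Hypothetical linearly separable data; each x is (1, x1, x2).
data = [
    ((1.0, 1.0, 1.0), -1),
    ((1.0, 2.0, 0.5), -1),
    ((1.0, 3.0, 3.0), +1),
    ((1.0, 4.0, 2.0), +1),
    ((1.0, 0.5, 1.5), -1),
    ((1.0, 2.5, 2.5), +1),
]

w, updates = pla(data)
print("weights:", w, "updates:", updates)
```

Rebuilding the list of misclassified points on every pass keeps the sketch short; a more careful implementation might scan for a single violating point instead.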

SLIDE 12

Does PLA Work?

  • Theorem. If the data can be fit by a linear separator, then after some finite number of steps, PLA will find one.

What if the data cannot be fit by a perceptron?

[Figure: axes Age and Income; PLA iteration 1.]

SLIDE 14

Does PLA Work?

  • Theorem. If the data can be fit by a linear separator, then after some finite number of steps, PLA will find one.

After how long?

What if the data cannot be fit by a perceptron?

[Figure: axes Age and Income; animated over PLA iterations 1 to 6.]


SLIDE 20

Does PLA Work?

  • Theorem. If the data can be fit by a linear separator, then after some finite number of steps, PLA will find one.

After how long?

What if the data cannot be fit by a perceptron?

[Figure: axes Age and Income; iteration 1.]
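The theorem only covers data that a linear separator can fit. As a hedged illustration of the other case, the sketch below runs a capped PLA on two tiny made-up data sets: AND-style labels, which are linearly separable, and XOR-style labels, which are not. The cap `max_updates` and the returned `converged` flag are assumptions of this sketch, not part of the lecture's algorithm.

```python
def sign(s):
    """Return +1, -1, or 0; a score of 0 counts as a misclassification."""
    return (s > 0) - (s < 0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pla_capped(data, max_updates):
    """Run PLA for at most max_updates updates. Returns (w, converged)."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_updates):
        misclassified = [(x, y) for x, y in data if sign(dot(w, x)) != y]
        if not misclassified:
            return w, True
        x_star, y_star = misclassified[0]
        w = [wi + y_star * xi for wi, xi in zip(w, x_star)]
    # Cap reached: report whether the last weights happen to fit the data.
    return w, all(sign(dot(w, x)) == y for x, y in data)

# Each x is (1, x1, x2). AND labels are separable; XOR labels are not.
and_data = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), +1)]
xor_data = [((1, 0, 0), -1), ((1, 0, 1), +1), ((1, 1, 0), +1), ((1, 1, 1), -1)]

_, and_ok = pla_capped(and_data, max_updates=1000)
_, xor_ok = pla_capped(xor_data, max_updates=1000)
print("AND fit:", and_ok, "| XOR fit:", xor_ok)
```

Since no line separates the XOR labels, there is always a misclassified point, so without the cap the loop would keep updating indefinitely.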

SLIDE 21

We can Fit the Data

  • We can find an h that works from infinitely many (for the perceptron).

(So computationally, things seem good.)

  • Ultimately, remember that we want to predict.

We don’t care about the data; we care about “outside the data”.

Can a limited data set reveal enough information to pin down an entire target function, so that we can predict outside the data?
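To make the distinction concrete, the sketch below fits a small made-up data set with PLA and then measures the error on fresh points. Everything specific here is an assumption for illustration: the target `f` (normally unknown), the six training points, the uniform sampling of test points and the seed. The error on fresh points is printed rather than predicted, because it depends on which separator PLA happens to find.

```python
import random

def sign(s):
    return (s > 0) - (s < 0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def pla(data, max_updates=10_000):
    """PLA with a safety cap; returns the final weight vector."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_updates):
        wrong = [(x, y) for x, y in data if sign(dot(w, x)) != y]
        if not wrong:
            break
        x_star, y_star = wrong[0]
        w = [wi + y_star * xi for wi, xi in zip(w, x_star)]
    return w

def f(x):
    """Hypothetical target on augmented inputs x = (1, x1, x2)."""
    return +1 if x[1] + x[2] - 1 > 0 else -1

def error(w, data):
    """Fraction of points in data that sign(w . x) gets wrong."""
    return sum(sign(dot(w, x)) != y for x, y in data) / len(data)

# A limited data set D, labeled by the target.
train_inputs = [(1.0, 0.2, 0.3), (1.0, 0.9, 0.8), (1.0, 0.1, 0.6),
                (1.0, 0.7, 0.6), (1.0, 0.4, 0.2), (1.0, 0.5, 0.9)]
D = [(x, f(x)) for x in train_inputs]
w = pla(D)

# Fresh points "outside the data", drawn uniformly from the unit square.
rng = random.Random(0)
fresh = [(1.0, rng.random(), rng.random()) for _ in range(1000)]
outside = [(x, f(x)) for x in fresh]

e_in, e_out = error(w, D), error(w, outside)
print("error on the data:", e_in)
print("error outside the data:", e_out)
```

A zero error on the data says the separator fits `D`; whether it also tracks `f` elsewhere is exactly the question the slide raises.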

SLIDE 22

Other Views of Learning

  • Design: learning is from data, design is from specs and a model.
  • Statistics, Function Approximation.
  • Data Mining: find patterns in massive data (typically unsupervised).
  • Three Learning Paradigms

    – Supervised: the data is $(x_n, f(x_n))$; you are told the answer.
    – Reinforcement: you get feedback on potential answers you try: $x$ → try something → get feedback.
    – Unsupervised: only given $x_n$, learn to “organize” the data.

SLIDE 23

Supervised Learning - Classifying Coins

[Figure: two panels with axes Size and Mass; coins labeled 1, 5, 10 and 25.]

SLIDE 24

Unsupervised Learning - Categorizing Coins

[Figure: two panels with axes Size and Mass; groups labeled type 1, type 2, type 3 and type 4.]

SLIDE 25

Outside the Data Set - A Puzzle

  • Trees ($f = +1$)
  • Dogs ($f = -1$)
  • Tree or Dog? ($f = ?$)
