
SLIDE 1

Learning From Data Lecture 18: Radial Basis Functions

Non-Parametric RBF · Parametric RBF · k-RBF-Network

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

Recap: Data Condensation and Nearest Neighbor Search

(Figure: the training set is condensed to consistent subsets S1 and S2.)

Branch and bound for finding nearest neighbors. Lloyd's algorithm for finding a good clustering.

© Creator: Malik Magdon-Ismail

SLIDE 3

Radial Basis Functions (RBF)

k-Nearest Neighbor: considers only the k nearest neighbors, and each neighbor has equal weight.

What about using all the data to compute g(x)? RBF: use all the data, but data further away from x have less weight.


SLIDE 4

Weighting the Data Points: αn

Test point x. αn is the weight of xn in g(x):

αn(x) = φ( ‖x − xn‖ / r )

a decreasing function of the distance ‖x − xn‖, relative to a scale parameter r. The kernel φ determines how the weighting decreases with distance.

Most popular kernel, the Gaussian: φ(z) = e^{−z²/2}.

The window kernel mimics k-NN: φ(z) = 1 for z ≤ 1, and φ(z) = 0 for z > 1.

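The two kernels and the weights αn(x) can be sketched in Python with NumPy (Euclidean distance assumed; the function names are illustrative, not from the lecture):

```python
import numpy as np

def gaussian_kernel(z):
    # phi(z) = exp(-z^2 / 2): weight decreases smoothly with distance
    return np.exp(-0.5 * z ** 2)

def window_kernel(z):
    # phi(z) = 1 for z <= 1, 0 for z > 1: a hard cutoff that mimics k-NN
    return (z <= 1).astype(float)

def alphas(x, X, r, phi=gaussian_kernel):
    # alpha_n(x) = phi(||x - x_n|| / r), one weight per training point
    dists = np.linalg.norm(X - x, axis=1)
    return phi(dists / r)

X = np.array([[0.0], [1.0], [2.0]])
a = alphas(np.array([0.0]), X, r=1.0)
# the closest training point gets the largest weight
```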

SLIDE 9

Nonparametric RBF – Regression

αn(x) = φ( ‖x − xn‖ / r )

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn

a weighted average of the target values yn.

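A minimal sketch of the nonparametric RBF regression estimate, assuming the Gaussian kernel (names are illustrative):

```python
import numpy as np

def rbf_regress(x, X, y, r):
    # g(x) = sum_n [ alpha_n(x) / sum_m alpha_m(x) ] * y_n
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return np.dot(alpha, y) / np.sum(alpha)

X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
g = rbf_regress(np.array([0.5]), X, y, r=0.5)
# x is equidistant from both points, so g is the plain average 0.5
```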

SLIDE 10

Nonparametric RBF – Classification

αn(x) = φ( ‖x − xn‖ / r )

g(x) = sign( Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn )

the sign of the weighted average of the target values.


SLIDE 11

Nonparametric RBF – Logistic Regression

αn(x) = φ( ‖x − xn‖ / r )

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · ⟦yn = +1⟧

the weighted fraction of +1 labels, an estimate of P[y = +1 | x].

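The classification and logistic-regression variants only change how the weighted average is read out (a sketch under the same Gaussian-kernel assumption; names are illustrative):

```python
import numpy as np

def _weights(x, X, r):
    # normalized weights alpha_n(x) / sum_m alpha_m(x)
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return alpha / alpha.sum()

def rbf_classify(x, X, y, r):
    # sign of the weighted average of the +-1 labels
    return np.sign(np.dot(_weights(x, X, r), y))

def rbf_prob(x, X, y, r):
    # weighted fraction of +1 labels: an estimate of P[y = +1 | x]
    return np.dot(_weights(x, X, r), (y == 1).astype(float))

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 1.0, -1.0])
label = rbf_classify(np.array([0.1]), X, y, r=0.3)
p = rbf_prob(np.array([0.1]), X, y, r=0.3)
```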

SLIDE 12

Choice of Scale r

Nearest neighbor: fits for k = 1, k = 3, k = 11. Choosing k: k = 3; k = √N; cross validation.

Nonparametric RBF: fits for r = 0.01 (overfitting), r = 0.05, r = 0.5 (underfitting). Choosing r: r ∼ 1/N^{1/(2d)}; or cross validation.

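Choosing r by cross validation can be sketched with a leave-one-out loop (illustrative data and names; sin target is an assumption for the example):

```python
import numpy as np

def loo_cv_error(X, y, r):
    # leave-one-out CV squared error of the nonparametric RBF estimate
    err = 0.0
    N = len(X)
    for i in range(N):
        mask = np.arange(N) != i
        alpha = np.exp(-0.5 * (np.linalg.norm(X[mask] - X[i], axis=1) / r) ** 2)
        g = np.dot(alpha, y[mask]) / alpha.sum()
        err += (g - y[i]) ** 2
    return err / N

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0])
errors = {r: loo_cv_error(X, y, r) for r in (0.01, 0.05, 0.5)}
# pick the r with the smallest CV error
```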

SLIDE 13

Highlights of Nonparametric RBF

  1. Simple ('smooth' version of the k-NN rule).
  2. No training.
  3. Near optimal Eout.
  4. Easy to justify the classification to a customer.
  5. Can do classification, multi-class, regression, logistic regression.
  6. Computationally demanding.

Items 1–5 make it a good method!


SLIDE 14

Scaled Bumps on Each Data Point

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn   (weighted average of the yn)

     = Σ_{n=1}^{N} [ yn / Σ_{m=1}^{N} αm(x) ] · φ( ‖x − xn‖ / r )

     = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r )

a sum of bumps centered at the xn, each scaled by wn(x) = yn / Σ_{m=1}^{N} αm(x).

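The two forms are algebraically identical; a quick numerical check (illustrative names and data):

```python
import numpy as np

def g_weighted_average(x, X, y, r):
    # weighted-average form: sum_n [alpha_n / sum_m alpha_m] * y_n
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return np.dot(alpha / alpha.sum(), y)

def g_sum_of_bumps(x, X, y, r):
    # sum-of-bumps form: w_n(x) = y_n / sum_m alpha_m(x)
    phi = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    w = y / phi.sum()
    return np.dot(w, phi)

X = np.array([[0.0], [0.7], [1.5]])
y = np.array([1.0, -1.0, 0.5])
x = np.array([0.4])
```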

SLIDE 17

From Nonparametric to Parametric RBF

Nonparametric RBF: g(x) = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r ). Only need to specify r. (Plots: fits for r = 0.1 and r = 0.3.)

Parametric RBF: h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r ). Fix r; need to determine the parameters wn to fit the data. (Overfit the data?)


SLIDE 20

RBF Nonlinear Transform Depends on the Data

h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r ) = w^t z

z = Φ(x) = [φ1(x), φ2(x), . . ., φN(x)]^t, where φn(x) = φ( ‖x − xn‖ / r ).

Z = [ Φ(x1)^t ; Φ(x2)^t ; . . . ; Φ(xN)^t ]   (row n of Z is zn^t = Φ(xn)^t)

Fit the data (h(xn) = yn):

w = Z†y = (Z^t Z)^{−1} Z^t y

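Building the N × N feature matrix Z and fitting the data can be sketched as follows (np.linalg.solve is used in place of an explicit inverse; names are illustrative):

```python
import numpy as np

def feature_matrix(X, centers, r):
    # Z[n, j] = phi(||x_n - mu_j|| / r), Gaussian kernel
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-0.5 * (D / r) ** 2)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, -1.0, 1.0])
Z = feature_matrix(X, X, r=0.5)          # one bump per data point
w = np.linalg.solve(Z.T @ Z, Z.T @ y)    # w = (Z^t Z)^{-1} Z^t y
h = Z @ w                                # interpolates: h(x_n) = y_n
```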

SLIDE 22

Reducing the Number of Bumps: k-RBF-Network

Nonparametric:  g(x) = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r )

Parametric:  h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r )

k-RBF-network:  h(x) = w0 + Σ_{j=1}^{k} wj · φ( ‖x − µj‖ / r ) = w^t Φ(x)

Φ(x)^t = [1, Φ1(x), . . ., Φk(x)], where Φj(x) = φ( ‖x − µj‖ / r ).

The model is nonlinear in the centers µj.

(Diagram: a network with input x, a constant unit 1, hidden units φ( ‖x − µj‖ / r ) for j = 1, . . ., k, weights w0, w1, . . ., wk, and output h(x).)


SLIDE 26

Fitting the Data

Before, the bumps were centered on the xn (no choice). Now we may choose the bump centers µj: choose them to 'cover' the data, as the centers of k 'clusters'.

Given the bump centers, we have a linear model to determine the wj. That's 'easy'; we know how to do that.


SLIDE 27

Fitting the Data

Fitting the RBF-network to the data (given k, r):

1: Use the inputs X to determine k centers µ1, . . ., µk.

2: Compute the N × (k + 1) feature matrix Z, whose rows are zn^t = Φ(xn)^t, where Φ(x) = [1, φ1(x), . . ., φk(x)]^t and φj(x) = φ( ‖x − µj‖ / r ). Each row of Z is the RBF feature corresponding to xn (with dummy bias coordinate 1).

3: Fit the linear model Zw to y to determine the weights w∗.
   classification: PLA, pocket, linear programming, . . .
   regression: pseudoinverse.
   logistic regression: gradient descent on the cross entropy error.

Choose r using cross validation, or a heuristic: r ∼ (radius of the data) / k^{1/d}, so that the k clusters 'cover' the data.

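The three steps above can be sketched end to end, with Lloyd's algorithm (from the recap) for step 1 and the pseudoinverse (regression) for step 3. The data, seed, and function names are illustrative:

```python
import numpy as np

def lloyd(X, k, iters=50, seed=0):
    # step 1: k centers via Lloyd's algorithm
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - mu[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)
    return mu

def rbf_features(X, mu, r):
    # step 2: N x (k+1) matrix Z with dummy bias coordinate 1
    D = np.linalg.norm(X[:, None] - mu[None], axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-0.5 * (D / r) ** 2)])

def fit_rbf_network(X, y, k, r):
    # step 3: fit the linear model (regression: pseudoinverse)
    mu = lloyd(X, k)
    Z = rbf_features(X, mu, r)
    w = np.linalg.pinv(Z) @ y
    return mu, w

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0])
mu, w = fit_rbf_network(X, y, k=10, r=0.1)
pred = rbf_features(X, mu, r=0.1) @ w
```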

SLIDE 28

Our Example

(Plots: fits with k = 4, r = 1/k and with k = 10, r = 1/k.)

w = (Z^t Z)^{−1} Z^t y


SLIDE 29

Use Regularization to Fight Overfitting

(Plots: k = 10, r = 1/k, unregularized versus regularized.)

w = (Z^t Z + λI)^{−1} Z^t y

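The regularized solve replaces the pseudoinverse; larger λ shrinks the weights (λ and the random data are illustrative):

```python
import numpy as np

def ridge_fit(Z, y, lam):
    # w = (Z^t Z + lambda I)^{-1} Z^t y
    k1 = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k1), Z.T @ y)

rng = np.random.default_rng(2)
Z = rng.normal(size=(20, 11))
y = rng.normal(size=20)
w0 = ridge_fit(Z, y, lam=0.0)
w1 = ridge_fit(Z, y, lam=10.0)
# heavier regularization gives a smaller weight vector
```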

SLIDE 30

Reflecting on the k-RBF-Network

  1. We derived it as a 'soft' generalization of the k-NN rule. It can also be derived from regularization theory, or from noisy interpolation theory.
  2. There are nonparametric and parametric versions.
  3. Given the centers, it is 'easy' to learn the weights using techniques from linear models: a linear model with an adaptable nonlinear transform.
  4. We used uniform bumps; the bumps can have different shapes Σj.
  5. NEXT: how to better choose the centers, via unsupervised learning.


SLIDE 31

A Peek at Unsupervised Learning

(Plots: the 21-NN rule on the 10-class digits data, and a 10-cluster clustering of the data; the axes are average intensity and symmetry.)
