
SLIDE 1

Learning From Data Lecture 18: Radial Basis Functions

Non-Parametric RBF · Parametric RBF · k-RBF-Network

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

Recap: Data Condensation and Nearest Neighbor Search

(Figure: the training set is condensed to consistent subsets S1 and S2.)

Branch and bound for finding nearest neighbors. Lloyd's algorithm for finding a good clustering.

© Creator: Malik Magdon-Ismail

SLIDE 3

Radial Basis Functions (RBF)

k-Nearest Neighbor: considers only the k nearest neighbors, and each neighbor has equal weight.

What about using all the data to compute g(x)? RBF: use all the data, but data further away from x have less weight.


SLIDE 4

Weighting the Data Points: αn

Test point x. αn is the weight of xn in g(x):

αn(x) = φ( ‖x − xn‖ / r )

a decreasing function of the distance ‖x − xn‖, relative to a scale parameter r. The kernel φ determines how the weighting decreases with distance.

Most popular kernel, the Gaussian: φ(z) = e^{−z²/2}.

The window kernel mimics k-NN: φ(z) = 1 for z ≤ 1, and φ(z) = 0 for z > 1.

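The two kernels and the weights αn(x) can be sketched in Python with NumPy (Euclidean distance assumed; the function names are illustrative, not from the lecture):

```python
import numpy as np

def gaussian_kernel(z):
    # phi(z) = exp(-z^2 / 2): weight decreases smoothly with distance
    return np.exp(-0.5 * z ** 2)

def window_kernel(z):
    # phi(z) = 1 for z <= 1, 0 for z > 1: a hard cutoff that mimics k-NN
    return (z <= 1).astype(float)

def alphas(x, X, r, phi=gaussian_kernel):
    # alpha_n(x) = phi(||x - x_n|| / r), one weight per training point
    dists = np.linalg.norm(X - x, axis=1)
    return phi(dists / r)

X = np.array([[0.0], [1.0], [2.0]])
a = alphas(np.array([0.0]), X, r=1.0)
# the closest training point gets the largest weight
```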

SLIDE 9

Nonparametric RBF – Regression

αn(x) = φ( ‖x − xn‖ / r )

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn

a weighted average of the target values yn.

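A minimal sketch of the nonparametric RBF regression estimate, assuming the Gaussian kernel (names are illustrative):

```python
import numpy as np

def rbf_regress(x, X, y, r):
    # g(x) = sum_n [ alpha_n(x) / sum_m alpha_m(x) ] * y_n
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return np.dot(alpha, y) / np.sum(alpha)

X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
g = rbf_regress(np.array([0.5]), X, y, r=0.5)
# x is equidistant from both points, so g is the plain average 0.5
```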

SLIDE 10

Nonparametric RBF – Classification

αn(x) = φ( ‖x − xn‖ / r )

g(x) = sign( Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn )

the sign of the weighted average of the target values.


SLIDE 11

Nonparametric RBF – Logistic Regression

αn(x) = φ( ‖x − xn‖ / r )

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · ⟦yn = +1⟧

the weighted fraction of +1 labels, an estimate of P[y = +1 | x].

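The classification and logistic-regression variants only change how the weighted average is read out (a sketch under the same Gaussian-kernel assumption; names are illustrative):

```python
import numpy as np

def _weights(x, X, r):
    # normalized weights alpha_n(x) / sum_m alpha_m(x)
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return alpha / alpha.sum()

def rbf_classify(x, X, y, r):
    # sign of the weighted average of the +-1 labels
    return np.sign(np.dot(_weights(x, X, r), y))

def rbf_prob(x, X, y, r):
    # weighted fraction of +1 labels: an estimate of P[y = +1 | x]
    return np.dot(_weights(x, X, r), (y == 1).astype(float))

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 1.0, -1.0])
label = rbf_classify(np.array([0.1]), X, y, r=0.3)
p = rbf_prob(np.array([0.1]), X, y, r=0.3)
```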

SLIDE 12

Choice of Scale r

Nearest neighbor: fits for k = 1, k = 3, k = 11. Choosing k: k = 3; k = √N; cross validation.

Nonparametric RBF: fits for r = 0.01 (overfitting), r = 0.05, r = 0.5 (underfitting). Choosing r: r ∼ 1/N^{1/(2d)}; or cross validation.

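Choosing r by cross validation can be sketched with a leave-one-out loop (illustrative data and names; sin target is an assumption for the example):

```python
import numpy as np

def loo_cv_error(X, y, r):
    # leave-one-out CV squared error of the nonparametric RBF estimate
    err = 0.0
    N = len(X)
    for i in range(N):
        mask = np.arange(N) != i
        alpha = np.exp(-0.5 * (np.linalg.norm(X[mask] - X[i], axis=1) / r) ** 2)
        g = np.dot(alpha, y[mask]) / alpha.sum()
        err += (g - y[i]) ** 2
    return err / N

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0])
errors = {r: loo_cv_error(X, y, r) for r in (0.01, 0.05, 0.5)}
# pick the r with the smallest CV error
```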

SLIDE 13

Highlights of Nonparametric RBF

  1. Simple ('smooth' version of the k-NN rule).
  2. No training.
  3. Near optimal Eout.
  4. Easy to justify the classification to a customer.
  5. Can do classification, multi-class, regression, logistic regression.
  6. Computationally demanding.

Items 1–5 make it a good method!


SLIDE 14

Scaled Bumps on Each Data Point

g(x) = Σ_{n=1}^{N} [ αn(x) / Σ_{m=1}^{N} αm(x) ] · yn   (weighted average of the yn)

     = Σ_{n=1}^{N} [ yn / Σ_{m=1}^{N} αm(x) ] · φ( ‖x − xn‖ / r )

     = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r )

a sum of bumps centered at the xn, each scaled by wn(x) = yn / Σ_{m=1}^{N} αm(x).

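The two forms are algebraically identical; a quick numerical check (illustrative names and data):

```python
import numpy as np

def g_weighted_average(x, X, y, r):
    # weighted-average form: sum_n [alpha_n / sum_m alpha_m] * y_n
    alpha = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    return np.dot(alpha / alpha.sum(), y)

def g_sum_of_bumps(x, X, y, r):
    # sum-of-bumps form: w_n(x) = y_n / sum_m alpha_m(x)
    phi = np.exp(-0.5 * (np.linalg.norm(X - x, axis=1) / r) ** 2)
    w = y / phi.sum()
    return np.dot(w, phi)

X = np.array([[0.0], [0.7], [1.5]])
y = np.array([1.0, -1.0, 0.5])
x = np.array([0.4])
```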

SLIDE 17

From Nonparametric to Parametric RBF

Nonparametric RBF: g(x) = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r ). Only need to specify r. (Plots: fits for r = 0.1 and r = 0.3.)

Parametric RBF: h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r ). Fix r; need to determine the parameters wn to fit the data. (Overfit the data?)


SLIDE 20

RBF Nonlinear Transform Depends on the Data

h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r ) = w^t z

z = Φ(x) = [φ1(x), φ2(x), . . ., φN(x)]^t, where φn(x) = φ( ‖x − xn‖ / r ).

Z = [ Φ(x1)^t ; Φ(x2)^t ; . . . ; Φ(xN)^t ]   (row n of Z is zn^t = Φ(xn)^t)

Fit the data (h(xn) = yn):

w = Z†y = (Z^t Z)^{−1} Z^t y

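Building the N × N feature matrix Z and fitting the data can be sketched as follows (np.linalg.solve is used in place of an explicit inverse; names are illustrative):

```python
import numpy as np

def feature_matrix(X, centers, r):
    # Z[n, j] = phi(||x_n - mu_j|| / r), Gaussian kernel
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-0.5 * (D / r) ** 2)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, -1.0, 1.0])
Z = feature_matrix(X, X, r=0.5)          # one bump per data point
w = np.linalg.solve(Z.T @ Z, Z.T @ y)    # w = (Z^t Z)^{-1} Z^t y
h = Z @ w                                # interpolates: h(x_n) = y_n
```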

SLIDE 22

Reducing the Number of Bumps: k-RBF-Network

Nonparametric:  g(x) = Σ_{n=1}^{N} wn(x) · φ( ‖x − xn‖ / r )

Parametric:  h(x) = Σ_{n=1}^{N} wn · φ( ‖x − xn‖ / r )

k-RBF-network:  h(x) = w0 + Σ_{j=1}^{k} wj · φ( ‖x − µj‖ / r ) = w^t Φ(x)

Φ(x)^t = [1, Φ1(x), . . ., Φk(x)], where Φj(x) = φ( ‖x − µj‖ / r ).

The model is nonlinear in the centers µj.

(Diagram: a network with input x, a constant unit 1, hidden units φ( ‖x − µj‖ / r ) for j = 1, . . ., k, weights w0, w1, . . ., wk, and output h(x).)


SLIDE 26

Fitting the Data

Before, the bumps were centered on the xn (no choice). Now we may choose the bump centers µj: choose them to 'cover' the data, as the centers of k 'clusters'.

Given the bump centers, we have a linear model to determine the wj. That's 'easy'; we know how to do that.


SLIDE 27

Fitting the Data

Fitting the RBF-network to the data (given k, r):

1: Use the inputs X to determine k centers µ1, . . ., µk.

2: Compute the N × (k + 1) feature matrix Z, whose rows are zn^t = Φ(xn)^t, where Φ(x) = [1, φ1(x), . . ., φk(x)]^t and φj(x) = φ( ‖x − µj‖ / r ). Each row of Z is the RBF feature corresponding to xn (with dummy bias coordinate 1).

3: Fit the linear model Zw to y to determine the weights w∗.
   classification: PLA, pocket, linear programming, . . .
   regression: pseudoinverse.
   logistic regression: gradient descent on the cross entropy error.

Choose r using cross validation, or a heuristic: r ∼ (radius of the data) / k^{1/d}, so that the k clusters 'cover' the data.

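The three steps above can be sketched end to end, with Lloyd's algorithm (from the recap) for step 1 and the pseudoinverse (regression) for step 3. The data, seed, and function names are illustrative:

```python
import numpy as np

def lloyd(X, k, iters=50, seed=0):
    # step 1: k centers via Lloyd's algorithm
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - mu[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)
    return mu

def rbf_features(X, mu, r):
    # step 2: N x (k+1) matrix Z with dummy bias coordinate 1
    D = np.linalg.norm(X[:, None] - mu[None], axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-0.5 * (D / r) ** 2)])

def fit_rbf_network(X, y, k, r):
    # step 3: fit the linear model (regression: pseudoinverse)
    mu = lloyd(X, k)
    Z = rbf_features(X, mu, r)
    w = np.linalg.pinv(Z) @ y
    return mu, w

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0])
mu, w = fit_rbf_network(X, y, k=10, r=0.1)
pred = rbf_features(X, mu, r=0.1) @ w
```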

SLIDE 28

Our Example

(Plots: fits with k = 4, r = 1/k and with k = 10, r = 1/k.)

w = (Z^t Z)^{−1} Z^t y


SLIDE 29

Use Regularization to Fight Overfitting

(Plots: k = 10, r = 1/k, unregularized versus regularized.)

w = (Z^t Z + λI)^{−1} Z^t y

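The regularized solve replaces the pseudoinverse; larger λ shrinks the weights (λ and the random data are illustrative):

```python
import numpy as np

def ridge_fit(Z, y, lam):
    # w = (Z^t Z + lambda I)^{-1} Z^t y
    k1 = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k1), Z.T @ y)

rng = np.random.default_rng(2)
Z = rng.normal(size=(20, 11))
y = rng.normal(size=20)
w0 = ridge_fit(Z, y, lam=0.0)
w1 = ridge_fit(Z, y, lam=10.0)
# heavier regularization gives a smaller weight vector
```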

SLIDE 30

Reflecting on the k-RBF-Network

  1. We derived it as a 'soft' generalization of the k-NN rule. It can also be derived from regularization theory, or from noisy interpolation theory.
  2. There are nonparametric and parametric versions.
  3. Given the centers, it is 'easy' to learn the weights using techniques from linear models: a linear model with an adaptable nonlinear transform.
  4. We used uniform bumps; the bumps can have different shapes Σj.
  5. NEXT: how to better choose the centers, via unsupervised learning.


SLIDE 31

A Peek at Unsupervised Learning

(Plots: the 21-NN rule on the 10-class digits data, and a 10-cluster clustering of the data; the axes are average intensity and symmetry.)
