SLIDE 1

Local Models

Steven J Zeil

Old Dominion Univ.

Fall 2010


SLIDE 2

Local Models

1 Localizing
  • Competitive Learning
  • Online k-Means
  • Adaptive Resonance Theory
  • Self-Organizing Maps
  • Learning Vector Quantization
  • Radial-Basis Functions
  • Falling Between the Cracks
  • Rule-Based Knowledge

2 Learning after Localizing
  • Hybrid Learning
  • Competitive Basis Functions
  • Mixture of Experts (MoE)

SLIDE 3

Local Models

Piecewise approaches to regression: divide the input space into local regions and learn a simple model in each region.

Localization can be supervised or unsupervised; the subsequent learning is then supervised. Or both can be done at once.

SLIDE 4

Localizing

(Outline from Slide 2, with Part 1, Localizing, highlighted.)

SLIDE 5

Competitive Learning

Competitive methods assign x to one region and apply the function associated with that single region. Cooperative methods apply a mixture of functions, weighted according to how likely x is to belong to each region.

SLIDE 6

Competitive Learning Techniques

(Outline from Slide 2, with the competitive-learning techniques highlighted.)

SLIDE 7

Online k-Means

\[
E\big(\{m_i\}_{i=1}^{k} \mid X\big) = \sum_t \sum_i b_i^t\, \|x^t - m_i\|^2,
\qquad
b_i^t =
\begin{cases}
1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\
0 & \text{otherwise}
\end{cases}
\]

Batch k-means:
\[
m_i = \frac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}
\]

Online k-means:
\[
\Delta m_{ij} = -\eta\, \frac{\partial E^t}{\partial m_{ij}} = \eta\, b_i^t\, \big(x_j^t - m_{ij}\big)
\]
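A minimal Python sketch of the online update above. The function name, learning rate, and epoch count are illustrative assumptions, not from the slides:

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: move the winning center toward each sample in turn.

    X: (n_samples, n_features) data matrix. Returns the k learned centers.
    """
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # init from data
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Delta m_i = eta (x^t - m_i)
    return m
```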

SLIDE 8

Winner-take-all Network

Online k-means can be implemented via a variant of perceptrons. Blue lines are inhibitory connections, which seek to suppress the other units' outputs; red lines are excitatory, reinforcing a unit's own output. With appropriate weights, these suppress all but the maximum.

SLIDE 9

Adaptive Resonance Theory (ART)

Incrementally adds new cluster means. ρ denotes the vigilance. If a new x lies outside the vigilance of all cluster centers, use that x as the center of a new cluster.
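A hedged sketch of the distance-based behavior described above; the update rule for the winning center is an assumption borrowed from online k-means, and the function name is illustrative:

```python
import numpy as np

def art_cluster(X, rho, eta=0.5):
    """Vigilance-driven incremental clustering in the spirit of ART.

    rho: vigilance radius. A sample farther than rho from every existing
    center starts a new cluster; otherwise the nearest center moves toward it.
    """
    centers = [np.asarray(X[0], dtype=float).copy()]
    for x in X[1:]:
        d = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(d))
        if d[i] > rho:
            centers.append(np.asarray(x, dtype=float).copy())  # outside vigilance: new cluster
        else:
            centers[i] += eta * (x - centers[i])                # inside: move the winner toward x
    return centers
```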

SLIDE 10

Self-Organizing Maps (SOM)

Units (cluster means) have a neighborhood structure, most often a 2D grid.

Update not only the mean m_i closest to x but also the units in m_i's neighborhood.

The strength of the update falls off with the number of steps through the neighborhood:

\[
\Delta m_j = \eta\, e(j, i)\, \big(x^t - m_j\big),
\qquad
e(j, i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left[\frac{-(j - i)^2}{2\sigma^2}\right]
\]
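A minimal sketch of one such update for units on a 1D chain (the slide notes 2D grids are more common); function name and parameters are illustrative assumptions:

```python
import numpy as np

def som_step(m, x, eta=0.1, sigma=1.0):
    """One self-organizing-map update for units arranged on a 1D chain.

    m: (k, d) unit means. The winner i is the unit closest to x; every unit j
    then moves toward x, scaled by a Gaussian in its grid distance |j - i|.
    """
    i = np.argmin(np.linalg.norm(x - m, axis=1))                    # winning unit
    j = np.arange(len(m))
    e = np.exp(-((j - i) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return m + eta * e[:, None] * (x - m)                           # neighborhood-weighted pull
```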
SLIDE 11

Learning Vector Quantization (LVQ)

A supervised technique. Assume that the existing cluster means are labeled with classes. If x^t is closest to m_i:

\[
\Delta m_i = \eta\,\big(x^t - m_i\big) \quad \text{if } \operatorname{label}(x^t) = \operatorname{label}(m_i)
\]
\[
\Delta m_i = -\eta\,\big(x^t - m_i\big) \quad \text{otherwise}
\]
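A minimal Python sketch of this attract/repel rule (names are illustrative assumptions):

```python
import numpy as np

def lvq_step(m, m_labels, x, y, eta=0.05):
    """One Learning Vector Quantization update.

    m: (k, d) labeled cluster means, m_labels: (k,) their class labels,
    x: input sample, y: its class label. The nearest mean is attracted to x
    if the labels match and repelled otherwise.
    """
    i = np.argmin(np.linalg.norm(x - m, axis=1))
    sign = 1.0 if m_labels[i] == y else -1.0
    m[i] = m[i] + sign * eta * (x - m[i])
    return m
```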

SLIDE 12

Radial-Basis Functions

A weighted distance from a cluster mean:

\[
p_h^t = \exp\!\left[\frac{-\|x^t - m_h\|^2}{2 s_h^2}\right]
\]

s_h is the "spread" around m_h.

SLIDE 13

Radial Functions and Perceptrons

\[
p_h^t = \exp\!\left[\frac{-\|x^t - m_h\|^2}{2 s_h^2}\right],
\qquad
y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0
\]

Note that the p_h are taking the usual place of the x_i.
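A short forward-pass sketch of this network in Python (function and argument names are illustrative assumptions):

```python
import numpy as np

def rbf_forward(x, means, spreads, w, w0):
    """RBF-network forward pass: Gaussian hidden units feeding one linear output.

    x: (d,) input; means: (H, d) centers; spreads: (H,) per-unit spreads;
    w: (H,) output weights; w0: bias. Returns (y, p).
    """
    d2 = np.sum((x - means) ** 2, axis=1)        # squared distances ||x - m_h||^2
    p = np.exp(-d2 / (2.0 * spreads ** 2))       # p_h = exp(-||x - m_h||^2 / (2 s_h^2))
    y = w @ p + w0                               # y = sum_h w_h p_h + w_0
    return y, p
```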

SLIDE 14

Using RBFs as a New Basis


SLIDE 15

Obtaining RBFs

Unsupervised:

  • Use any prior technique to compute the means (e.g., k-means).
  • Set the spread to cover the cluster: find the x^t belonging to cluster h but farthest from m_h, and set s_h so that p_h^t ≈ 0.5 for that point (a short closed form is derived below).

Supervised: because the p_h are differentiable, they can be trained jointly with the overall function.
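The "p ≈ 0.5 at the farthest member" criterion has a closed form; a short derivation (the symbol d_max is introduced here for convenience, not from the slides):

\[
\exp\!\left[\frac{-d_{\max}^2}{2 s_h^2}\right] = \tfrac{1}{2}
\;\Longrightarrow\;
s_h = \frac{d_{\max}}{\sqrt{2\ln 2}} \approx 0.85\, d_{\max},
\qquad
d_{\max} = \max_{x^t \in \text{cluster } h} \|x^t - m_h\|.
\]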

SLIDE 16

Falling Between the Cracks

With RBFs it is possible for some x to fall outside the region of influence of all clusters. It may be useful to train an "overall" model and then train local exceptions:

\[
y^t = \underbrace{\sum_{h=1}^{H} w_h\, p_h^t}_{\text{exceptions}} + \underbrace{v^T x^t + v_0}_{\text{default rule}}
\]

SLIDE 17

Rule with Exceptions


SLIDE 18

Normalized Basis Functions

Alternatively, normalize the basis functions so that their sum is 1.0, and do a cooperative calculation.

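Written out, this normalization would be (an assumption consistent with the radial gating formula on the Gating slide later in the deck, not text from this slide):

\[
g_h^t = \frac{p_h^t}{\sum_{j=1}^{H} p_j^t},
\qquad
y^t = \sum_{h=1}^{H} w_h\, g_h^t
\]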

SLIDE 19

Rule-Based Knowledge

Prior rules often give localized solutions. E.g.,

IF ((x1 ≈ a) AND (x2 ≈ b)) OR (x3 ≈ c) THEN y = 0.1

\[
p_1 = \exp\!\left[\frac{-(x_1 - a)^2}{2 s_1^2}\right] \exp\!\left[\frac{-(x_2 - b)^2}{2 s_2^2}\right] \quad \text{with } w_1 = 0.1
\]

\[
p_2 = \exp\!\left[\frac{-(x_3 - c)^2}{2 s_3^2}\right] \quad \text{with } w_2 = 0.1
\]

SLIDE 20

Learning after Localizing

(Outline from Slide 2, with Part 2, Learning after Localizing, highlighted.)

SLIDE 21

Hybrid Learning

Use unsupervised techniques to learn the centers (and spreads). Learn the second-layer weights by supervised gradient descent.
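A hedged end-to-end sketch of this hybrid recipe in Python; the function name, learning rates, and number of k-means passes are illustrative assumptions:

```python
import numpy as np

def train_rbf_hybrid(X, r, k, eta=0.05, epochs=200, seed=0):
    """Hybrid RBF training: unsupervised centers and spreads, supervised weights.

    X: (n, d) inputs, r: (n,) regression targets, k: number of RBF units.
    """
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    # Stage 1 (unsupervised): a few passes of online k-means for the centers.
    for _ in range(10):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))
            m[i] += 0.1 * (x - m[i])
    # Spreads: distance to each cluster's farthest member, scaled so p ~ 0.5 there.
    dists = np.linalg.norm(X[:, None] - m[None], axis=2)          # (n, k)
    assign = np.argmin(dists, axis=1)
    s = np.array([dists[assign == h, h].max() / np.sqrt(2 * np.log(2))
                  if np.any(assign == h) else 1.0 for h in range(k)])
    # Stage 2 (supervised): gradient descent on the second-layer weights only.
    P = np.exp(-dists ** 2 / (2 * s ** 2))                        # (n, k) RBF activations
    w, w0 = np.zeros(k), 0.0
    for _ in range(epochs):
        err = r - (P @ w + w0)
        w += eta * P.T @ err / len(X)
        w0 += eta * err.mean()
    return m, s, w, w0
```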

SLIDE 22

Fully Supervised

Training both levels at once:

\[
E\big(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\big) = \frac{1}{2} \sum_t \sum_i \big(r_i^t - y_i^t\big)^2,
\qquad
y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}
\]

\[
\Delta w_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, p_h^t
\]

\[
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\, w_{ih}\, p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}
\]

\[
\Delta s_h = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\, w_{ih}\, p_h^t\, \frac{\|x^t - m_h\|^2}{s_h^3}
\]
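A compact sketch of these updates as one batch gradient step; the array shapes and function name are illustrative assumptions:

```python
import numpy as np

def rbf_supervised_step(X, R, m, s, W, w0, eta=0.01):
    """One batch gradient step training both RBF layers at once.

    X: (n, d) inputs, R: (n, K) targets, m: (H, d) centers, s: (H,) spreads,
    W: (K, H) second-layer weights, w0: (K,) biases.
    """
    diff = X[:, None, :] - m[None, :, :]             # x^t - m_h, shape (n, H, d)
    d2 = np.sum(diff ** 2, axis=2)                   # ||x^t - m_h||^2, shape (n, H)
    P = np.exp(-d2 / (2 * s ** 2))                   # p_h^t
    err = R - (P @ W.T + w0)                         # r_i^t - y_i^t, shape (n, K)
    back = (err @ W) * P                             # sum_i (r - y) w_ih p_h^t, shape (n, H)
    W = W + eta * err.T @ P                          # Delta w_ih
    w0 = w0 + eta * err.sum(axis=0)                  # Delta w_i0
    m = m + eta * np.einsum('th,thj->hj', back, diff) / s[:, None] ** 2   # Delta m_hj
    s = s + eta * np.sum(back * d2, axis=0) / s ** 3                      # Delta s_h
    return m, s, W, w0
```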

SLIDE 23

Mixture of Experts

In an RBF network, each local fit is a constant, w_ih. In MoE, each local fit is a linear function of x, a "local expert":

\[
w_{ih}^t = v_{ih}^T\, x^t
\]

The g_h form a gating network.

SLIDE 24

Gating

The gating network selects a mixture of models from the local experts (w_h).

Radial gating:
\[
g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_j \exp\!\left[-\|x^t - m_j\|^2 / (2 s_j^2)\right]}
\]

Softmax gating:
\[
g_h^t = \frac{\exp\!\left[m_h^T x^t\right]}{\sum_j \exp\!\left[m_j^T x^t\right]}
\]
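A minimal sketch of a softmax-gated mixture of experts for a single output (the i index is dropped; names are illustrative assumptions):

```python
import numpy as np

def moe_forward(x, V, M):
    """Mixture-of-experts forward pass with softmax gating.

    x: (d,) input; V: (H, d) rows are the linear experts v_h;
    M: (H, d) rows are the gating parameters m_h.
    """
    w = V @ x                            # expert outputs w_h^t = v_h^T x^t
    a = M @ x                            # gating scores m_h^T x^t
    a = a - a.max()                      # shift for numerical stability
    g = np.exp(a) / np.exp(a).sum()      # softmax gate g_h^t
    return g @ w                         # y^t = sum_h g_h^t w_h^t
```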

SLIDE 25

Cooperative MoE

\[
E\big(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\big) = \frac{1}{2} \sum_t \sum_i \big(r_i^t - y_i^t\big)^2
\]

\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, g_h^t\, x^t
\]

\[
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\big(w_{ih}^t - y_i^t\big)\, g_h^t\, x_j^t
\]

SLIDE 26

Cooperative & Competitive MoE

Cooperative:
\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, g_h^t\, x^t,
\qquad
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\big(w_{ih}^t - y_i^t\big)\, g_h^t\, x_j^t
\]

Competitive:
\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, f_h^t\, x^t,
\qquad
\Delta m_h = \eta \sum_t \big(f_h^t - g_h^t\big)\, x^t
\]

f_h is the posterior probability of unit h, taking both the input and the output into account.

SLIDE 27

Cooperative vs. Competitive

Cooperative is generally more accurate: the models overlap, giving a smoother fit.

Competitive generally learns faster: typically only one expert is active at a time.