SLIDE 1

Local Models

Steven J Zeil

Old Dominion Univ.

Fall 2010


SLIDE 2

Local Models

1 Localizing
  • Competitive Learning
  • Online k-Means
  • Adaptive Resonance Theory
  • Self-Organizing Maps
  • Learning Vector Quantization
  • Radial-Basis Functions
  • Falling Between the Cracks
  • Rule-Based Knowledge

2 Learning after Localizing
  • Hybrid Learning
  • Competitive Basis Functions
  • Mixture of Experts (MoE)

SLIDE 3

Local Models

Piecewise approaches to regression: divide the input space into local regions and learn a simple model in each region.

Localization can be supervised or unsupervised; the subsequent learning is then supervised. Or both can be done at once.

SLIDE 4

Localizing

(Outline from Slide 2, with Part 1, Localizing, highlighted.)

SLIDE 5

Competitive Learning

Competitive methods assign x to one region and apply the function associated with that single region. Cooperative methods apply a mixture of functions, weighted according to how likely x is to belong to each region.

SLIDE 6

Competitive Learning Techniques

(Outline from Slide 2, with the competitive-learning techniques highlighted.)

SLIDE 7

Online k-Means

\[
E\big(\{m_i\}_{i=1}^{k} \mid X\big) = \sum_t \sum_i b_i^t\, \|x^t - m_i\|^2,
\qquad
b_i^t =
\begin{cases}
1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\
0 & \text{otherwise}
\end{cases}
\]

Batch k-means:
\[
m_i = \frac{\sum_t b_i^t\, x^t}{\sum_t b_i^t}
\]

Online k-means:
\[
\Delta m_{ij} = -\eta\, \frac{\partial E^t}{\partial m_{ij}} = \eta\, b_i^t\, \big(x_j^t - m_{ij}\big)
\]
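A minimal Python sketch of the online update above. The function name, learning rate, and epoch count are illustrative assumptions, not from the slides:

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: move the winning center toward each sample in turn.

    X: (n_samples, n_features) data matrix. Returns the k learned centers.
    """
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # init from data
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Delta m_i = eta (x^t - m_i)
    return m
```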

SLIDE 8

Winner-take-all Network

Online k-means can be implemented via a variant of perceptrons. Blue lines are inhibitory connections, which seek to suppress the other units' outputs; red lines are excitatory, reinforcing a unit's own output. With appropriate weights, these suppress all but the maximum.

SLIDE 9

Adaptive Resonance Theory (ART)

Incrementally adds new cluster means. ρ denotes the vigilance. If a new x lies outside the vigilance of all cluster centers, use that x as the center of a new cluster.
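A hedged sketch of the distance-based behavior described above; the update rule for the winning center is an assumption borrowed from online k-means, and the function name is illustrative:

```python
import numpy as np

def art_cluster(X, rho, eta=0.5):
    """Vigilance-driven incremental clustering in the spirit of ART.

    rho: vigilance radius. A sample farther than rho from every existing
    center starts a new cluster; otherwise the nearest center moves toward it.
    """
    centers = [np.asarray(X[0], dtype=float).copy()]
    for x in X[1:]:
        d = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(d))
        if d[i] > rho:
            centers.append(np.asarray(x, dtype=float).copy())  # outside vigilance: new cluster
        else:
            centers[i] += eta * (x - centers[i])                # inside: move the winner toward x
    return centers
```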

SLIDE 10

Self-Organizing Maps (SOM)

Units (cluster means) have a neighborhood structure, most often a 2D grid.

Update not only the mean m_i closest to x but also the units in m_i's neighborhood.

The strength of the update falls off with the number of steps through the neighborhood:

\[
\Delta m_j = \eta\, e(j, i)\, \big(x^t - m_j\big),
\qquad
e(j, i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left[\frac{-(j - i)^2}{2\sigma^2}\right]
\]
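A minimal sketch of one such update for units on a 1D chain (the slide notes 2D grids are more common); function name and parameters are illustrative assumptions:

```python
import numpy as np

def som_step(m, x, eta=0.1, sigma=1.0):
    """One self-organizing-map update for units arranged on a 1D chain.

    m: (k, d) unit means. The winner i is the unit closest to x; every unit j
    then moves toward x, scaled by a Gaussian in its grid distance |j - i|.
    """
    i = np.argmin(np.linalg.norm(x - m, axis=1))                    # winning unit
    j = np.arange(len(m))
    e = np.exp(-((j - i) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return m + eta * e[:, None] * (x - m)                           # neighborhood-weighted pull
```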
SLIDE 11

Learning Vector Quantization (LVQ)

A supervised technique. Assume that the existing cluster means are labeled with classes. If x^t is closest to m_i:

\[
\Delta m_i = \eta\,\big(x^t - m_i\big) \quad \text{if } \operatorname{label}(x^t) = \operatorname{label}(m_i)
\]
\[
\Delta m_i = -\eta\,\big(x^t - m_i\big) \quad \text{otherwise}
\]
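A minimal Python sketch of this attract/repel rule (names are illustrative assumptions):

```python
import numpy as np

def lvq_step(m, m_labels, x, y, eta=0.05):
    """One Learning Vector Quantization update.

    m: (k, d) labeled cluster means, m_labels: (k,) their class labels,
    x: input sample, y: its class label. The nearest mean is attracted to x
    if the labels match and repelled otherwise.
    """
    i = np.argmin(np.linalg.norm(x - m, axis=1))
    sign = 1.0 if m_labels[i] == y else -1.0
    m[i] = m[i] + sign * eta * (x - m[i])
    return m
```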

SLIDE 12

Radial-Basis Functions

A weighted distance from a cluster mean:

\[
p_h^t = \exp\!\left[\frac{-\|x^t - m_h\|^2}{2 s_h^2}\right]
\]

s_h is the "spread" around m_h.

SLIDE 13

Radial Functions and Perceptrons

\[
p_h^t = \exp\!\left[\frac{-\|x^t - m_h\|^2}{2 s_h^2}\right],
\qquad
y^t = \sum_{h=1}^{H} w_h\, p_h^t + w_0
\]

Note that the p_h are taking the usual place of the x_i.
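A short forward-pass sketch of this network in Python (function and argument names are illustrative assumptions):

```python
import numpy as np

def rbf_forward(x, means, spreads, w, w0):
    """RBF-network forward pass: Gaussian hidden units feeding one linear output.

    x: (d,) input; means: (H, d) centers; spreads: (H,) per-unit spreads;
    w: (H,) output weights; w0: bias. Returns (y, p).
    """
    d2 = np.sum((x - means) ** 2, axis=1)        # squared distances ||x - m_h||^2
    p = np.exp(-d2 / (2.0 * spreads ** 2))       # p_h = exp(-||x - m_h||^2 / (2 s_h^2))
    y = w @ p + w0                               # y = sum_h w_h p_h + w_0
    return y, p
```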

SLIDE 14

Using RBFs as a New Basis


SLIDE 15

Obtaining RBFs

Unsupervised:

  • Use any prior technique to compute the means (e.g., k-means).
  • Set the spread to cover the cluster: find the x^t belonging to cluster h but farthest from m_h, and set s_h so that p_h^t ≈ 0.5 for that point (a short closed form is derived below).

Supervised: because the p_h are differentiable, they can be trained jointly with the overall function.
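The "p ≈ 0.5 at the farthest member" criterion has a closed form; a short derivation (the symbol d_max is introduced here for convenience, not from the slides):

\[
\exp\!\left[\frac{-d_{\max}^2}{2 s_h^2}\right] = \tfrac{1}{2}
\;\Longrightarrow\;
s_h = \frac{d_{\max}}{\sqrt{2\ln 2}} \approx 0.85\, d_{\max},
\qquad
d_{\max} = \max_{x^t \in \text{cluster } h} \|x^t - m_h\|.
\]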

SLIDE 16

Falling Between the Cracks

With RBFs it is possible for some x to fall outside the region of influence of all clusters. It may be useful to train an "overall" model and then train local exceptions:

\[
y^t = \underbrace{\sum_{h=1}^{H} w_h\, p_h^t}_{\text{exceptions}} + \underbrace{v^T x^t + v_0}_{\text{default rule}}
\]

SLIDE 17

Rule with Exceptions


SLIDE 18

Normalized Basis Functions

Alternatively, normalize the basis functions so that their sum is 1.0, and do a cooperative calculation.

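Written out, this normalization would be (an assumption consistent with the radial gating formula on the Gating slide later in the deck, not text from this slide):

\[
g_h^t = \frac{p_h^t}{\sum_{j=1}^{H} p_j^t},
\qquad
y^t = \sum_{h=1}^{H} w_h\, g_h^t
\]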

SLIDE 19

Rule-Based Knowledge

Prior rules often give localized solutions. E.g.,

IF ((x1 ≈ a) AND (x2 ≈ b)) OR (x3 ≈ c) THEN y = 0.1

\[
p_1 = \exp\!\left[\frac{-(x_1 - a)^2}{2 s_1^2}\right] \exp\!\left[\frac{-(x_2 - b)^2}{2 s_2^2}\right] \quad \text{with } w_1 = 0.1
\]

\[
p_2 = \exp\!\left[\frac{-(x_3 - c)^2}{2 s_3^2}\right] \quad \text{with } w_2 = 0.1
\]

SLIDE 20

Learning after Localizing

(Outline from Slide 2, with Part 2, Learning after Localizing, highlighted.)

SLIDE 21

Hybrid Learning

Use unsupervised techniques to learn the centers (and spreads). Learn the second-layer weights by supervised gradient descent.
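A hedged end-to-end sketch of this hybrid recipe in Python; the function name, learning rates, and number of k-means passes are illustrative assumptions:

```python
import numpy as np

def train_rbf_hybrid(X, r, k, eta=0.05, epochs=200, seed=0):
    """Hybrid RBF training: unsupervised centers and spreads, supervised weights.

    X: (n, d) inputs, r: (n,) regression targets, k: number of RBF units.
    """
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    # Stage 1 (unsupervised): a few passes of online k-means for the centers.
    for _ in range(10):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))
            m[i] += 0.1 * (x - m[i])
    # Spreads: distance to each cluster's farthest member, scaled so p ~ 0.5 there.
    dists = np.linalg.norm(X[:, None] - m[None], axis=2)          # (n, k)
    assign = np.argmin(dists, axis=1)
    s = np.array([dists[assign == h, h].max() / np.sqrt(2 * np.log(2))
                  if np.any(assign == h) else 1.0 for h in range(k)])
    # Stage 2 (supervised): gradient descent on the second-layer weights only.
    P = np.exp(-dists ** 2 / (2 * s ** 2))                        # (n, k) RBF activations
    w, w0 = np.zeros(k), 0.0
    for _ in range(epochs):
        err = r - (P @ w + w0)
        w += eta * P.T @ err / len(X)
        w0 += eta * err.mean()
    return m, s, w, w0
```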

SLIDE 22

Fully Supervised

Training both levels at once:

\[
E\big(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\big) = \frac{1}{2} \sum_t \sum_i \big(r_i^t - y_i^t\big)^2,
\qquad
y_i^t = \sum_{h=1}^{H} w_{ih}\, p_h^t + w_{i0}
\]

\[
\Delta w_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, p_h^t
\]

\[
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\, w_{ih}\, p_h^t\, \frac{x_j^t - m_{hj}}{s_h^2}
\]

\[
\Delta s_h = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\, w_{ih}\, p_h^t\, \frac{\|x^t - m_h\|^2}{s_h^3}
\]
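A compact sketch of these updates as one batch gradient step; the array shapes and function name are illustrative assumptions:

```python
import numpy as np

def rbf_supervised_step(X, R, m, s, W, w0, eta=0.01):
    """One batch gradient step training both RBF layers at once.

    X: (n, d) inputs, R: (n, K) targets, m: (H, d) centers, s: (H,) spreads,
    W: (K, H) second-layer weights, w0: (K,) biases.
    """
    diff = X[:, None, :] - m[None, :, :]             # x^t - m_h, shape (n, H, d)
    d2 = np.sum(diff ** 2, axis=2)                   # ||x^t - m_h||^2, shape (n, H)
    P = np.exp(-d2 / (2 * s ** 2))                   # p_h^t
    err = R - (P @ W.T + w0)                         # r_i^t - y_i^t, shape (n, K)
    back = (err @ W) * P                             # sum_i (r - y) w_ih p_h^t, shape (n, H)
    W = W + eta * err.T @ P                          # Delta w_ih
    w0 = w0 + eta * err.sum(axis=0)                  # Delta w_i0
    m = m + eta * np.einsum('th,thj->hj', back, diff) / s[:, None] ** 2   # Delta m_hj
    s = s + eta * np.sum(back * d2, axis=0) / s ** 3                      # Delta s_h
    return m, s, W, w0
```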

SLIDE 23

Mixture of Experts

In an RBF network, each local fit is a constant, w_ih. In MoE, each local fit is a linear function of x, a "local expert":

\[
w_{ih}^t = v_{ih}^T\, x^t
\]

The g_h form a gating network.

SLIDE 24

Gating

The gating network selects a mixture of models from the local experts (w_h).

Radial gating:
\[
g_h^t = \frac{\exp\!\left[-\|x^t - m_h\|^2 / (2 s_h^2)\right]}{\sum_j \exp\!\left[-\|x^t - m_j\|^2 / (2 s_j^2)\right]}
\]

Softmax gating:
\[
g_h^t = \frac{\exp\!\left[m_h^T x^t\right]}{\sum_j \exp\!\left[m_j^T x^t\right]}
\]
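A minimal sketch of a softmax-gated mixture of experts for a single output (the i index is dropped; names are illustrative assumptions):

```python
import numpy as np

def moe_forward(x, V, M):
    """Mixture-of-experts forward pass with softmax gating.

    x: (d,) input; V: (H, d) rows are the linear experts v_h;
    M: (H, d) rows are the gating parameters m_h.
    """
    w = V @ x                            # expert outputs w_h^t = v_h^T x^t
    a = M @ x                            # gating scores m_h^T x^t
    a = a - a.max()                      # shift for numerical stability
    g = np.exp(a) / np.exp(a).sum()      # softmax gate g_h^t
    return g @ w                         # y^t = sum_h g_h^t w_h^t
```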

SLIDE 25

Cooperative MoE

\[
E\big(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\big) = \frac{1}{2} \sum_t \sum_i \big(r_i^t - y_i^t\big)^2
\]

\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, g_h^t\, x^t
\]

\[
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\big(w_{ih}^t - y_i^t\big)\, g_h^t\, x_j^t
\]

SLIDE 26

Cooperative & Competitive MoE

Cooperative:
\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, g_h^t\, x^t,
\qquad
\Delta m_{hj} = \eta \sum_t \sum_i \big(r_i^t - y_i^t\big)\big(w_{ih}^t - y_i^t\big)\, g_h^t\, x_j^t
\]

Competitive:
\[
\Delta v_{ih} = \eta \sum_t \big(r_i^t - y_i^t\big)\, f_h^t\, x^t,
\qquad
\Delta m_h = \eta \sum_t \big(f_h^t - g_h^t\big)\, x^t
\]

f_h is the posterior probability of unit h, taking both the input and the output into account.

SLIDE 27

Cooperative vs. Competitive

Cooperative is generally more accurate: the models overlap, giving a smoother fit.

Competitive generally learns faster: typically only one expert is active at a time.