IJCNN2002, May 12-17, 2002
Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time


SLIDE 1

Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time

Masashi Sugiyama, Hidemitsu Ogawa
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan

IJCNN2002, May 12-17, 2002

SLIDE 2

Supervised Learning: Function Approximation

Learning target: $f(x)$. Learned result: $\hat{f}(x)$.

Samples: $\{(x_m, y_m)\}_{m=1}^{M}$, where $y_m = f(x_m) + \epsilon_m$.

From the samples $\{(x_m, y_m)\}_{m=1}^{M}$, find $\hat{f}(x)$ so that it is as close to $f(x)$ as possible.

SLIDE 3

Active Learning

Determine the sample points $\{x_m\}_{m=1}^{M}$ for optimal generalization:

$\min_{\{x_m\}} J_G$, where $J_G$ is the generalization error.

The location of the sample points heavily affects the learned result.

[Figure: target function and learned results for two different choices of sample points $x_1, x_2, \ldots$]

SLIDE 4

Model Selection

The choice of the model heavily affects the learned result. (A model here refers to, e.g., the order of polynomials.)

[Figure: target function and learned results for a too simple, an appropriate, and a too complex model]

Select a model $S$ for optimal generalization:

$\min_{S \in C} J_G$, where $J_G$ is the generalization error and $C$ is the set of model candidates.

SLIDE 5

Simultaneous Optimization of Sample Points and Models

So far, active learning and model selection have been studied thoroughly, but INDEPENDENTLY.

Simultaneously determine the sample points $\{x_m\}_{m=1}^{M}$ and a model $S$ for optimal generalization:

$\min_{\{x_m\},\, S \in C} J_G$, where $J_G$ is the generalization error and $C$ is the set of model candidates.

SLIDE 6

Active Learning / Model Selection Dilemma

We can NOT directly optimize sample points and models simultaneously by simply combining existing active learning and model selection methods, because:

  • the model should be fixed for active learning, and
  • the sample points should be fixed for model selection.

SLIDE 7

How to Dissolve the Dilemma

  • 1. Find sample points that are commonly optimal for all models
  • 2. Just perform model selection as usual

For model candidates $C = \{S_1, S_2, S_3\}$, a set of sample points $\{x_m\}$ is commonly optimal if

$\{x_m^{(OPT)}\}_{m=1}^{M} = \arg\min_{\{x_m\}} J_G$ for all $S \in C$,

i.e., the same points attain $\arg\min_{\{x_m\}} J_G$ for $S_1$, for $S_2$, and for $S_3$.
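In code form, the two-step recipe is simple: sample once at the commonly optimal points, then run any standard model-selection routine on those samples. Below is a minimal Python sketch; the function names, the toy polynomial candidates, and the selection criterion (comparing fits against a known target on a dense grid) are illustrative assumptions, not from the paper.

```python
import numpy as np

def release_from_dilemma(candidates, optimal_points, collect, select):
    """Two-step procedure from this slide:
    1. sample at points that are commonly optimal for every candidate model;
    2. just perform model selection as usual on those samples."""
    x = np.asarray(optimal_points)    # step 1: fixed once, shared by all models
    y = collect(x)                    # query the target at the chosen points
    return select(candidates, x, y)   # step 2: any usual selection criterion

# Toy demo: candidates are polynomial orders; selection compares each fit
# against the (here artificially known) target on a dense grid.
rng = np.random.default_rng(0)
target = np.sin
collect = lambda x: target(x) + 0.1 * rng.standard_normal(x.shape)

def select(candidates, x, y):
    grid = np.linspace(-np.pi, np.pi, 200)
    errs = {n: np.mean((np.polyval(np.polyfit(x, y, n), grid)
                        - target(grid)) ** 2)
            for n in candidates}
    return min(errs, key=errs.get)

points = np.linspace(-np.pi, np.pi, 20)   # equidistant, cf. the theorem on SLIDE 9
best = release_from_dilemma([1, 3, 5, 7], points, collect, select)
print("selected order:", best)
```

Note that the sample points are chosen once, before any model is fixed, which is exactly what makes the two steps decouple.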

SLIDE 8

Is It Just Idealistic?

No! Commonly optimal sample points surely exist for trigonometric polynomial models.

From here on, we assume:

  • Least mean squares (LMS) estimate
  • Generalization measure: $J_G = E_\epsilon \int_{-\pi}^{\pi} \left( \hat{f}(x) - f(x) \right)^2 dx$, where $E_\epsilon$ is the expectation over the noise
  • Trigonometric polynomial model of order $n$: $\hat{f}(x) = \theta_1 + \sum_{p=1}^{n} \left( \theta_{2p} \sin px + \theta_{2p+1} \cos px \right)$
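Concretely, the LMS estimate of $\theta$ is an ordinary least-squares fit against the design matrix with columns $1, \sin px, \cos px$. A self-contained sketch (function names are ours; the example target is an arbitrary order-2 model):

```python
import numpy as np

def trig_design(x, n):
    """Design matrix of the order-n trigonometric polynomial model:
    columns 1, sin(px), cos(px) for p = 1, ..., n."""
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.column_stack(cols)

def lms_fit(x, y, n):
    """Least mean squares estimate of theta = (theta_1, ..., theta_{2n+1})."""
    theta, *_ = np.linalg.lstsq(trig_design(x, n), y, rcond=None)
    return theta

def predict(theta, x, n):
    return trig_design(x, n) @ theta

# Noiseless samples from an order-2 model are recovered exactly.
M, n = 20, 2
x = -np.pi + 2 * np.pi * (np.arange(M) + 0.5) / M   # M equidistant points
y = 0.5 + np.sin(x) - 0.3 * np.cos(2 * x)           # theta = (0.5, 1, 0, 0, -0.3)
theta = lms_fit(x, y, n)
print(np.allclose(theta, [0.5, 1, 0, 0, -0.3]))     # True
```

The same `lms_fit` works unchanged for any order $n$ and any sampling scheme, which is what the comparisons on the later slides rely on.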

SLIDE 9

Theorem

For all trigonometric polynomial models that include the learning target function, equidistance sampling gives the optimal generalization capability.

[Figure: 1-dimensional input; $M$ sample points $x_1, x_2, x_3, \ldots, x_M$ placed at equal intervals of $2\pi/M$ over $[-\pi, \pi]$; $M$: number of samples]

SLIDE 10

Multi-Dimensional Input Cases

Sampling on a regular grid is optimal.

[Figure: 2-dimensional input; sample points $x_1, x_2, \ldots, x_{15}$ placed on a regular grid]

SLIDE 11

Computer Simulations (Artificial, Realizable)

  • Learning target function: $f \in S_{50}$
  • Model candidates: $C = \{S_1, S_2, \ldots, S_{100}\}$, where $S_n$ is the trigonometric polynomial model of order $n$
  • Generalization measure: $J_G = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \hat{f}(x) - f(x) \right)^2 dx$
  • Sampling schemes: equidistance sampling and random sampling
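A scaled-down version of this experiment fits in a few lines. The target order, sample size, and noise level below are our own smaller choices (not the paper's order-50 target with up to 500 samples); the comparison of equidistance versus random sampling is the same:

```python
import numpy as np

def trig_design(x, n):
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.column_stack(cols)

def gen_error(x, y, n, f, grid):
    """J_G approximated on a dense grid: mean squared deviation of the
    LMS fit of the order-n model from the true target f."""
    theta, *_ = np.linalg.lstsq(trig_design(x, n), y, rcond=None)
    return np.mean((trig_design(grid, n) @ theta - f(grid)) ** 2)

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(5 * x)     # target in S_5
M, n, sigma, trials = 30, 5, 0.1, 100
grid = np.linspace(-np.pi, np.pi, 1000)
x_eq = -np.pi + 2 * np.pi * (np.arange(M) + 0.5) / M  # equidistance sampling

err_eq = err_rand = 0.0
for _ in range(trials):
    err_eq += gen_error(x_eq, f(x_eq) + sigma * rng.standard_normal(M),
                        n, f, grid) / trials
    x_r = rng.uniform(-np.pi, np.pi, M)               # random sampling
    err_rand += gen_error(x_r, f(x_r) + sigma * rng.standard_normal(M),
                          n, f, grid) / trials
print(f"equidistance: {err_eq:.5f}   random: {err_rand:.5f}")
```

The equidistant design should yield the smaller averaged error, in line with the results reported on the following slides.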

SLIDE 12

Simulation Results (Large Samples)

Number of samples: 500. Averaged over 100 trials.

[Figure: two panels, noise variance 0.02 and noise variance 0.08; horizontal axis: order of models; vertical axis: generalization error]

Equidistance sampling outperforms random sampling for all models!

SLIDE 13

Simulation Results (Small Samples)

Number of samples: 230. Averaged over 100 trials.

[Figure: two panels, noise variance 0.02 and noise variance 0.08; horizontal axis: order of models; vertical axis: generalization error]

With small samples, equidistance sampling performs excellently for all models!

SLIDE 14

Computer Simulations (Unrealizable)

Interpolate a chaotic series of 600 points (red) from noisy samples (blue).

Model candidates: $C = \{S_1, S_2, \ldots, S_{40}\}$, where $S_n$ is the trigonometric polynomial model of order $n$.

SLIDE 15

Simulation Results (Unrealizable)

[Figure: two panels, $(M, \sigma^2) = (300, 0.04)$ and $(M, \sigma^2) = (100, 0.07)$; horizontal axis: order of models; vertical axis: test error at all 600 points; averaged over 100 trials]

Equidistance sampling outperforms random sampling for all models!

SLIDE 16

Interpolated Chaotic Series

After model selection with equidistance sampling, the selected model is $S_{13}$.

[Figure: the chaotic series interpolated by the selected model]

SLIDE 17

Compared with True Series

[Figure: the interpolated series compared with the true series]

We obtained good estimates from sparse data!

SLIDE 18

Conclusions

  • Active learning / model selection dilemma: sample points and models cannot be simultaneously optimized by simply combining existing active learning and model selection methods.
  • How to dissolve the dilemma: find commonly optimal sample points for all models.
  • Is it realistic? Commonly optimal sample points surely exist for trigonometric polynomial models: equidistance sampling.
  • Is it practical? Computer simulations showed that the proposed method works excellently even in unrealizable cases.