IJCNN2002, May 12-17, 2002

Slide 1: Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time
Slide 2: Supervised Learning: Function Approximation
- Learning target: $f(x)$; learned result: $\hat{f}(x)$
- Samples: $\{(x_m, y_m)\}_{m=1}^{M}$, where $y_m = f(x_m) + \epsilon_m$
- From the samples $\{(x_m, y_m)\}_{m=1}^{M}$, find $\hat{f}(x)$ so that it is as close to $f(x)$ as possible
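The sampling model $y_m = f(x_m) + \epsilon_m$ can be sketched as follows; the particular target function, noise variance, and sample size are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical target function f(x); the slides do not fix one here.
    return np.sin(3 * x)

M = 20                                    # number of samples
x = rng.uniform(-np.pi, np.pi, M)         # sample points x_m
eps = rng.normal(0.0, np.sqrt(0.02), M)   # additive noise eps_m
y = f(x) + eps                            # observations y_m = f(x_m) + eps_m
```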
Slide 3: Active Learning
- Determine the sample points $\{x_m\}_{m=1}^{M}$ for optimal generalization
- The location of the sample points heavily AFFECTS the learned result
- $\min_{\{x_m\}} J_G$, where $J_G$ is the generalization error
[Figure: target function vs. learned result for two sets of sample points]
Slide 4: Model Selection
- The choice of model heavily AFFECTS the learned result ("model" refers to, e.g., the order of polynomials)
- Select a model $S$ for optimal generalization: $\min_{S \in C} J_G$, where $J_G$ is the generalization error and $C$ is the set of model candidates
[Figure: too simple / appropriate / too complex fits of the target function]
Slide 5: Simultaneous Optimization of Sample Points and Models
- So far, active learning and model selection have been studied thoroughly, but INDEPENDENTLY
- Goal: simultaneously determine the sample points $\{x_m\}_{m=1}^{M}$ and a model $S$ for optimal generalization: $\min_{\{x_m\},\, S \in C} J_G$, where $J_G$ is the generalization error and $C$ is the set of model candidates
Slide 6: Active Learning / Model Selection Dilemma
- We can NOT directly optimize sample points and models simultaneously by simply combining existing active learning and model selection methods, because:
  - the model should be fixed for active learning
  - the sample points should be fixed for model selection
Slide 7: How to Dissolve the Dilemma
- 1. Find a set of sample points $\{x_m\}$ that is commonly optimal for all models
- 2. Just perform model selection as usual
- Given model candidates $C = \{S_1, S_2, S_3\}$, each model has its own optimal design $\arg\min_{\{x_m\}} J_G$ for $S_1$, for $S_2$, and for $S_3$; the goal is a single design (OPT) with $\{x_m^{(OPT)}\}_{m=1}^{M} = \arg\min_{\{x_m\}} J_G$ for all $S \in C$
Slide 8: Is It Just Idealistic?
- No! Commonly optimal sample points surely exist for trigonometric polynomial models
- From here on, we assume:
  - the least mean squares (LMS) estimate
  - the generalization measure $J_G = E \int_{-\pi}^{\pi} \big(\hat{f}(x) - f(x)\big)^2 \, dx$, where $E$ denotes the expectation over the noise
- $n$-th order trigonometric polynomial model: $\hat{f}(x) = \theta_1 + \sum_{p=1}^{n} \big(\theta_{2p} \sin px + \theta_{2p+1} \cos px\big)$
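A minimal sketch of the LMS estimate for the $n$-th order trigonometric polynomial model, with $J_G$ approximated numerically on a dense grid; the helper names (`design_matrix`, `lms_fit`, `generalization_error`) are illustrative, not from the slides.

```python
import numpy as np

def design_matrix(x, n):
    """Fourier design matrix [1, sin x, cos x, ..., sin nx, cos nx]."""
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols.append(np.sin(p * x))
        cols.append(np.cos(p * x))
    return np.column_stack(cols)

def lms_fit(x, y, n):
    """Least-squares (LMS) estimate of the coefficients theta."""
    theta, *_ = np.linalg.lstsq(design_matrix(x, n), y, rcond=None)
    return theta

def generalization_error(theta, n, f, num=2001):
    """Riemann approximation of the integral of (f_hat - f)^2 over [-pi, pi]."""
    grid = np.linspace(-np.pi, np.pi, num)
    f_hat = design_matrix(grid, n) @ theta
    return 2 * np.pi * np.mean((f_hat - f(grid)) ** 2)
```

With noiseless samples from a target inside the model (the realizable case), the fit recovers the target and the approximated $J_G$ is essentially zero.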
Slide 9: Theorem
- For all trigonometric polynomial models that include the learning target function, equidistant sampling gives the optimal generalization capability
[Figure: $M$ equidistant sample points $x_1, x_2, \ldots, x_M$ with spacing $2\pi/M$ on $[-\pi, \pi]$; $M$: number of samples; 1-dimensional input]
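The equidistant design of the theorem can be generated as below; the half-spacing offset is an assumed convention, since the theorem concerns the equal spacing $2\pi/M$ itself rather than the phase of the grid.

```python
import numpy as np

def equidistant_design(M):
    """M sample points with equal spacing 2*pi/M on [-pi, pi).

    The half-spacing offset is an illustrative convention; only the
    equal spacing matters for the optimality statement.
    """
    return -np.pi + 2 * np.pi * (np.arange(M) + 0.5) / M
```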
Slide 10: Multi-Dimensional Input Cases
- Sampling on a regular grid is optimal
[Figure: 2-dimensional input with sample points $x_1, x_2, \ldots, x_{15}$ on a regular grid]
Slide 11: Computer Simulations (Artificial, Realizable)
- Learning target function: $f \in S_{50}$
- Model candidates: $C = \{S_1, S_2, \ldots, S_{100}\}$, where $S_n$ is the $n$-th order trigonometric polynomial model
- Generalization measure: $J_G = \int_{-\pi}^{\pi} \big(\hat{f}(x) - f(x)\big)^2 \, dx$
- Sampling schemes: equidistant sampling vs. random sampling
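The realizable simulation can be sketched on a reduced scale. To keep the run short, this sketch assumes a toy target in $S_5$, candidates $S_1$ through $S_{10}$, $M = 100$ samples, and 20 trials, instead of the slides' $f \in S_{50}$, candidates up to $S_{100}$, 500 samples, and 100 trials.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(x, n):
    """Fourier design matrix for the n-th order trigonometric model S_n."""
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.column_stack(cols)

def gen_error(x, y, n, f, grid):
    """Fit S_n by least squares and approximate J_G on a dense grid."""
    theta, *_ = np.linalg.lstsq(design(x, n), y, rcond=None)
    resid = design(grid, n) @ theta - f(grid)
    return 2 * np.pi * np.mean(resid ** 2)

def f(x):
    # Toy target in S_5 (assumed; the slides use a target in S_50).
    return np.sin(x) + 0.7 * np.cos(3 * x) - 0.4 * np.sin(5 * x)

M, noise_var, trials = 100, 0.02, 20
orders = range(1, 11)
grid = np.linspace(-np.pi, np.pi, 1001)
x_eq = -np.pi + 2 * np.pi * (np.arange(M) + 0.5) / M   # equidistant design

err = {"equidistant": np.zeros(len(orders)), "random": np.zeros(len(orders))}
for _ in range(trials):
    for name, x in (("equidistant", x_eq),
                    ("random", rng.uniform(-np.pi, np.pi, M))):
        y = f(x) + rng.normal(0.0, np.sqrt(noise_var), M)
        for i, n in enumerate(orders):
            err[name][i] += gen_error(x, y, n, f, grid) / trials
```

Averaged over the trials, the equidistant design should yield the lower generalization error for the realizable orders, mirroring the trend reported on the slides.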
Slide 12: Simulation Results (Large Samples)
- Number of samples: 500; averaged over 100 trials; noise variance 0.02 and 0.08
- [Figure: horizontal axis: order of models; vertical axis: generalization error]
- Equidistant sampling outperforms random sampling for all models!
Slide 13: Simulation Results (Small Samples)
- Number of samples: 230; averaged over 100 trials; noise variance 0.02 and 0.08
- [Figure: horizontal axis: order of models; vertical axis: generalization error]
- With small samples, equidistant sampling performs excellently for all models!
Slide 14: Computer Simulations (Unrealizable)
- Interpolate a 600-point chaotic series (red) from noisy samples (blue)
- Model candidates: $C = \{S_1, S_2, \ldots, S_{40}\}$, where $S_n$ is the $n$-th order trigonometric polynomial model
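A sketch of the unrealizable experiment. The logistic map is an assumed stand-in for the chaotic series on the slides, and the random index sampling and the mapping of the series onto $[-\pi, \pi]$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 600-point chaotic series; the logistic map is an assumed stand-in
# for the (unspecified) chaotic series used on the slides.
T = 600
s = np.empty(T)
s[0] = 0.3
for t in range(1, T):
    s[t] = 3.9 * s[t - 1] * (1.0 - s[t - 1])

# Index the series on [-pi, pi] so trigonometric models can interpolate it.
x_all = np.linspace(-np.pi, np.pi, T, endpoint=False)

def design(x, n):
    """Fourier design matrix for the n-th order trigonometric model S_n."""
    cols = [np.ones_like(x)]
    for p in range(1, n + 1):
        cols += [np.sin(p * x), np.cos(p * x)]
    return np.column_stack(cols)

# One (M, sigma^2) setting from the slides; indices sampled at random.
M, noise_var = 300, 0.04
idx = np.sort(rng.choice(T, size=M, replace=False))
y = s[idx] + rng.normal(0.0, np.sqrt(noise_var), M)

# Test error at all 600 points for each candidate model S_1 .. S_40.
test_err = []
for n in range(1, 41):
    theta, *_ = np.linalg.lstsq(design(x_all[idx], n), y, rcond=None)
    test_err.append(float(np.mean((design(x_all, n) @ theta - s) ** 2)))
best_order = 1 + int(np.argmin(test_err))
```

Since no model contains the chaotic target, the test error never reaches zero; model selection picks the order that best trades off approximation and estimation error.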
Slide 15: Simulation Results (Unrealizable)
- $(M, \sigma^2) = (300, 0.04)$ and $(M, \sigma^2) = (100, 0.07)$; averaged over 100 trials
- [Figure: horizontal axis: order of models; vertical axis: test error at all 600 points]
- Equidistant sampling outperforms random sampling for all models!
Slide 16: Interpolated Chaotic Series
- After model selection with equidistant sampling, the selected model is $S_{13}$
Slide 17: Compared with True Series
- We obtained good estimates from sparse data!
Slide 18: Conclusions
- Active learning / model selection dilemma: sample points and models cannot be simultaneously optimized by simply combining existing active learning and model selection methods