Sequential Model List Selection for Function Approximation, Ernest Fokoué (PowerPoint PPT Presentation)



SLIDE 1

Sequential Model List Selection for Function Approximation

Ernest Fokoué
epf@samsi.info

Joint work with Bertrand Clarke (UBC, SAMSI, Duke)

© Ernest Fokoué. Sequential Model List Selection for Function Approximation. 1/31

SLIDE 2

Outline of the Presentation

- General Introduction
- Sources of Uncertainty
- Appeal of Model Averaging
- Pitfalls of Naive Averages
- A Sequential Selection Solution
- Illustrative Examples
- Conclusion and Future Work

SLIDE 9

General Problem Formulation

Given iid data D = {(x_i, y_i), i = 1, ..., n} where

    Y_i = f*(x_i) + ε_i

Function approximation: find

    f_opt = argmin_{f ∈ F} R(f),   where   R(f) = E_XY[(Y − f(X))^2]

is the risk functional.

Prediction error: on m new observations,

    R̂(f) = (1/m) Σ_{i=1}^{m} (y_i^new − f(x_i^new))^2

How do we find this predictively optimal function f?
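The empirical prediction error above is straightforward to compute. The following Python sketch is not from the slides; the helper name and vectorization are illustrative:

```python
import numpy as np

def empirical_prediction_error(f, x_new, y_new):
    """Estimate R(f) = E[(Y - f(X))^2] by averaging squared residuals
    over m held-out points, as in the slide's formula."""
    y_new = np.asarray(y_new, dtype=float)
    preds = np.array([f(x) for x in np.asarray(x_new, dtype=float)])
    return float(np.mean((y_new - preds) ** 2))

# A predictor that matches the targets exactly has zero estimated risk.
print(empirical_prediction_error(lambda x: 2.0, [0.0, 1.0], [2.0, 2.0]))  # 0.0
```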


SLIDE 10

Basis Expansion Approach

Basis function set:

    E = {e_1, e_2, ..., e_k}

Function space: F ≡ span E, so that for every f ∈ F there is a p ∈ {1, 2, ..., k} with

    f(x) = β_0 + Σ_{j=1}^{p} β_j e_j(x)        (1)

Model space:

    M = {M : M models a function of the form (1)}
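A minimal sketch of model (1): build a design matrix from the first p atoms of a basis set and estimate the coefficients by least squares. The Chebyshev atoms used here are one of the basis sets the talk considers later; the function names are our own:

```python
import numpy as np

def chebyshev_basis(x, p):
    """Design matrix with an intercept and atoms e_j(x) = cos(j arccos x),
    j = 1..p, for x in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)] + [np.cos(j * np.arccos(x)) for j in range(1, p + 1)]
    return np.column_stack(cols)

def fit_basis_expansion(x, y, p):
    """Least-squares estimate of (beta_0, ..., beta_p) in model (1)."""
    beta, *_ = np.linalg.lstsq(chebyshev_basis(x, p), y, rcond=None)
    return beta

x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 0.5 * np.cos(2 * np.arccos(x))   # exactly beta_0 = 1, beta_2 = 0.5
beta = fit_basis_expansion(x, y, 3)
print(np.round(beta, 6))  # close to [1, 0, 0.5, 0]
```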


SLIDE 11

Sources of Uncertainty

Parameter uncertainty: for any given model M ∈ M, there is uncertainty in its parameters. Bayesian inference takes care of this uncertainty very well.

Model uncertainty: given a list M ⊂ M of "plausible" models from M, different models will produce different predictions. Model averaging and model selection help account for model uncertainty.

Model list uncertainty: for a class of models in a model space M, how do we select a list M of plausible models? This topic is almost always ignored!


SLIDE 12

The Appeal of Model Averaging

Bayesian Model Averaging is well established as the optimal predictive solution in function approximation. So, if predictive optimality is the will, then Bayesian Model Averaging would seem to be the way.


SLIDE 13

Pitfalls of Naive Model Averaging

It happens that, from the same model space M, some model lists produce higher prediction errors than others.

Careless prior specification on a single model list can degrade the model average obtained from it. Arbitrarily large model lists have been seen to increase the average prediction error.

Note: model list variability has not been given the care it deserves.

Note: this work argues that selective model averaging may be the way to negotiate a bias-variance trade-off so as to drive the prediction error as low as possible.


SLIDE 14

Pitfalls of Naive Averages (I)

Existence of regions of high redundancy in model space.
Cause: highly correlated predictors or linearly dependent basis functions.
Consequence: uniformity of p(M) leads to skewness of p(M | D), making the averages suspicious.
A remedy: dilution priors, by Ed George.


SLIDE 15

Dilution Priors

Assign prior probabilities uniformly to model neighborhoods.
- Bayesian linear model: Voronoi tessellation of the full model space.
- Bayesian CART: tree-generating process priors.
Note: such priors do not require subjective inputs.


SLIDE 16

Pitfalls of Naive Averages (II)

Vague convergence to zero.
Causes: a model list far larger than n; a uniform prior p(M); a large list of similar models.
Consequence:

    p(M | D) → 0   as   |M| gets large

A remedy: sequential model list selection.
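The vanishing-posterior effect is easy to see numerically. In this toy sketch (not from the slides; the likelihood value is arbitrary), K near-identical models under a uniform prior and equal marginal likelihoods each receive posterior mass 1/K, which goes to zero as the list grows:

```python
def posterior_mass(num_models, marginal_likelihood=0.3):
    """Toy illustration: K near-identical models, uniform prior p(M) = 1/K,
    and equal marginal likelihoods.  Each posterior p(M | D) is then 1/K,
    which vanishes as the list grows (the numbers here are arbitrary)."""
    prior = 1.0 / num_models
    evidence = num_models * prior * marginal_likelihood
    return prior * marginal_likelihood / evidence

for K in (10, 1000, 100000):
    print(K, posterior_mass(K))  # posterior mass per model is 1/K
```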


SLIDE 17

Insights and Conjectures

For a given problem, and an optimality criterion thereof, there must exist an optimum model list.
Such an optimal model list achieves the best bias-variance trade-off for the given problem.
This is regularization in model space.


SLIDE 18

Evidence of an Optimum Model List

[Plot: estimated average prediction error versus τ for f(x) = 2 + 2 sin(x) + 1.25 sin(2x) + sin(7x) using the Chebyshev basis set; BMA = 3, µ = 50%, TF = 1, ν = 100%, 100 runs.]

The x-axis of the graph above indexes the model list through τ. Clearly, there is an optimum model list at τ = 0.7.


SLIDE 19

Sequential Model List Selection

The building blocks of the method are:
- Selection threshold τ, with τ ∈ [0, 2]
- Working basis set W(t) ⊆ E
- Term formation scheme (TF)
- Averaging scheme (BMA)
- Proportion of terms to use, ν ∈ [0, 1]
- Proportion of models to include, µ ∈ [0, 1]
- Distance measure d(·, ·) used to search E

Remember that our goal is predictive optimality.
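The building blocks above can be collected in a small configuration object. This is a hypothetical sketch; the field names are our own, since the slide only names the quantities:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SMLSConfig:
    """Building blocks of sequential model list selection.
    Field names are illustrative; the slide only names the quantities."""
    tau: float           # selection threshold, tau in [0, 2]
    tf: int              # term formation scheme: 1, 2 or 3
    bma: int             # averaging scheme: 1 small, 2 medium, 3 large models
    nu: float            # proportion of terms to use, in [0, 1]
    mu: float            # proportion of models to include, in [0, 1]
    distance: Callable   # d(., .) used to search the basis set E

    def __post_init__(self):
        # Enforce the ranges stated on the slide.
        assert 0.0 <= self.tau <= 2.0
        assert 0.0 <= self.nu <= 1.0 and 0.0 <= self.mu <= 1.0

cfg = SMLSConfig(tau=0.7, tf=1, bma=3, nu=1.0, mu=0.5,
                 distance=lambda e, r: 0.0)
print(cfg.tau)  # 0.7
```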


SLIDE 20

Model Averaging Schemes

What models go into the average? We use an index named BMA to identify the scheme:
- BMA = 1: small models, with 1, 2, or 3 terms
- BMA = 2: medium models, with p/2 terms
- BMA = 3: large models, with p, p − 1, or p − 2 terms

Note: for a given scheme, the selection randomly draws 100µ% of the models available in the induced space.
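A sketch of the scheme (function and variable names are our own; the slide specifies only the size typology and the random 100µ% draw):

```python
import itertools
import random

def candidate_models(p, bma, mu, rng=None):
    """Enumerate term-index subsets by the BMA size typology, then
    randomly keep a fraction mu of them (sketch; names are our own)."""
    rng = rng or random.Random(0)
    if bma == 1:                    # small models: 1, 2 or 3 terms
        sizes = [1, 2, 3]
    elif bma == 2:                  # medium models: p/2 terms
        sizes = [p // 2]
    else:                           # bma == 3, large models: p, p-1, p-2 terms
        sizes = [p, p - 1, p - 2]
    models = [m for s in sizes if 0 < s <= p
              for m in itertools.combinations(range(p), s)]
    keep = max(1, round(mu * len(models)))  # draw 100*mu % of the models
    return rng.sample(models, keep)

models = candidate_models(p=5, bma=3, mu=0.5)
print(len(models))  # 8: half of the C(5,3)+C(5,4)+C(5,5) = 16 large models
```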


SLIDE 21

Term Formation Schemes

Motivation: terms formed as combinations of atoms from the basis set E tend to produce sparse function approximations.
- TF = 1: use B(t) = W(t) directly, without any partial sums.
- TF = 2: B(t) = {partial sums of two elements from W(t)}
- TF = 3: B(t) = {partial sums of three elements from W(t)}

For a given TF, randomly draw 100ν% of the terms. This is useful for assessing the efficacy of overcompleteness.
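A sketch of term formation over a working set W(t), assuming "partial sums of k elements" means sums over k-subsets of distinct atoms (the slide does not spell this out):

```python
import itertools
import numpy as np

def form_terms(W, tf):
    """TF = 1: use the atoms in W(t) directly.  TF = 2 (resp. 3): form all
    partial sums of two (resp. three) distinct atoms, evaluated pointwise.
    W is a list of basis columns (1-D arrays); a sketch of the scheme."""
    W = [np.asarray(w, dtype=float) for w in W]
    if tf == 1:
        return W
    k = 2 if tf == 2 else 3
    return [sum(c) for c in itertools.combinations(W, k)]

x = np.linspace(0.0, 1.0, 4)
W = [np.sin(x), np.cos(x), x]
print(len(form_terms(W, 2)))  # C(3,2) = 3 pairwise partial sums
```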


SLIDE 22

Function Approximation

At time point t:
- Get D(t) = {(x_i, y_i), i = 1, ..., m_t}
- Construct BMA(t) using the BMA and TF schemes
- Estimate the response on D(t): ŷ_i = BMA(t)(x_i)
- Compute the first-order residuals: r_i = y_i − ŷ_i = y_i − BMA(t)(x_i)


SLIDE 23

Update the Model List

Search E \ W(t):

    for j = 1 to |E \ W(t)|
        r := (r_1, r_2, ..., r_{m_t})^T
        e_j := (e_j(x_1), e_j(x_2), ..., e_j(x_{m_t}))^T
        ρ_j := d(e_j, r)
        if ρ_j ≤ τ then W(t) := W(t) ∪ {e_j}
    end

This is an automation of residual analysis.
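The search loop above translates almost line for line into Python. This sketch uses a normalized-vector norm as d(·, ·), one of the distance choices the talk lists; the container and variable names are our own:

```python
import numpy as np

def update_working_set(W, E, residuals, X, tau):
    """Scan the unused atoms e_j in E \\ W(t) and admit every atom whose
    distance to the current residual vector is at most tau.  The distance
    is the norm between unit-normalized vectors."""
    r = np.asarray(residuals, dtype=float)
    r_unit = r / np.linalg.norm(r)
    for name, e in E.items():
        if name in W:
            continue
        ej = np.array([e(x) for x in X], dtype=float)   # (e_j(x_1), ..., e_j(x_mt))
        rho = np.linalg.norm(ej / np.linalg.norm(ej) - r_unit)
        if rho <= tau:
            W[name] = e
    return W

X = np.linspace(-np.pi, np.pi, 60)
E = {"sin1": np.sin, "sin2": lambda x: np.sin(2 * x)}
# Residuals exactly equal to sin(x): sin1 is at distance 0 and is admitted,
# while sin2 is nearly orthogonal (distance about sqrt(2)) and is not.
W = update_working_set({}, E, np.sin(X), X, tau=0.5)
print(sorted(W))  # ['sin1']
```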


SLIDE 24

What distance to use?

Norm of the difference of the normalized vectors:

    d(e_j, r) := || e_j / ||e_j|| − r / ||r|| ||

Inner product of the normalized vectors, through a function g:

    d(e_j, r) := g( ⟨ e_j / ||e_j|| , r / ||r|| ⟩ )

Similarity measure (kernel):

    d(e_j, r) := K(e_j, r)
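The three candidate distances can be sketched as follows. The function g and the default kernel K are placeholders we chose for illustration; the slide leaves them unspecified:

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def d_norm(e, r):
    """Norm of the difference of the normalized vectors."""
    return np.linalg.norm(unit(e) - unit(r))

def d_inner(e, r, g=lambda t: 1.0 - t):
    """g applied to the normalized inner product; this g is a placeholder."""
    return g(float(unit(e) @ unit(r)))

def d_kernel(e, r, K=None):
    """Any similarity kernel K(e, r); a Gaussian kernel as a placeholder."""
    K = K or (lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2)))
    return K(e, r)

e, r = [1.0, 0.0], [2.0, 0.0]       # parallel vectors: both distances vanish
print(d_norm(e, r), d_inner(e, r))  # 0.0 0.0
```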


SLIDE 25

Some Important Issues

- Allow only the best candidate: parsimonious model list, but not computationally efficient.
- Allow all the good candidates: also lets in some "not so good" ones, but is more computationally efficient.
- Consider stochastic search schemes.


SLIDE 26

Sequential Model List Selection

For a given τ ∈ [0, 2], at time point t:
- Receive m i.i.d. observations.
- Get the working set W(t) ⊆ E.
- Form the term set B(t) from W(t).
- Form BMA(t) using B(t) and the size typology.
- Update W(t) according to τ.
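Putting the pieces together, one pass of the procedure might look like this minimal sketch: a single least-squares fit stands in for the full BMA(t) construction, and the glue code is our own:

```python
import numpy as np

def fit_ls(W, x, y):
    """Least-squares combination of an intercept plus the current working
    atoms (a single fit standing in for the slide's BMA(t) construction)."""
    atoms = list(W.values())                 # snapshot of W(t)
    A = np.column_stack([np.ones_like(x)] + [e(x) for e in atoms])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda z: np.column_stack([np.ones_like(z)] + [e(z) for e in atoms]) @ beta

def sequential_step(W, E, x, y, tau):
    """One time point t: fit on W(t), compute first-order residuals, then
    admit unused atoms whose normalized-vector distance to the residual
    is at most tau."""
    f_t = fit_ls(W, x, y)
    r = y - f_t(x)
    if np.linalg.norm(r) > 1e-8:             # skip the search once the fit is exact
        r_unit = r / np.linalg.norm(r)
        for name, e in E.items():
            if name not in W:
                ev = e(x)
                if np.linalg.norm(ev / np.linalg.norm(ev) - r_unit) <= tau:
                    W[name] = e
    return W, f_t

x = np.linspace(-np.pi, np.pi, 80)
y = 2 + np.sin(x) + 0.5 * np.cos(2 * x)      # the deep valley target
E = {"sin1": np.sin, "cos2": lambda t: np.cos(2 * t), "sin5": lambda t: np.sin(5 * t)}
W = {}
for _ in range(3):
    W, f_t = sequential_step(W, E, x, y, tau=0.9)
print(sorted(W))  # ['cos2', 'sin1'] -- the two atoms that generate f*
```

Note how the working set grows one relevant atom at a time: the first pass admits sin(x), the second pass admits cos(2x) once the dominant residual structure has shifted, and the spurious atom sin(5x) is never admitted.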


SLIDE 27

Basis Sets Considered

Full Fourier basis:

    E = {sin(jωx), cos(jωx)}

Legendre basis, via the recurrence:

    (j + 1) e_{j+1}(x) = (2j + 1) x e_j(x) − j e_{j−1}(x)

Chebyshev basis:

    e_j(x) = cos(j arccos(x))

Fourier sine basis: E = {sin(jx)}
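These basis sets are easy to generate. A sketch, assuming the standard starting values e_0(x) = 1 and e_1(x) = x for the Legendre recurrence (the slide gives only the recurrence itself):

```python
import numpy as np

def legendre(x, k):
    """Legendre atoms e_0..e_k from the recurrence
    (j+1) e_{j+1}(x) = (2j+1) x e_j(x) - j e_{j-1}(x),
    assuming the standard starting values e_0(x) = 1, e_1(x) = x."""
    x = np.asarray(x, dtype=float)
    e = [np.ones_like(x), x]
    for j in range(1, k):
        e.append(((2 * j + 1) * x * e[j] - j * e[j - 1]) / (j + 1))
    return e[:k + 1]

def chebyshev(x, k):
    """Chebyshev atoms e_j(x) = cos(j arccos x), j = 0..k, for x in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return [np.cos(j * np.arccos(x)) for j in range(k + 1)]

x = np.array([0.5])
print(legendre(x, 2)[2])   # P_2(0.5) = (3*0.25 - 1)/2 = -0.125
print(chebyshev(x, 2)[2])  # T_2(0.5) = -0.5
```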


SLIDE 28

The Deep Valley Function

    f*(x) = 2 + sin(x) + 0.5 cos(2x),   x ∈ [−π, π]

[Plot: f(x) = 2 + sin(x) + 0.5 cos(2x) over x ∈ [−π, π].]

Study using various E, τ, TF, BMA, µ, and ν.
Note: this function lives in the span of the Fourier basis.


SLIDE 29

Lists with small size models and simple terms, when f* ∈ span E

[Plot: estimated average prediction error versus τ for f(x) = 2 + sin(x) + 0.5 cos(2x) using the Fourier basis set; BMA = 1, µ = 50%, TF = 1, ν = 100%, 100 runs.]

f* ∈ span E: with small size models, the prediction error stabilizes. There is no need to add more models once the optimum model list is found.


SLIDE 30

Lists with large size models and simple terms, when f* ∈ span E

[Plot: estimated average prediction error versus τ for f(x) = 2 + sin(x) + 0.5 cos(2x) using the Fourier basis set; BMA = 3, µ = 10%, TF = 1, ν = 5%, 100 runs.]

f* ∈ span E: with large size models and simple terms, the prediction error stabilizes. There is no need to add more models once the optimum model list is found.


SLIDE 31

Lists with large size models and complex terms, when f* ∈ span E

[Plot: estimated average prediction error versus τ for f(x) = 2 + sin(x) + 0.5 cos(2x) using the Fourier basis set; BMA = 3, µ = 10%, TF = 3, ν = 5%, 100 runs.]

f* ∈ span E: with large models and complex terms, there is a clear optimum. Adding models becomes harmful beyond the optimum.


SLIDE 32

The Nice Hill Function

    f*(x) = 2 + 2 sin(x) + 1.25 sin(2x) + sin(7x),   x ∈ [−1, 1]

[Plot: f(x) = 2 + 2 sin(x) + 1.25 sin(2x) + sin(7x) over x ∈ [−1, 1].]

Study using various E, τ, TF, BMA, µ, and ν.


SLIDE 33

Lists with small size models and simple terms, when f* ∉ span E

[Plot: estimated average prediction error versus τ for f(x) = 2 + 2 sin(x) + 1.25 sin(2x) + sin(7x) using the Chebyshev basis set; BMA = 1, µ = 50%, TF = 1, ν = 100%, 100 runs.]

f* ∉ span E: with small size models and simple terms, the prediction error increases dramatically with τ. Model lists should be kept small in such a case.


SLIDE 34

Lists with large size models and simple terms, when f* ∉ span E

[Plot: estimated average prediction error versus τ for f(x) = 2 + 2 sin(x) + 1.25 sin(2x) + sin(7x) using the Chebyshev basis set; BMA = 3, µ = 50%, TF = 1, ν = 100%, 100 runs.]

f* ∉ span E: with large size models and simple terms, there is a clear optimum model list M*(τ).


SLIDE 35

Lists with large size models and complex terms, when f* ∉ span E

[Plot: estimated average prediction error versus τ; BMA = 3, Chebyshev basis.]

f* ∉ span E: with large size models and complex terms, there is a clear optimum model list M*(τ).


SLIDE 36

What to make of all that?

Emerging trends:
- M = {small size models}: when f* ∈ span E, there is an optimum beyond which there is neither improvement nor deterioration.
- M = {large size models}: there seems to be a clear optimal model list.
- M = {small size models}: when f* ∉ span E, large model lists are not good.


SLIDE 37

Conclusion and Future Work

Take this home:
- Model list variability is indeed one of the main sources of uncertainty.
- Model list selection is therefore of vital importance.

Future work:
- How does one estimate the optimum τ in real settings?
- Test on real data and with multivariate predictors.
- Explore more theoretical aspects.
