MLSS 06 - Canberra: Sparse Grids - PowerPoint PPT Presentation


SLIDE 1

MLSS 06 - Canberra Sparse Grids

Jochen Garcke

Centre for Mathematics and its Applications, Mathematical Sciences Institute, Australian National University

14th February 2006

SLIDE 2

Outline

◮ (Very) Short Course on Finite Elements
◮ Hierarchical Basis
◮ Sparse Grids
◮ Combination Technique
◮ Regression / Classification via Function Reconstruction
◮ Opticom
◮ Semi-supervised Learning
◮ Outlook: Dimension Adaptive Combination Technique
◮ Sparse Grids in Reinforcement Learning?

SLIDE 3

Partial Differential Equations

◮ Poisson equation (model problem): the electric potential u for a given charge f,

    -\Delta u \; (= -\nabla^2 u) = f

◮ Navier-Stokes equations describe the motion of fluids such as liquids and gases,

    \frac{\partial u}{\partial t} + u \cdot \nabla u - \frac{1}{Re}\Delta u + \nabla p = f, \qquad \nabla \cdot u = 0

◮ Schrödinger equation (quantum chemistry): eigenvalue problem Hψ = λψ with

    H = -\sum_{i}^{N} \frac{\hbar^2}{2m}\Delta_i - \sum_{\alpha}^{K} \frac{\hbar^2}{2M_\alpha}\Delta_\alpha - \sum_{i,\alpha}^{N,K} \frac{e Z_\alpha}{r_{\alpha i}} + \sum_{i<j}^{N} \frac{e^2}{r_{ij}} + \sum_{\alpha<\beta}^{K} \frac{Z_\alpha Z_\beta}{R_{\alpha\beta}}.

SLIDE 4

Galerkin Variational Principle

◮ minimise J(v) = \frac{1}{2} a(v,v) - \langle f, v \rangle with v ∈ V
◮ simplified, think of: Lu = f  →  a(u,v) = \int Lu \cdot v = \langle f, v \rangle
◮ for the model problem -\Delta u  →  a(u,v) = \int \nabla u \cdot \nabla v
◮ the minimum u of J is equivalent to finding the u ∈ V which satisfies a(u,v) = \langle f, v \rangle for all v ∈ V
◮ Lax-Milgram lemma: if V is a Hilbert space, f is bounded, and a is bounded (|a(u,v)| \le C \|u\|_V \|v\|_V) and V-elliptic (C_E \|u\|_V^2 \le a(u,u) for all u), then there exists a unique solution u
◮ this is the weak solution to the original partial differential equation

SLIDE 5

Discretisation

◮ discretise: V_N ⊂ V, V_N finite-dimensional
◮ find the u_N ∈ V_N which satisfies a(u_N, v_N) = \langle f, v_N \rangle for all v_N ∈ V_N
◮ Céa's lemma: if a is V-elliptic and u, u_N are the solutions in V, V_N, respectively, then

    \|u - u_N\|_V \le C \inf_{v_N \in V_N} \|u - v_N\|_V

SLIDE 6

Example for V_N in One Dimension

◮ one-dimensional basis for level 3
◮ interpolation of a parabola

[Figure: the nodal basis functions φ_{3,1}, ..., φ_{3,7} of level 3 and the resulting interpolant]

SLIDE 7

One-dimensional Basis Functions

◮ the one-dimensional basis functions φ_{l,j}(x) with support

    [x_{l,j} - h_l, x_{l,j} + h_l] ∩ [0,1] = [(j-1)h_l, (j+1)h_l] ∩ [0,1]

are defined by

    φ_{l,j}(x) = 1 - |x/h_l - j| for x ∈ [(j-1)h_l, (j+1)h_l] ∩ [0,1], and 0 otherwise.
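As a small illustration, this is how the hat function φ_{l,j} can be evaluated; a minimal sketch in Python, with the function name phi being ours:

```python
import numpy as np

def phi(l, j, x):
    """Hat function phi_{l,j} on [0, 1]: centre x_{l,j} = j * 2**(-l),
    mesh width h_l = 2**(-l), value 1 at the centre, support (j-1)h_l..(j+1)h_l."""
    h = 2.0 ** (-l)
    return np.maximum(1.0 - np.abs(np.asarray(x) / h - j), 0.0)

# phi(3, 4, 0.5) -> 1.0 at the grid point x_{3,4}; phi(3, 4, 0.625) -> 0.0
```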
SLIDE 8

Basis Functions in More Dimensions

◮ d-dimensional piecewise d-linear hat functions

    φ_{l,j}(x) := \prod_{t=1}^{d} φ_{l_t,j_t}(x_t)

◮ associated function space V_l of piecewise d-linear functions

    V_l := span{ φ_{l,j} | j_t = 0, ..., 2^{l_t}, t = 1, ..., d }
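The tensor product construction translates directly into code; a sketch building on the 1D phi and the numpy import from the previous snippet (phi_d is our name):

```python
def phi_d(l, j, x):
    """d-linear hat function phi_{l,j}(x) = prod_t phi_{l_t, j_t}(x_t),
    for multi-indices l, j and a point x given as tuples of length d."""
    return float(np.prod([phi(lt, jt, xt) for lt, jt, xt in zip(l, j, x)]))

# value 1 at the grid point x_{l,j} itself, e.g. phi_d((2, 3), (1, 4), (0.25, 0.5)) -> 1.0
```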

SLIDE 9

Some Notation

◮ simplification: domain Ω̄ := [0,1]^d
◮ l = (l_1, ..., l_d) ∈ ℕ^d denotes a multi-index
◮ define the mesh size h_l := (2^{-l_1}, ..., 2^{-l_d})
◮ anisotropic grid Ω_l on Ω̄: different, but equidistant, mesh sizes in each coordinate direction
◮ Ω_l consists of the points

    x_{l,j} := (x_{l_1,j_1}, ..., x_{l_d,j_d}),  with  x_{l_t,j_t} := j_t · h_{l_t} = j_t · 2^{-l_t}  and  j_t = 0, ..., 2^{l_t}

[Figure: the anisotropic grid Ω_{3,1}]

SLIDE 10

Triangulation Instead of Tensor Product

SLIDE 11

Approximation Properties

◮ D^α u = \frac{\partial^{|α|}}{\partial x_1^{α_1} \cdots \partial x_d^{α_d}} u
◮ Sobolev spaces H^s with norm

    \|u\|_{H^s}^2 = \sum_{|α| \le s} \int (D^α u)^2

◮ for a V-elliptic, V_N piecewise (bi)linear, and u ∈ H^2:

    \|u - u_N\|_{H^1} \le C h |u|_{H^2}

◮ error in L^2:

    \|u - u_N\|_{L^2} \le C h^2 |u|_{H^2}

◮ the above results are for two dimensions; similar results hold in higher dimensions

SLIDE 12

Interpolation with Hierarchical Basis

[Figure: nodal basis φ_{3,1}, ..., φ_{3,7}, with V_1 ⊂ V_2 ⊂ V_3]

[Figure: hierarchical basis φ_{1,1}; φ_{2,1}, φ_{2,3}; φ_{3,1}, φ_{3,3}, φ_{3,5}, φ_{3,7}, with V_3 = W_3 ⊕ W_2 ⊕ V_1]

SLIDE 13

Hierarchical Difference Spaces

◮ l ∈ ℕ^d denotes the level, i.e. the discretisation resolution, of a grid Ω_l, a space V_l or a function f_l
◮ j ∈ ℕ^d gives the position of a grid point x_{l,j} or the corresponding basis function φ_{l,j}(·)
◮ hierarchical difference space W_l via

    W_l := V_l \setminus \bigoplus_{t=1}^{d} V_{l - e_t},   (1)

where e_t is the t-th unit vector
◮ in other words, W_l consists of all φ_{k,j} ∈ V_l which are not included in any of the spaces V_k smaller than V_l
◮ to complete the definition, we formally set V_l := 0 if l_t = -1 for at least one t ∈ {1, ..., d}

SLIDE 14

Hierarchical Tensor Product Decomposition

◮ the index set

    B_l := { j ∈ ℕ^d : j_t = 1, ..., 2^{l_t} - 1, j_t odd, t = 1, ..., d, if l_t > 0;
                       j_t = 0, 2^{l_t}, t = 1, ..., d, if l_t = 0 }

◮ leads to

    W_l = span{ φ_{l,j} | j ∈ B_l }

◮ the hierarchical difference spaces now allow us the definition of a multilevel subspace decomposition
◮ we can write V_n := V_{(n,...,n)} as a direct sum of subspaces

    V_n := \bigoplus_{l_1=0}^{n} \cdots \bigoplus_{l_d=0}^{n} W_l = \bigoplus_{|l|_\infty \le n} W_l

◮ |l|_∞ := max_{1 \le t \le d} l_t and |l|_1 := \sum_{t=1}^{d} l_t are the discrete L^∞- and the discrete L^1-norm of l, respectively

SLIDE 15

Hierarchical Subspaces W_l for V_{3,3}

[Figure: the 16 subspaces W_{l_1,l_2}, 0 ≤ l_1, l_2 ≤ 3, arranged in a level scheme]

SLIDE 16

Hierarchical Basis [Faber:09, Yserentant:86]

◮ the basis { φ_{l,j} : j ∈ B_l }_{l=0}^{n} generalises the one-dimensional hierarchical basis to the d-dimensional case with a tensor product ansatz
◮ the supports of the φ_{l,j}(x) which span W_l are disjoint
◮ a function f ∈ V_n can be represented as

    f(x) = \sum_{|l|_\infty \le n} \sum_{j \in B_l} α_{l,j} · φ_{l,j}(x)

◮ the number of basis functions which describe an f ∈ V_n in nodal or hierarchical basis is (2^n + 1)^d = O(h_n^{-d})
◮ curse of dimensionality: n = 6, d = 6 results in 75,418,890,625 points
◮ for V_l the following decomposition holds

    V_l := \bigoplus_{k_1=0}^{l_1} \cdots \bigoplus_{k_d=0}^{l_d} W_k = \bigoplus_{k \le l} W_k

SLIDE 17

Interpolation with Hierarchical Basis

◮ consider d-linear interpolation of f ∈ V by an f_n ∈ V_n
◮ for linear interpolation in one dimension, the hierarchical coefficients α_{l,j} satisfy

    α_{l,j} = f(x_{l,j}) - \frac{f(x_{l,j} - h) + f(x_{l,j} + h)}{2} = f(x_{l,j}) - \frac{f(x_{l,j-1}) + f(x_{l,j+1})}{2}

◮ this illustrates why the α_{l,j} are also called hierarchical surplus: they specify what has to be added to the hierarchical representation of level l - 1 to obtain the one of level l
◮ rewrite this in the following (stencil) operator form

    α_{l,j} = \left[ -\tfrac{1}{2} \;\; 1 \;\; -\tfrac{1}{2} \right]_{l,j} f

◮ generalise to the d-dimensional case as follows

    α_{l,j} = \left( \prod_{i=1}^{d} \left[ -\tfrac{1}{2} \;\; 1 \;\; -\tfrac{1}{2} \right]_{l_i,j_i} \right) f
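As a sketch of how the stencil acts in practice, the following computes all surpluses of a 1D interpolant up to level n (surpluses_1d is our name; boundary values are kept nodally on level 0):

```python
def surpluses_1d(f, n):
    """Hierarchical surpluses alpha_{l,j} = f(x) - (f(x - h) + f(x + h)) / 2
    for levels 1..n; level 0 keeps the nodal boundary values f(0), f(1)."""
    alpha = {(0, 0): f(0.0), (0, 1): f(1.0)}
    for l in range(1, n + 1):
        h = 2.0 ** (-l)
        for j in range(1, 2 ** l, 2):           # only the odd j on level l
            x = j * h
            alpha[(l, j)] = f(x) - 0.5 * (f(x - h) + f(x + h))
    return alpha

# e.g. for f(x) = x * (1 - x): alpha[(1, 1)] = 0.25, alpha[(2, 1)] = 0.0625
```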
SLIDE 18

Sobolev Space H^2_mix with Dominating Mixed Derivatives

◮ define the norm as

    \|f\|_{H^s_{mix}}^2 = \sum_{0 \le k \le s} \left\| \frac{\partial^{|k|_1}}{\partial x^k} f \right\|_2^2

◮ and the space H^s_mix in the usual way:

    H^s_{mix} := { f : Ω → ℝ : \|f\|_{H^s_{mix}} < ∞ }

◮ furthermore we define the semi-norm

    |f|_{H^k_{mix}} := \left\| \frac{\partial^{|k|_1}}{\partial x^k} f \right\|_2

◮ the continuous function spaces H^s_mix, like the discrete spaces V_l, have tensor product structure [Wahba:1990, Griebel.Knapek:2000]:

    H^s_{mix} = H^s ⊗ \cdots ⊗ H^s

SLIDE 19

Hierarchical Values α_{l,j} are Bounded I

We assume f ∈ H^2_{0,mix}(Ω̄), i.e. zero boundary values, and l > 0 to avoid the special treatment of level 0, i.e. of the boundary functions in the hierarchical representation. Straightforward calculation shows:

Lemma
For any piecewise d-linear basis function φ_{l,j} it holds

    \|φ_{l,j}\|_2 \le C(d) · 2^{-|l|_1/2}

Lemma
For any hierarchical coefficient of f ∈ H^2_{0,mix}(Ω̄) it holds

    α_{l,j} = \prod_{i=1}^{d} \left( \frac{-h_i}{2} \right) \int φ_{l,j} · D^2 f(x) \, dx

SLIDE 20

Hierarchical Values α_{l,j} are Bounded II

Proof.
In one dimension, partial integration provides

    \int_{x_{l,j}-h}^{x_{l,j}+h} φ_{l,j} · \frac{\partial^2 f(x)}{\partial x^2} \, dx
    = \left[ φ_{l,j} · \frac{\partial f(x)}{\partial x} \right]_{x_{l,j}-h}^{x_{l,j}+h} - \int_{x_{l,j}-h}^{x_{l,j}+h} \frac{\partial φ_{l,j}}{\partial x} · \frac{\partial f(x)}{\partial x} \, dx
    = -\int_{x_{l,j}-h}^{x_{l,j}} \frac{1}{h} · \frac{\partial f(x)}{\partial x} \, dx + \int_{x_{l,j}}^{x_{l,j}+h} \frac{1}{h} · \frac{\partial f(x)}{\partial x} \, dx
    = \frac{1}{h} · \left( f(x_{l,j} - h) - 2 f(x_{l,j}) + f(x_{l,j} + h) \right)
    = \frac{-2}{h} · α_{l,j}

The d-dimensional result is achieved via the tensor product formulation. □

SLIDE 21

Hierarchical Values α_{l,j} are Bounded III

Lemma
For f ∈ H^2_{0,mix}(Ω̄) as above in hierarchical representation,

    |α_{l,j}| \le C(d) · 2^{-(3/2)·|l|_1} · \| f|_{supp(φ_{l,j})} \|_{H^2_{mix}}

Proof.

    |α_{l,j}| = \left| \prod_{i=1}^{d} \frac{-h_i}{2} \int φ_{l,j} · D^2 f(x) \, dx \right|
    \le \prod_{i=1}^{d} \frac{h_i}{2} · \|φ_{l,j}\|_2 · \| D^2 f|_{supp(φ_{l,j})} \|_2
    \le C(d) · 2^{-(3/2)·|l|_1} · \| f|_{supp(φ_{l,j})} \|_{H^2_{mix}}   □

slide-22
SLIDE 22

MLSS 06 - Canberra Sparse Grids Jochen Garcke (Very) Short Course on Finite Elements Hierarchical Basis Sparse Grids Combination Technique Regression / Classification via Function Reconstruction Opticom Semi-supervised Learning Outlook: Dimension Adaptive Combination Technique Sparse Grids in Reinforcement Learning ?

  • Hier. Compon. Bounded by Size of Support

Lemma

f ∈ H2

0,mix(¯

Ω) as above in hierarchical representation. For its components fl ∈ Wl holds fl2 ≤ C(d) · 2−2·|l|1 · |f|H2

mix

Proof.

Since the supports of all φl,j are mutually disjoint fl2

2 =

  • j∈Bl

αl,j · φl,j(x)

  • 2

2

=

  • j∈Bl

|αl,j|2 · φl,j2

2

With Lemma 3 and 1 it now follows fl2

2

  • j∈Bl

C(d) · 2−3·|l|1 ·

  • f|supp(φl,j)
  • 2

H2

mix

· C(d) · 2−|l|1

4 l 2

SLIDE 23

Hierarchical Subspaces W_l

[Figure: the subspaces W_{l_1,l_2}, 0 ≤ l_1, l_2 ≤ 3, in the level scheme]


SLIDE 25

Sparse Grids

◮ we define the sparse grid function space V^s_n ⊂ V_n as

    V^s_n := \bigoplus_{|l|_1 \le n} W_l

◮ every f ∈ V^s_n can now be represented as

    f^s_n(x) = \sum_{|l|_1 \le n} \sum_{j \in B_l} α_{l,j} φ_{l,j}(x)

◮ approximation property in H^2_mix:

    \|f - f^s_n\|_2 = O(h_n^2 \log(h_n^{-1})^{d-1})

◮ a sparse grid needs O(h_n^{-1} (\log(h_n^{-1}))^{d-1}) points
◮ here n = 6, d = 6 results in 483,201 points
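A minimal sketch of enumerating the sparse grid index set and its point count, assuming the level convention above; the exact figure, like the 483,201 quoted, depends on how level 0 and the boundary points are counted:

```python
from itertools import product

def count_sparse_points(n, d):
    """Sum of |B_l| over all levels l with |l|_1 <= n, where
    |B_l| = prod_t (2 boundary points if l_t = 0, else 2**(l_t - 1) odd j_t)."""
    total = 0
    for l in product(range(n + 1), repeat=d):
        if sum(l) <= n:
            size = 1
            for lt in l:
                size *= 2 if lt == 0 else 2 ** (lt - 1)
            total += size
    return total

# compare with the full grid: (2**n + 1)**d, i.e. 75_418_890_625 for n = d = 6
```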

SLIDE 26

Sparse Grids in two and three dimensions

SLIDE 27

History of Sparse Grids

◮ 1960 Babenko "hyperbolic crosses"
◮ 1963 Smolyak
◮ 1971 Gordon
◮ 1981 Delvos "Boolean interpolation"
◮ 1990 Zenger, Griebel "sparse grids"
◮ 1992 Baszenski, Delvos, Jester "discrete blending methods"

Overview article: Bungartz and Griebel in Acta Numerica 2004.
Tutorial on sparse grids on my webpage: wwwmaths.anu.edu.au/~garcke/

SLIDE 28

Some Recent Applications of Sparse Grids

◮ numerical integration (Smolyak '63, Bonk '94, Novak, Ritter '96, Gerstner, Griebel '98/'03)
◮ eigenvalues of the Schrödinger equation in 6D (G., Griebel '00)
◮ fluid dynamics (Griebel, Thurner '93, Koster '02)
◮ numerical integration in finance (option pricing) (Gerstner, Griebel, Wahl '01)
◮ stochastic differential equations (Schwab, Todor '03)
◮ Maxwell equation (Gradinaru, Hiptmair '03)
◮ parabolic equations for option pricing (Reisinger '04)
◮ wavelet-based sparse grids for parabolic problems (Petersdorff, Schwab '04)
◮ classification, regression, data analysis (G., Griebel, Thess '01, Hegland '03, Bendel, Kahrs, Marquardt '05)
◮ Fokker-Planck equation (Feuersänger, Griebel '05)
◮ gene regulatory networks (Hegland et al. '06)

SLIDE 29

Simple Example in Numerical Integration, 10D

Consider the smooth integrand

    \int_{[0,1]^{10}} e^{-x^T x + x^T b} \, dx

Time to reach 5 correct digits:

◮ product rule (PR): 317 years
◮ Monte Carlo (MC): 100 s
◮ quasi-Monte Carlo (QM): 1 s
◮ sparse grid (SG): 0.02 s

(Gerstner, Griebel '98)

SLIDE 30

How to Compute on a Sparse Grid

◮ computation on the sparse grid in the hierarchical basis
◮ the function f^s_n ∈ V^s_n is split

    f^s_n(x) = \sum_{|l|_1 \le n} \hat{f}_l  with  \hat{f}_l ∈ W_l

◮ example: V^s_2 = W_{2,0} ⊕ W_{1,1} ⊕ W_{0,2} ⊕ W_{1,0} ⊕ W_{0,1} ⊕ W_{0,0}
◮ recall the difference spaces W_l:

    W_l := V_l \setminus \bigoplus_{t=1}^{d} V_{l - e_t}

◮ observe

    W_{1,1} = V_{1,1} \setminus (V_{1,0} ∪ V_{0,1}) = V_{1,1} - V_{1,0} - V_{0,1} + V_{0,0}

◮ the telescope sum property gives f^s_n as a sum of f_l ∈ V_l
◮ the combination technique is based on this property

SLIDE 31

Combination Technique of Level 4 in 2d

[Figure: the grids Ω_{4,0}, Ω_{3,1}, Ω_{2,2}, Ω_{1,3}, Ω_{0,4} (added) and Ω_{3,0}, Ω_{2,1}, Ω_{1,2}, Ω_{0,3} (subtracted) combine to the sparse grid Ω^s_4]

    f^c_n = \sum_{l_1 + l_2 = n} f_{l_1,l_2} - \sum_{l_1 + l_2 = n - 1} f_{l_1,l_2}

SLIDE 32

Telescope Sum Property for Interpolation

We write it exemplarily in the two-dimensional case, using \hat{f}_{l_1,l_2} ∈ W_{l_1,l_2} instead of all the basis functions of W_{l_1,l_2} for ease of presentation:

    f^c_n = \sum_{l_1 + l_2 = n} f_{l_1,l_2} - \sum_{l_1 + l_2 = n-1} f_{l_1,l_2}
          = \sum_{l_1 \le n} \sum_{k_1 \le l_1} \sum_{k_2 \le n - l_1} \hat{f}_{k_1,k_2} - \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \sum_{k_2 \le n-1-l_1} \hat{f}_{k_1,k_2}
          = \sum_{k_1 \le n} \hat{f}_{k_1,0} + \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \left( \sum_{k_2 \le n - l_1} \hat{f}_{k_1,k_2} - \sum_{k_2 \le n-1-l_1} \hat{f}_{k_1,k_2} \right)
          = \sum_{k_1 \le n} \hat{f}_{k_1,0} + \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \hat{f}_{k_1, n - l_1}
          = \sum_{k_1 + t \le n} \hat{f}_{k_1,t}  with  t := n - l_1.

SLIDE 33

Sparse Grid Combination Technique

◮ solve the problem on a sequence of full grids Ω_l with

    l_1 + ... + l_d = n - q,  q = 0, ..., d - 1,  l_t \ge 0

◮ combine the results f_l(x) ∈ V_l for a solution on the sparse grid Ω^{(s)}_n:

    f^{(c)}_n(x) := \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{l_1 + ... + l_d = n - q} f_l(x)

◮ number of spaces for V^c_n is O(d n^{d-1})
◮ dim(V_l) = O(2^{d-1} · h_n^{-1})
◮ practical limit d ≈ 20 due to memory constraints
◮ for interpolation the combination technique reproduces the sparse grid solution
◮ for PDE problems it gives the same approximation order (as long as a certain error expansion holds)
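A small sketch enumerating the grids and coefficients of this formula (function and variable names are ours):

```python
from itertools import product
from math import comb

def combination_grids(n, d):
    """Level indices and coefficients of the classical combination technique:
    f_n^(c) = sum_{q=0}^{d-1} (-1)^q * C(d-1, q) * sum_{|l|_1 = n-q} f_l."""
    grids = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(n - q + 1), repeat=d):
            if sum(l) == n - q:
                grids.append((l, coeff))
    return grids

# sanity check in 2d, n = 4: coefficient +1 on |l|_1 = 4 and -1 on |l|_1 = 3;
# the coefficients sum to 1, as required for reproducing constants.
```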

SLIDE 34

Generalised Combination Technique

◮ generalised combination formula

    f^{(c)}_I(x) := \sum_{k \in I} \left( \sum_{z=(0,...,0)}^{(1,...,1)} (-1)^{|z|_1} · χ_I(k + z) \right) f_k(x),

where χ_I is the characteristic function of the index set I (χ_I(k) = 1 if k ∈ I, 0 otherwise).
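As a sketch, the inclusion-exclusion coefficient of each index in an arbitrary (downward closed) index set I can be computed directly from this formula (combination_coefficients is our name):

```python
from itertools import product

def combination_coefficients(I):
    """Combination coefficient of each index k in the index set I:
    c_k = sum over z in {0,1}^d of (-1)^{|z|_1} * chi_I(k + z)."""
    I = set(map(tuple, I))
    d = len(next(iter(I)))
    coeffs = {}
    for k in I:
        c = 0
        for z in product((0, 1), repeat=d):
            kz = tuple(ki + zi for ki, zi in zip(k, z))
            c += (-1) ** sum(z) * (kz in I)   # chi_I(k + z) as a boolean
        coeffs[k] = c
    return coeffs

# for I = {l : l_1 + l_2 <= 4} this reproduces the classical 2d coefficients:
# +1 on |l|_1 = 4, -1 on |l|_1 = 3, 0 otherwise.
```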

SLIDE 35

Summary Sparse Grids

◮ sparse grids allow the representation of functions in high dimensions (up to about 20)
◮ controlled approximation order to the right solution (under certain assumptions on the smoothness)
◮ efficient (multigrid) solvers often exist, O(N) where N is the number of points
◮ used for the numerical solution of partial differential equations and for numerical integration

SLIDE 36

Problem Setting for Regression / Classification

◮ given a set of data points x with labels y,

    S = {(x_i, y_i)}_{i=1}^{M},  x_i ∈ ℝ^d,  y_i ∈ {-1, 1} or y_i ∈ ℝ

◮ we want to reconstruct the underlying function f, the classifier
◮ ill-posed problem → use regularisation for well-posedness
◮ compromise between approximation and generalisation:
  ◮ approximate the given training data set S (data error)
  ◮ 'good' results on new, unseen data
◮ d can be large
◮ S can consist of millions to billions of data points
◮ wide applications in business and research

SLIDE 37

Regularisation Theory

◮ consider the variational problem min_{f ∈ V} R(f) with

    R(f) = \frac{1}{M} \sum_{i=1}^{M} (f(x_i) - y_i)^2 + λ \|Sf\|^2

◮ \frac{1}{M} \sum_{i=1}^{M} (f(x_i) - y_i)^2 is the error of the classifier on the given data
◮ S is the regularisation (or stabilisation) operator
◮ λ is the regularisation parameter
◮ representer theorem in a RKHS [Kimeldorf.Wahba:1971]:

    f(x) = \sum_{i=1}^{M} α_i k(x, x_i),

a linear superposition of kernel functions
◮ the corresponding Galerkin equations are

    \frac{1}{M} \sum_{i=1}^{M} f(x_i) g(x_i) + λ \langle Sf, Sg \rangle_2 = \frac{1}{M} \sum_{i=1}^{M} g(x_i) y_i,

which hold for the minimum f ∈ V of R(f) and all g ∈ V

SLIDE 38

Discretisation

◮ restrict to a finite dimensional subspace V_N ⊂ V
◮ regularisation by discretisation/projection [Natterer '77]
◮ minimise in V_N, with f_N = \sum_{j=1}^{N} α_j φ_j(x), and use S := ∇:

    R_{reg}(f_N) = \frac{1}{M} \sum_{i=1}^{M} (f_N(x_i) - y_i)^2 + λ \|∇f_N\|_{L^2}^2

◮ from ∂_α R_{reg} = 0 follows (k = 1, ..., N)

    \sum_{j=1}^{N} α_j \left[ M λ (∇φ_j, ∇φ_k)_{L^2} + \sum_{i=1}^{M} φ_j(x_i) · φ_k(x_i) \right] = \sum_{i=1}^{M} y_i φ_k(x_i)

◮ this results in the linear equation system

    (λ · M C + B^t · B) α = B^t y

with C_{j,k} = (∇φ_j, ∇φ_k)_{L^2} and B_{i,j} = φ_j(x_i)
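To make the system concrete, here is a minimal one-dimensional sketch (grid of the given level on [0,1], hat basis, S = d/dx; all names are ours):

```python
import numpy as np

def fit_1d(x, y, level, lam):
    """Sketch: solve (lam * M * C + B^T B) a = B^T y on a 1D grid of mesh
    width h = 2**-level, with B[i, j] = phi_j(x_i) (hat functions) and
    C[j, k] = (phi_j', phi_k')_{L2} (the 1D stiffness matrix)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N, h, M = 2 ** level + 1, 2.0 ** (-level), len(x)
    j = np.arange(N)
    B = np.maximum(1.0 - np.abs(x[:, None] / h - j[None, :]), 0.0)
    C = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    C[0, 0] = C[-1, -1] = 1.0 / h          # boundary hats have one-sided support
    return np.linalg.solve(lam * M * C + B.T @ B, B.T @ y)   # coefficients a_j
```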

SLIDE 39

Complexities

◮ solve on each grid:

    (λ · M C_l + B^t_l · B_l) α_l = B^t_l y,  with  (B^t · B)_{j,k} = \sum_{i}^{M} φ_j(x_i) · φ_k(x_i)

                 d-linear basis functions       linear basis on simplices
                 C_l          B_l · B^t_l       C_l           B_l · B^t_l
    assembly     O(3^d · N)   O(2^{2d} · M)     O(2 · d · N)  O(d^2 · M)
    storage      O(3^d · N)   O(3^d · N)        O(2 · d · N)  O(2d · N)

◮ scales linearly with M, the number of data points (N ≪ M)
◮ properties of the combination technique using simplices:
  ◮ non-nestedness of the discrete spaces
  ◮ the error expansion is not with respect to the anisotropic mesh size h
slide-40
SLIDE 40

MLSS 06 - Canberra Sparse Grids Jochen Garcke (Very) Short Course on Finite Elements Hierarchical Basis Sparse Grids Combination Technique Regression / Classification via Function Reconstruction Opticom Semi-supervised Learning Outlook: Dimension Adaptive Combination Technique Sparse Grids in Reinforcement Learning ?

Coarse and Fine Grain Parallelisation

◮ partial functions on each grid Ωl in the collection of

grids can be computed independently of each other

◮ computation of BT · B depends on xi. Parallelise loop

  • ver xi.

(BT · B)j,k =

M

  • i

ϕj(x(i)) · ϕk(x(i))

◮ parallelisation of the matrix-vector-multiplication

through splitting of vector in p-parts of size n/p

◮ combination of coarse and fine grain parallelisation

possible

SLIDE 41

Ripley Data Set

SLIDE 42

ndcHard (synthetic), 10D with 2 Million Data

             # of points   training %   testing %   total time (s)   data matrix time (s)
    level 1  20,000        86.2         84.2        6.3              0.9
             200,000       85.1         84.8        16.2             8.7
             2 million     84.9         84.9        114.9            84.9
    level 2  20,000        85.1         83.8        134.6            10.3
             200,000       84.5         84.2        252.3            98.2
             2 million     84.3         84.2        1332.2           966.6

◮ linear SVM: 69.8 % / 69.5 % [Fung.Mangasarian:01]
◮ no results with a non-linear SVM (complexity constraints)

SLIDE 43

Comparison with Other Methods

    data set        dim   sparse grid %   other %   rank   total
    breast cancer   9     97.66 (1.4)     97.72     2      25
    pima diabetis   8     76.80 (1.9)     77.63     5      28
    heart           13    80.85 (2.4)     86.90     18     25
    thyroid         5     96.07 (2.7)     95.60     1      11
    titanic         3     77.94 (0.7)     78.96     8      19
    flare-solar     9     66.85 (1.8)     67.57     2      7

◮ compared with results in (mainly) Meyer, Leisch, Hornik '03 and some others
◮ methods include support vector machines, boosting methods like AdaBoost, linear discriminant analysis, neural networks, nearest neighbour, random forests and multivariate adaptive regression splines

SLIDE 44

Combination Technique Can Diverge

[Figure: value of the functional and the least squares error on the data, i.e. \frac{1}{M} \sum_{i=1}^{M} (f(x_i) - y_i)^2, for the reconstruction of e^{-x^2} + e^{-y^2} with the combination technique of level n = 0, ..., 10, for λ = 10^{-4} (left) and λ = 10^{-6} (right)]

SLIDE 45

Combination Technique in Two Space Case

◮ consider two spaces V_1, V_2 and their intersection V_{12}
◮ additive approximation of the orthogonal projection into V = V_1 + V_2:

    T_a = c_1 P_{V_1} + c_2 P_{V_2} + c_{12} P_{V_1 ∩ V_2}

◮ regression is the projection of \hat{f} using the scalar product

    \langle f, g \rangle_{PLS} = \frac{1}{m} \sum_{i=1}^{m} f(x_i) g(x_i) + λ \langle Sf, Sg \rangle_2

◮ the combination technique is the optimal additive approximation in a worst case sense [Hegland.G.Challis:05]

SLIDE 46

Error of Two Space Combination Technique

    e^2(c_1, c_2, c_{12}) := \|P_{V_1+V_2} u - c_1 P_{V_1} u - c_2 P_{V_2} u - c_{12} P_{V_1 ∩ V_2} u\|^2,
    e^2_o = \inf_{c_1, c_2, c_{12}} e^2(c_1, c_2, c_{12}),  and  e^2_c = e^2(1, 1, -1)

Theorem (Hegland.G.Challis:05)

    e^2_c = (1 - (\cos^2 α_1 - 2 \cos α_1 \cos α_2 \cos γ + \cos^2 α_2)) \|P_{U_1+U_2} u\|^2

and

    e^2_c = e^2_o + \cos^2 γ \, (\|P_{U_1+U_2} u\|^2 - e^2_o),

where U_1 := V_1 ∩ (V_1 ∩ V_2)^⊥, U_2 := V_2 ∩ (V_1 ∩ V_2)^⊥, γ = ∠(P_{U_1} u, P_{U_2} u), α_1 = ∠(P_{U_1+U_2} u, P_{U_1} u), α_2 = ∠(P_{U_1+U_2} u, P_{U_2} u)


SLIDE 48

Optimised Combination Technique [Hegland:2003]

◮ minimise

    J(c_1, ..., c_m) = \|Pf - \sum_{i=1}^{m} c_i P_{V_i} f\|^2

◮ simple expansion gives

    J(c_1, ..., c_m) = \sum_{i,j=1}^{m} c_i c_j (P_{V_i} f, P_{V_j} f) - 2 \sum_{i=1}^{m} c_i \|P_{V_i} f\|^2 + \|Pf\|^2

◮ the location of the minimum of J does not depend on Pf
◮ the best combination coefficients satisfy

    \begin{pmatrix} \|P_1 f\|^2 & \cdots & \langle P_1 f, P_m f \rangle \\ \langle P_2 f, P_1 f \rangle & \cdots & \langle P_2 f, P_m f \rangle \\ \vdots & \ddots & \vdots \\ \langle P_m f, P_1 f \rangle & \cdots & \|P_m f\|^2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} = \begin{pmatrix} \|P_1 f\|^2 \\ \|P_2 f\|^2 \\ \vdots \\ \|P_m f\|^2 \end{pmatrix}
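A minimal sketch of solving this system, assuming the partial solutions P_i f are available as vectors of values on common evaluation points so that scalar products can be approximated by dot products (in the regression setting the ⟨·,·⟩_PLS product would be used instead; names are ours):

```python
import numpy as np

def opticom_coefficients(P):
    """Optimal combination coefficients c from the partial solutions
    P[i] ~ P_{V_i} f: solve G c = r with G_ij = <P_i f, P_j f> and
    r_i = ||P_i f||^2 (the diagonal of G)."""
    P = [np.asarray(p, float) for p in P]
    G = np.array([[pi @ pj for pj in P] for pi in P])   # Gram matrix
    return np.linalg.solve(G, np.diag(G).copy())
```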

SLIDE 49

Residual and Angle γ for Example Problem

    level   γ          e^2_c              e^2_o
    1       0.012924   3.353704 · 10^-4   3.351200 · 10^-4
    2       0.025850   2.124744 · 10^-5   2.003528 · 10^-5
    3       0.021397   8.209228 · 10^-6   7.372946 · 10^-6
    4       0.012931   1.451818 · 10^-5   1.421387 · 10^-5
    5       0.003840   2.873697 · 10^-5   2.871036 · 10^-5
    6       0.032299   5.479755 · 10^-5   5.293952 · 10^-5
    7       0.086570   1.058926 · 10^-4   9.284347 · 10^-5
    8       0.168148   1.882191 · 10^-4   1.403320 · 10^-4
    9       0.237710   2.646455 · 10^-4   1.706549 · 10^-4
    10      0.285065   3.209026 · 10^-4   1.870678 · 10^-4

Table: Residual for the normal combination technique, e^2_c, and the optimised combination technique, e^2_o, as well as the angle γ = ∠(P_{U_1} u, P_{U_2} u).

SLIDE 50

Example: Optimised Combination Technique

[Figure: l2-error and sqrt(functional), and least squares error and functional, versus level, for the standard combination technique (ct) and opticom]

◮ reconstruction of e^{-x^2} + e^{-y^2} from 5000 data points
◮ λ = 10^{-2} (left) and 10^{-6} (right)
◮ value of the residual and the least squares error on the data
◮ standard combination technique

    f^c_n = \sum_{l_1 + l_2 = n} f_{l_1,l_2} - \sum_{l_1 + l_2 = n-1} f_{l_1,l_2}

and optimised combination technique

SLIDE 51

Comparison with Other Regression Methods

◮ using the optimised combination technique
◮ compare with the benchmark study in [Meyer.Leisch.Hornik:2003]
◮ methods: linear regression, ε-support vector regression, neural networks, regression trees, projection pursuit regression, multivariate adaptive regression splines (MARS), additive spline models by adaptive backfitting (BRUTO), bagging of trees, random forests, multiple additive regression trees (MART)
◮ procedure:
  ◮ ten-times ten-fold cross-validation for real data
  ◮ 100 data sets for synthetic data
  ◮ a good λ and level pair chosen on a subset of the data per 10-fold CV

SLIDE 52

Benchmark Results

Real life data:

                      opticom          best other
                      level   mean     method   mean    rank (mean/med.)   # algor.
    abalone           1       4.20     nnet     4.31    1/1                9
    auto-mpg          2       6.21     svm      7.11    1/1                9
    boston-housing    1       8.92     svm      9.60    1/3                9
    cpu (×10^3)       1       1.73     ppr      3.16    1/2                9
    cpuSmall          2       8.74     mart     7.55    3/3                10
    SLID              3       38.64    rForest  34.13   4/4                9

Synthetic data (100,000 for learning):

                          opticom                SVM              MARS
                          level   MSE    time    MSE     time     MSE     time
    Friedman1             3       1.340  3214.4  1.148   23604    1.205   10.4
    Friedman1 (5dim)      5       1.040  953.2   -       -        -       -
    Friedman2 (×10^3)     3       15.46  35.2    15.40   3151     15.77   16.9
    Friedman3 (×10^-3)    4       13.33  89.5    27.47   16862    14.45   3.6

SLIDE 53

Semi-Supervised Learning

Learning from partially labelled data:

◮ a set of data points x with labels y,

    S_L = {(x_i, y_i)}_{i=1}^{L},  x_i ∈ ℝ^d,  y_i ∈ {-1, 1} or y_i ∈ ℝ

◮ an additional set of unlabelled data points x,

    S_U = {x_i}_{i=1}^{U},  x_i ∈ ℝ^d

◮ often unlabelled data is much more common than labelled data
◮ chance of better results through use of the to-be-classified data
◮ natural learning uses unlabelled experiences
◮ one way: use the geometric structure of the data distribution

SLIDE 54

Simplicity is Relative

◮ picture taken from [Belkin.Niyogi.Sindhwani:2004]
◮ geodesic (intrinsic) distance versus ambient distance
◮ cluster assumption for classification:
  ◮ nearby data points have the same class
  ◮ the decision boundary should not cross high density regions
◮ these ideas lead to (sub)manifold learning, M ⊂ ℝ^d
SLIDE 55

Manifold Learning for Dimensionality Reduction

◮ manifold learning for dimensionality reduction:

    P : M → ℝ^D,  D < d

◮ in recent years, research on Non-Linear Dimensionality Reduction (NLDR) under the name manifold learning
◮ Isomap, Locally Linear Embedding (LLE), Laplacian Eigenmap, LTSA, Hessian LLE, SDE
◮ each algorithm learns a different mapping and tries to preserve a different geometric signature
◮ steps of the algorithms:
  1. compute neighbourhoods in input space
  2. construct a matrix based on the chosen mapping and geometry
  3. spectral embedding via the top or bottom eigenvectors
◮ Isomap, Laplacian Eigenmap and LLE can be interpreted as kernel methods (kernel PCA)

SLIDE 56

Use Manifold in Semi-Supervised Learning

◮ exploit the geometry of the marginal distribution, i.e. use the manifold
◮ works on manifold learning in the kernel setting [Belkin.Niyogi.Sindhwani:2004,2005], [Chapelle.Zien:2005]
◮ we follow [Belkin.Niyogi.Sindhwani:2004,2005] using Laplacian Eigenmaps:
  ◮ proximity preserving mapping
  ◮ discrete Laplacian
◮ use the variational problem min_{f ∈ V} R(f) with

    R(f) = \frac{1}{M} \sum_{i=1}^{M} (f(x_i) - y_i)^2 + λ_A \|Sf\|_{L^2}^2 + λ_I \|f\|_I^2

◮ \|f\|_I^2 is a regularisation term for the intrinsic structure
◮ solution in a RKHS
SLIDE 57

Discrete Laplacian

◮ the case of a compact submanifold M ⊂ ℝ^d implies the choice

    \|f\|_I^2 = \int_M \langle ∇_M f, ∇_M f \rangle

◮ approximate \int_M \langle ∇_M f, ∇_M f \rangle from the labelled and unlabelled data:

    \int_M \langle ∇_M f, ∇_M f \rangle ≈ \frac{1}{(L+U)^2} \sum_{i,j=1}^{L+U} (f(x_i) - f(x_j))^2 W_{ij}

◮ the W_{ij} are edge weights in the data adjacency graph (binary, distance-based, heat kernel)
◮ in matrix form the regularisation term becomes

    \frac{λ_I}{(L+U)^2} f^t L f

◮ L is the graph Laplacian L = D - W, with D_{ii} = \sum_{j=1}^{L+U} W_{ij}
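A minimal sketch of building such a graph Laplacian from data, assuming heat kernel weights on a symmetrised k-nearest-neighbour graph (function and parameter names are ours):

```python
import numpy as np

def graph_laplacian(X, k=5, heat=1.0):
    """Graph Laplacian L = D - W of a k-NN adjacency graph with heat kernel
    weights W_ij = exp(-||x_i - x_j||^2 / heat); X has shape (n_points, d)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared dists
    W = np.exp(-D2 / heat)
    np.fill_diagonal(W, 0.0)
    # keep only each point's k nearest neighbours, symmetrised
    keep = np.zeros_like(W, dtype=bool)
    idx = np.argsort(D2, axis=1)[:, 1:k + 1]              # skip the point itself
    rows = np.repeat(np.arange(len(X)), k)
    keep[rows, idx.ravel()] = True
    W *= (keep | keep.T)
    return np.diag(W.sum(axis=1)) - W                     # L = D - W
```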

SLIDE 58

Discretise Regularisation Problem with Sparse Grids

◮ putting it all together, using S = ∇:

    R(f) = \frac{1}{L} \sum_{i=1}^{L} (f(x_i) - y_i)^2 + λ_A \|∇f\|_2^2 + \frac{λ_I}{(L+U)^2} f^t L f

◮ with f = (f(x_1), ..., f(x_L), f(x_{L+1}), ..., f(x_{L+U}))^t
◮ again minimise in the discrete space V_N, with f_N = \sum_{j=1}^{N} α_j φ_j(x)
◮ setting ∂_α R = 0 results in the N × N linear equation system

    \left( B^t B + λ_A L · C + \frac{λ_I L}{(L+U)^2} B^t L B \right) α = B^t y
SLIDE 59

Complexities

◮ kernel-based approaches currently scale cubically in L + U
◮ equation for one entry of B^t L B:

    (B^t L B)_{k,l} = \sum_{i,j}^{L+U} φ_k(x_i) · φ_l(x_j) · L_{i,j}

◮ full adjacency graph → (L + U) · (#{φ_k(x_i) ≠ 0})^2 operations per x_i; total complexity quadratic in L + U
◮ k-NN adjacency graph → k · (#{φ_k(x_i) ≠ 0})^2 per x_i; total complexity linear in L + U
◮ iterative solution of the equation system is independent of L + U
SLIDE 60

Two-Moons Dataset

[Figure: three classifiers with λ_A = 0.01 and γ_I = 0, γ_I = 0.1, γ_I = 0.5]

◮ γ_I := λ_I L / (L + U)^2
◮ computation on level 8

SLIDE 61

Data-Mining-Cup 2000

◮ 10,000 training data, 34,820 to be classified, 40 attributes
◮ maximise the profit of a direct-mailing campaign
◮ choose the top cut-off % of the sorted data for the mailing campaign
◮ 185 DM for a responder, -12 DM for a non-responder
◮ winner of the DM-Cup: 67,038 DM
◮ data-mining specialists: 84,995 DM (external knowledge)
◮ sparse grid classification (PCA, 18 attributes): 87,705 DM
◮ use the to-be-classified data in the learning process
◮ split the training data 60:40, fit the parameters λ_A, γ_I, cut-off
◮ sparse grid for semi-supervised learning: 93,812 DM
◮ finding good parameters is more difficult


SLIDE 64

Dimension Adaptive Combination Technique

Remaining problems:
◮ still high dependence on the dimension
◮ classical construction too static
◮ spatial adaptivity in high dimensions impracticable
◮ however: dimension importance/interaction often varies

Ansatz:
◮ start the hierarchy with constant functions
◮ use lower degree formulas in less important dimensions
◮ large reduction in complexity if the important dimensions are few (small effective dimension)
◮ reduction in complexity if the interaction between dimensions is small (Friedman '91: not more than 5 for real data?)
◮ dimension adaptive construction of a generalised sparse grid

SLIDE 65

Hierarchy with Constant Functions

[Figure: subspace scheme extended by a constant level: Ŵ_{l_1,l_2} with l_1, l_2 = -1, ..., 3, where level -1 carries the constant function]

SLIDE 66

Refinement Procedure

◮ start with the coarsest grid (index)
◮ successively add indices
◮ compute the solution on the corresponding grid

(Hegland '01; Gerstner, Griebel '03)

SLIDE 67

Dimension Adaptive Algorithm

◮ ensure that in each step the index set remains admissible
◮ need robust and efficient criteria for the adaptive choice of grids
◮ currently: computing the residual
◮ the optimised combination technique is needed, otherwise a decrease is not guaranteed
◮ active indices: indices currently not in the index set, but which can be added in the next step

    f^{(c)}_I(x) := \sum_{k \in I} \left( \sum_{z=(0,...,0)}^{(1,...,1)} (-1)^{|z|_1} · χ_I(k + z) \right) f_k(x)
SLIDE 68

Sparse Grids in Reinforcement Learning?

◮ based on work by [Munos:2000, Munos.Moore:2002]
◮ the Hamilton-Jacobi-Bellman equation describes this setting:

    f^π(s) \ln γ + ∇f^π(s) · g(s, π(s)) + r(s, π(s)) = 0

◮ f^π(s) is the state-value function; it assigns the expected cumulated reward to each state s
◮ the discount rate γ weighs the influence of future rewards
◮ to find the optimal state-value function and the optimal policy we need to solve this differential equation
◮ s(t) is the state, a(t) the action, and g is called the state dynamics
◮ r(s, a) is the reward function
◮ consider a deterministic policy π(s), which assigns each state at time t a unique action, i.e., a(t) = π(s(t))