Sparse Grids
Jochen Garcke
MLSS 06 - Canberra
Outline
◮ (Very) Short Course on Finite Elements
◮ Hierarchical Basis
◮ Sparse Grids
◮ Combination Technique
◮ Regression / Classification via Function Reconstruction
◮ Opticom
◮ Semi-supervised Learning
◮ Outlook: Dimension Adaptive Combination Technique
◮ Sparse Grids in Reinforcement Learning?
Partial Differential Equations
◮ Poisson equation (model problem): electric potential u for a given charge f
  $-\Delta u \,(= -\nabla^2 u) = f$
◮ Navier-Stokes equations: describe the motion of fluid substances like liquids and gases
  $\frac{\partial u}{\partial t} + u \cdot \nabla u - \frac{1}{Re}\Delta u + \nabla p = f, \qquad \nabla \cdot u = 0$
◮ Schrödinger equation (quantum chemistry): eigenvalue problem $H\psi = \lambda\psi$ with
  $H = -\sum_{i}^{N} \frac{\hbar^2}{2m}\Delta_i - \sum_{\alpha}^{K} \frac{\hbar^2}{2M_\alpha}\Delta_\alpha - \sum_{i,\alpha}^{N,K} \frac{e Z_\alpha}{r_{\alpha i}} + \sum_{i<j}^{N} \frac{e^2}{r_{ij}} + \sum_{\alpha<\beta}^{K} \frac{Z_\alpha Z_\beta}{R_{\alpha\beta}}$
Galerkin-Variational Principle
◮ minimise $J(v) = \frac{1}{2}a(v,v) - \langle f, v\rangle$ over $v \in V$
◮ simplified, think of it as: $Lu = f \;\to\; a(u,v) = \int Lu \cdot v = \langle f, v\rangle$
◮ for the model problem: $-\Delta u \;\to\; a(u,v) = \int \nabla u \cdot \nabla v$
◮ the minimum u of J is equivalent to finding the $u \in V$ which satisfies $a(u,v) = \langle f, v\rangle \;\; \forall v \in V$
◮ Lax-Milgram lemma: if V is a Hilbert space, f is bounded, and a is bounded ($|a(u,v)| \le C \|u\|_V \|v\|_V$) and V-elliptic ($C_E \|u\|_V^2 \le a(u,u) \;\forall u$), then there exists a unique solution u
◮ u is the weak solution to the original partial differential equation
Discretisation
◮ discretise: $V_N \subset V$, $V_N$ finite-dimensional
◮ find the $u_N \in V_N$ which satisfies
  $a(u_N, v_N) = \langle f, v_N \rangle \;\; \forall v_N \in V_N$
◮ Cea's lemma: if a is V-elliptic and $u, u_N$ are the solutions in $V, V_N$, respectively, then
  $\|u - u_N\|_V \le C \inf_{v_N \in V_N} \|u - v_N\|_V$
Example for VN in One Dimension
◮ one-dimensional basis for level 3 (figure: hat functions $\varphi_{3,1}, \dots, \varphi_{3,7}$)
◮ interpolation of a parabola
One-dimensional Basis Functions
◮ the one-dimensional basis functions $\varphi_{l,j}(x)$ with support
  $[x_{l,j} - h_l, x_{l,j} + h_l] \cap [0,1] = [(j-1)h_l, (j+1)h_l] \cap [0,1]$ are defined by:
  $\varphi_{l,j}(x) = \begin{cases} 1 - |x/h_l - j|, & x \in [(j-1)h_l, (j+1)h_l] \cap [0,1], \\ 0, & \text{otherwise.} \end{cases}$
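This definition translates directly into code; a minimal Python sketch (illustrative, not part of the course material):

```python
def phi(l: int, j: int, x: float) -> float:
    """1-d hat function centred at x_{l,j} = j * 2**-l (x assumed in [0,1])."""
    h = 2.0 ** -l
    if (j - 1) * h <= x <= (j + 1) * h:
        return 1.0 - abs(x / h - j)
    return 0.0

# phi is 1 at its own grid point and 0 at the neighbouring grid points
assert phi(3, 5, 5 / 8) == 1.0
assert phi(3, 5, 4 / 8) == 0.0 and phi(3, 5, 6 / 8) == 0.0
```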
Basis Functions in More Dimensions
◮ d-dimensional piecewise d-linear hat functions
  $\varphi_{l,j}(x) := \prod_{t=1}^{d} \varphi_{l_t,j_t}(x_t)$
◮ associated function space $V_l$ of piecewise d-linear functions
  $V_l := \operatorname{span}\{\varphi_{l,j} \mid j_t = 0, \dots, 2^{l_t},\; t = 1, \dots, d\}$
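A short sketch of the tensor product construction (self-contained, repeating the 1-d hat from the previous sketch):

```python
from math import prod

def phi(l, j, x):  # 1-d hat, as before
    h = 2.0 ** -l
    return 1.0 - abs(x / h - j) if (j - 1) * h <= x <= (j + 1) * h else 0.0

def phi_d(l, j, x):
    """phi_{l,j}(x) = prod_t phi_{l_t,j_t}(x_t) for multi-indices l, j."""
    return prod(phi(lt, jt, xt) for lt, jt, xt in zip(l, j, x))

# value at its own grid point x_{(2,1),(1,1)} = (0.25, 0.5) is 1
assert phi_d((2, 1), (1, 1), (0.25, 0.5)) == 1.0
```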
Some Notation
◮ simplification: domain $\bar\Omega := [0,1]^d$
◮ $l = (l_1, \dots, l_d) \in \mathbb{N}^d$ denotes a multi-index
◮ define the mesh size $h_l := (2^{-l_1}, \dots, 2^{-l_d})$
◮ anisotropic grid $\Omega_l$ on $\bar\Omega$ with different, but equidistant, mesh sizes in each direction
◮ $\Omega_l$ consists of the points
  $x_{l,j} := (x_{l_1,j_1}, \dots, x_{l_d,j_d})$, with $x_{l_t,j_t} := j_t \cdot h_{l_t} = j_t \cdot 2^{-l_t}$ and $j_t = 0, \dots, 2^{l_t}$
  (figure: the grid $\Omega_{3,1}$)
Triangulation Instead of Tensor Product
Approximation Properties
◮ $D^\alpha u = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} u$
◮ Sobolev spaces $H^s$ with norm
  $\|u\|_{H^s}^2 = \sum_{|\alpha| \le s} \|D^\alpha u\|_2^2$
◮ for a V-elliptic, $V_N$ piecewise (bi)linear, and $u \in H^2$:
  $\|u - u_N\|_{H^1} \le C h \,|u|_{H^2}$
◮ error in $L^2$:
  $\|u - u_N\|_{L^2} \le C h^2 \,|u|_{H^2}$
◮ the above results are in two dimensions; similar results hold in higher dimensions
Interpolation with Hierarchical Basis
(figure: nodal basis $\varphi_{3,1}, \dots, \varphi_{3,7}$ of $V_3$, with $V_1 \subset V_2 \subset V_3$, versus hierarchical basis $\varphi_{1,1}$; $\varphi_{2,1}, \varphi_{2,3}$; $\varphi_{3,1}, \varphi_{3,3}, \varphi_{3,5}, \varphi_{3,7}$, with $V_3 = W_3 \oplus W_2 \oplus V_1$)
Hierarchical Difference Spaces
◮ $l \in \mathbb{N}^d$ denotes the level, i.e. the discretisation resolution, of a grid $\Omega_l$, a space $V_l$, or a function $f_l$
◮ $j \in \mathbb{N}^d$ gives the position of a grid point $x_{l,j}$ or the corresponding basis function $\varphi_{l,j}(\cdot)$
◮ hierarchical difference space $W_l$ via
  $W_l := V_l \setminus \bigcup_{t=1}^{d} V_{l-e_t}$, (1)
  where $e_t$ is the t-th unit vector
◮ in other words, $W_l$ consists of all $\varphi_{k,j} \in V_l$ which are not included in any of the spaces $V_k$ smaller than $V_l$
◮ to complete the definition, we formally set $V_l := 0$ if $l_t = -1$ for at least one $t \in \{1, \dots, d\}$
Hierarchical Tensor Product Decomposition
◮ the index set
  $B_l := \left\{ j \in \mathbb{N}^d \;\middle|\; \begin{array}{ll} j_t = 1, \dots, 2^{l_t}-1,\; j_t \text{ odd},\; t = 1, \dots, d, & \text{if } l_t > 0, \\ j_t = 0, 2^{l_t},\; t = 1, \dots, d, & \text{if } l_t = 0 \end{array} \right\}$
◮ leads to
  $W_l = \operatorname{span}\{\varphi_{l,j} \mid j \in B_l\}$
◮ the hierarchical difference spaces now allow us the definition of a multilevel subspace decomposition
◮ we can write $V_n := V_{(n,\dots,n)}$ as a direct sum of subspaces
  $V_n := \bigoplus_{l_1=0}^{n} \cdots \bigoplus_{l_d=0}^{n} W_l = \bigoplus_{|l|_\infty \le n} W_l$
◮ $|l|_\infty := \max_{1 \le t \le d} l_t$ and $|l|_1 := \sum_{t=1}^{d} l_t$ are the discrete $L^\infty$- and the discrete $L^1$-norm of $l$, respectively
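A sketch enumerating the index sets $B_l$ and checking the decomposition numerically, i.e. that the $W_l$ with $|l|_\infty \le n$ together account for all $(2^n+1)^d$ nodal points (the small test case is an illustrative choice):

```python
from itertools import product

def B(l):
    """Indices j of the basis functions spanning W_l: per dimension the
    odd j in 1..2**lt - 1 if lt > 0, and j in {0, 2**lt} = {0, 1} if lt = 0."""
    per_dim = [range(1, 2 ** lt, 2) if lt > 0 else (0, 1) for lt in l]
    return list(product(*per_dim))

n, d = 3, 2
count = sum(len(B(l)) for l in product(range(n + 1), repeat=d))
assert count == (2 ** n + 1) ** d  # 81 points for n = 3, d = 2
```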
Hierarchical Subspaces $W_l$ for $V_{3,3}$
(figure: the 4 × 4 array of hierarchical subspaces $W_{l_1,l_2}$, $0 \le l_1, l_2 \le 3$)
Hierarchical Basis [Faber:09,Yserentant:86]
◮ $\{\varphi_{l,j} \mid j \in B_l\}_{l=0}^{n}$ generalises the one-dimensional hierarchical basis to the d-dimensional case with a tensor product ansatz
◮ the supports of the $\varphi_{l,j}(x)$ which span $W_l$ are disjoint
◮ a function $f \in V_n$ can be represented as
  $f(x) = \sum_{|l|_\infty \le n} \sum_{j \in B_l} \alpha_{l,j} \cdot \varphi_{l,j}(x)$
◮ the number of basis functions which describe an $f \in V_n$ in nodal or hierarchical basis is $(2^n + 1)^d = O(h_n^{-d})$
◮ curse of dimensionality: n = 6, d = 6 results in 75,418,890,625 points
◮ for $V_l$ the following decomposition holds:
  $V_l := \bigoplus_{k_1=0}^{l_1} \cdots \bigoplus_{k_d=0}^{l_d} W_k = \bigoplus_{k \le l} W_k$
Interpolation with Hierarchical Basis
◮ consider the d-linear interpolation of an $f \in V$ by an $f_n \in V_n$
◮ for linear interpolation in one dimension, the hierarchical coefficients $\alpha_{l,j}$ satisfy
  $\alpha_{l,j} = f(x_{l,j}) - \frac{f(x_{l,j} - h_l) + f(x_{l,j} + h_l)}{2} = f(x_{l,j}) - \frac{f(x_{l,j-1}) + f(x_{l,j+1})}{2}$
◮ this illustrates why the $\alpha_{l,j}$ are also called hierarchical surplus: they specify what has to be added to the hierarchical representation of level $l - 1$ to obtain the one of level $l$
◮ rewrite this in the following operator (stencil) form:
  $\alpha_{l,j} = \left[ -\tfrac{1}{2} \;\; 1 \;\; -\tfrac{1}{2} \right]_{l,j} f$
◮ generalise to the d-dimensional case as follows:
  $\alpha_{l,j} = \left( \prod_{i=1}^{d} \left[ -\tfrac{1}{2} \;\; 1 \;\; -\tfrac{1}{2} \right]_{l_i,j_i} \right) f$
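A 1-d sketch of the surplus computation via this stencil; the decay visible in the printout is what the lemmas on the following slides quantify (the test function is an illustrative choice):

```python
import math

def surpluses(f, n):
    """Return {(l, j): alpha_{l,j}} for levels l = 1..n (interior, odd j)."""
    alpha = {}
    for l in range(1, n + 1):
        h = 2.0 ** -l
        for j in range(1, 2 ** l, 2):
            alpha[(l, j)] = f(j * h) - (f((j - 1) * h) + f((j + 1) * h)) / 2
    return alpha

# for a smooth f the surpluses shrink by roughly a factor of 4 per level
a = surpluses(lambda x: math.exp(-x * x), 8)
for l in range(1, 9):
    print(l, max(abs(a[l, j]) for j in range(1, 2 ** l, 2)))
```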
Sobolev Space $H^2_{\text{mix}}$ with Dominating Mixed Derivatives
◮ define the norm as
  $\|f\|^2_{H^s_{\text{mix}}} = \sum_{0 \le k \le s} \left\| \frac{\partial^{|k|_1}}{\partial x^k} f \right\|_2^2$
◮ and the space $H^s_{\text{mix}}$ in the usual way:
  $H^s_{\text{mix}} := \{ f : \Omega \to \mathbb{R} \mid \|f\|_{H^s_{\text{mix}}} < \infty \}$
◮ furthermore we define the semi-norm $|f|_{H^k_{\text{mix}}}$:
  $|f|_{H^k_{\text{mix}}} := \left\| \frac{\partial^{|k|_1}}{\partial x^k} f \right\|_2$
◮ the continuous function spaces $H^s_{\text{mix}}$, like the discrete spaces $V_l$, have tensor product structure [Wahba:1990, Griebel.Knapek:2000]:
  $H^s_{\text{mix}} = H^s \otimes \cdots \otimes H^s$
Hierarchical Values $\alpha_{l,j}$ are Bounded I
We assume $f \in H^2_{0,\text{mix}}(\bar\Omega)$, i.e. zero boundary values, and $l > 0$, to avoid the special treatment of level 0, i.e. of the boundary functions in the hierarchical representation. A straightforward calculation shows:
Lemma
For any piecewise d-linear basis function $\varphi_{l,j}$ it holds that $\|\varphi_{l,j}\|_2 \le C(d) \cdot 2^{-|l|_1/2}$
Lemma
For any hierarchical coefficient of $f \in H^2_{0,\text{mix}}(\bar\Omega)$ it holds that
$\alpha_{l,j} = \left( \prod_{i=1}^{d} \frac{-h_i}{2} \right) \int_\Omega \varphi_{l,j} \cdot D^2 f(x)\, dx$,
where $D^2 f := \frac{\partial^{2d}}{\partial x_1^2 \cdots \partial x_d^2} f$ denotes the mixed second derivative.
Hierarchical Values αl,j are Bounded II
Proof.
In one dimension, integration by parts gives
$\int_\Omega \varphi_{l,j} \cdot \frac{\partial^2 f(x)}{\partial x^2}\, dx = \int_{x_{l,j}-h}^{x_{l,j}+h} \varphi_{l,j} \cdot \frac{\partial^2 f(x)}{\partial x^2}\, dx = \left[ \varphi_{l,j} \cdot \frac{\partial f(x)}{\partial x} \right]_{x_{l,j}-h}^{x_{l,j}+h} - \int_{x_{l,j}-h}^{x_{l,j}+h} \frac{\partial \varphi_{l,j}}{\partial x} \cdot \frac{\partial f(x)}{\partial x}\, dx$
$= -\int_{x_{l,j}-h}^{x_{l,j}} \frac{1}{h} \cdot \frac{\partial f(x)}{\partial x}\, dx + \int_{x_{l,j}}^{x_{l,j}+h} \frac{1}{h} \cdot \frac{\partial f(x)}{\partial x}\, dx = \frac{1}{h} \cdot \left( f(x_{l,j}-h) - 2f(x_{l,j}) + f(x_{l,j}+h) \right) = \frac{-2}{h} \cdot \alpha_{l,j}$
The d-dimensional result is achieved via the tensor product formulation. □
Hierarchical Values αl,j are Bounded III
Lemma
For $f \in H^2_{0,\text{mix}}(\bar\Omega)$ as above in hierarchical representation,
$|\alpha_{l,j}| \le C(d) \cdot 2^{-(3/2) \cdot |l|_1} \cdot \left| f|_{\operatorname{supp}(\varphi_{l,j})} \right|_{H^2_{\text{mix}}}$
Proof.
$|\alpha_{l,j}| = \left| \left( \prod_{i=1}^{d} \frac{-h_i}{2} \right) \int_\Omega \varphi_{l,j} \cdot D^2 f(x)\, dx \right| \le \left( \prod_{i=1}^{d} \frac{h_i}{2} \right) \cdot \|\varphi_{l,j}\|_2 \cdot \left\| D^2 f|_{\operatorname{supp}(\varphi_{l,j})} \right\|_2 \le C(d) \cdot 2^{-(3/2) \cdot |l|_1} \cdot \left| f|_{\operatorname{supp}(\varphi_{l,j})} \right|_{H^2_{\text{mix}}}$ □
Hierarchical Components Bounded by Size of Support
Lemma
For $f \in H^2_{0,\text{mix}}(\bar\Omega)$ as above in hierarchical representation, its components $f_l \in W_l$ satisfy
$\|f_l\|_2 \le C(d) \cdot 2^{-2 \cdot |l|_1} \cdot |f|_{H^2_{\text{mix}}}$
Proof.
Since the supports of all $\varphi_{l,j}$ are mutually disjoint,
$\|f_l\|_2^2 = \left\| \sum_{j \in B_l} \alpha_{l,j} \cdot \varphi_{l,j}(x) \right\|_2^2 = \sum_{j \in B_l} |\alpha_{l,j}|^2 \cdot \|\varphi_{l,j}\|_2^2$
With Lemmas 3 and 1 it now follows that
$\|f_l\|_2^2 \le \sum_{j \in B_l} C(d) \cdot 2^{-3 \cdot |l|_1} \cdot \left| f|_{\operatorname{supp}(\varphi_{l,j})} \right|^2_{H^2_{\text{mix}}} \cdot C(d) \cdot 2^{-|l|_1} \le C(d) \cdot 2^{-4 \cdot |l|_1} \cdot |f|^2_{H^2_{\text{mix}}}$ □
Hierarchical Subspaces $W_l$
(figure: the 4 × 4 array of hierarchical subspaces $W_{l_1,l_2}$, $0 \le l_1, l_2 \le 3$, highlighting the subspaces selected below)
Sparse Grids
◮ we define the sparse grid function space $V_n^s \subset V_n$ as
  $V_n^s := \bigoplus_{|l|_1 \le n} W_l$
◮ every $f_n^s \in V_n^s$ can now be represented as
  $f_n^s(x) = \sum_{|l|_1 \le n} \sum_{j \in B_l} \alpha_{l,j} \varphi_{l,j}(x)$
◮ approximation property in $H^2_{\text{mix}}$:
  $\|f - f_n^s\|_2 = O\!\left(h_n^2 \log(h_n^{-1})^{d-1}\right)$
◮ a sparse grid needs only $O\!\left(h_n^{-1} (\log(h_n^{-1}))^{d-1}\right)$ points
◮ here n = 6, d = 6 results in 483,201 points
Sparse Grids in two and three dimensions
History of Sparse Grids
◮ 1960 Babenko: "hyperbolic crosses"
◮ 1963 Smolyak
◮ 1971 Gordon, 1992 Baszenski, Delvos, Jester: "discrete blending methods"
◮ 1981 Delvos: "boolean interpolation"
◮ 1990 Zenger, Griebel: "sparse grids"
Overview article: Bungartz and Griebel in Acta Numerica 2004
Tutorial on sparse grids on my webpage: wwwmaths.anu.edu.au/~garcke/
Some Recent Applications of Sparse Grids
◮ numerical integration (Smolyak '63, Bonk '94, Novak, Ritter '96, Gerstner, Griebel '98/'03)
◮ eigenvalues of the Schrödinger equation in 6D (G., Griebel '00)
◮ fluid dynamics (Griebel, Thurner '93, Koster '02)
◮ numerical integration in finance (option pricing) (Gerstner, Griebel, Wahl '01)
◮ stochastic differential equations (Schwab, Todor '03)
◮ Maxwell equations (Gradinaru, Hiptmair '03)
◮ parabolic equations for option pricing (Reisinger '04)
◮ wavelet-based sparse grids for parabolic problems (Petersdorff, Schwab '04)
◮ classification, regression, data analysis (G., Griebel, Thess '01, Hegland '03, Bendel, Kahrs, Marquardt '05)
◮ Fokker-Planck equation (Feuersänger, Griebel '05)
◮ gene regulatory networks (Hegland et al. '06)
Simple Example in Numerical Integration 10D
Consider the smooth integrand
$\int_{[0,1]^{10}} e^{-x^T x + x^T b}\, dx$
Time to reach 5 digits of accuracy:
◮ product rule: 317 years
◮ Monte Carlo: 100 s
◮ quasi-Monte Carlo: 1 s
◮ sparse grid: 0.02 s
(Gerstner, Griebel '98)
How to Compute on a Sparse Grid
◮ computation on a sparse grid in the hierarchical basis
◮ a function $f_n^s \in V_n^s$ is split as
  $f_n^s(x) = \sum_{|l|_1 \le n} \hat f_l \quad \text{with} \quad \hat f_l \in W_l$
◮ example: $V_2^s = W_{20} \oplus W_{11} \oplus W_{02} \oplus W_{10} \oplus W_{01} \oplus W_{00}$
◮ recall the difference spaces $W_l$:
  $W_l := V_l \setminus \bigcup_{t=1}^{d} V_{l-e_t}$
◮ observe
  $W_{11} = V_{11} \setminus (V_{10} \cup V_{01}) = V_{11} - V_{10} - V_{01} + V_{00}$
◮ this telescope sum property gives $f_n^s$ as a sum of $f_l \in V_l$
◮ the combination technique is based on this property
Combination Technique of Level 4 in 2d
$\Omega_{4,0} \oplus \Omega_{3,1} \oplus \Omega_{2,2} \oplus \Omega_{1,3} \oplus \Omega_{0,4} \ominus \Omega_{3,0} \ominus \Omega_{2,1} \ominus \Omega_{1,2} \ominus \Omega_{0,3} = \Omega_4^s$
$f_n^c = \sum_{l_1+l_2=n} f_{l_1,l_2} - \sum_{l_1+l_2=n-1} f_{l_1,l_2}$
Telescope Sum Property for Interpolation
We write it out exemplarily in the two-dimensional case, using $\hat f_{l_1,l_2} \in W_{l_1,l_2}$ instead of all the basis functions of $W_{l_1,l_2}$ for ease of presentation:
$f_n^c = \sum_{l_1+l_2=n} f_{l_1,l_2} - \sum_{l_1+l_2=n-1} f_{l_1,l_2} = \sum_{l_1 \le n} \sum_{k_1 \le l_1} \sum_{k_2 \le n-l_1} \hat f_{k_1,k_2} - \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \sum_{k_2 \le n-1-l_1} \hat f_{k_1,k_2}$
$= \sum_{k_1 \le n} \hat f_{k_1,0} + \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \left( \sum_{k_2 \le n-l_1} \hat f_{k_1,k_2} - \sum_{k_2 \le n-1-l_1} \hat f_{k_1,k_2} \right) = \sum_{k_1 \le n} \hat f_{k_1,0} + \sum_{l_1 \le n-1} \sum_{k_1 \le l_1} \hat f_{k_1,n-l_1}$
$= \sum_{k_1+t \le n} \hat f_{k_1,t} = f_n^s, \quad \text{with } t := n - l_1.$
Sparse Grid Combination Technique
◮ solve the problem on a sequence of full grids $\Omega_l$ with
  $l_1 + \dots + l_d = n - q, \quad q = 0, \dots, d-1, \quad l_t \ge 0$
◮ combine the results $f_l(x) \in V_l$ for the solution on the sparse grid $\Omega_n^{(s)}$:
  $f_n^{(c)}(x) := \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{l_1+\dots+l_d=n-q} f_l(x)$
◮ number of spaces for $V_n^c$: $O(d n^{d-1})$
◮ $\dim(V_l) = O(2^{d-1} \cdot h_n^{-1})$
◮ practical limit $d \approx 20$ due to memory constraints
◮ for interpolation: the combination technique reproduces the sparse grid solution exactly
◮ for PDE problems it is of the same approximation order (as long as a certain error expansion holds)
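A sketch enumerating the grids and coefficients of the combination formula (the closing sanity check is an illustrative addition):

```python
from itertools import product
from math import comb

def combination_grids(n, d):
    """Yield (coefficient, l) pairs of the combination technique
    f_n^c = sum_q (-1)**q * binom(d-1, q) * sum_{|l|_1 = n-q} f_l."""
    for q in range(d):
        c = (-1) ** q * comb(d - 1, q)
        for l in product(range(n - q + 1), repeat=d):
            if sum(l) == n - q:
                yield c, l

# the coefficients of a valid combination sum to 1,
# so constant functions are reproduced exactly
assert sum(c for c, _ in combination_grids(4, 2)) == 1
```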
Generalised Combination Technique
◮ generalised combination formula for an index set I:
  $f_I^{(c)}(x) := \sum_{k \in I} \left( \sum_{z=(0,\dots,0)}^{(1,\dots,1)} (-1)^{|z|_1} \cdot \chi^I(k+z) \right) f_k(x)$,
  where $\chi^I$ is the characteristic function of I
Summary Sparse Grids
◮ sparse grids allow the representation of functions in high dimensions (up to about 20)
◮ controlled approximation order towards the right solution (under certain assumptions on the smoothness)
◮ efficient (multigrid) solvers of complexity O(N) often exist, where N is the number of points
◮ used for the numerical solution of partial differential equations and for numerical integration
Problem Setting for Regression / Classification
◮ given a set of data points x with labels y:
  $S = \{(x^i, y^i)\}_{i=1}^{M}, \quad x^i \in \mathbb{R}^d, \quad y^i \in \{-1, 1\} \text{ or } y^i \in \mathbb{R}$
◮ we want to reconstruct the underlying function f, the classifier
◮ ill-posed problem → use regularisation for well-posedness
◮ compromise between approximation and generalisation:
  ◮ approximate the given training data set S (data error)
  ◮ 'good' results on new, unseen data
◮ d can be large
◮ S can consist of millions to billions of data points
◮ wide applications in business and research
Regularisation Theory
◮ consider the variational problem $\min_{f \in V} R(f)$ with
  $R(f) = \frac{1}{M} \sum_{i=1}^{M} (f(x^i) - y^i)^2 + \lambda \|Sf\|^2$
◮ error of the classifier on the given data: $\frac{1}{M} \sum_{i=1}^{M} (f(x^i) - y^i)^2$
◮ regularisation (or stabilisation) operator S
◮ regularisation parameter λ
◮ representer theorem in a RKHS [Kimeldorf.Wahba:1971]:
  $f(x) = \sum_{i=1}^{M} \alpha_i k(x, x^i)$
◮ linear superposition of kernel functions
◮ the corresponding Galerkin equations are
  $\frac{1}{M} \sum_{i=1}^{M} f(x^i) g(x^i) + \lambda \langle Sf, Sg \rangle_2 = \frac{1}{M} \sum_{i=1}^{M} g(x^i) y^i$,
  which hold for the minimum $f \in V$ of R(f) and all $g \in V$
Discretisation
◮ restrict to a finite-dimensional subspace $V_N \subset V$
◮ regularisation by discretisation/projection [Natterer '77]
◮ minimise in $V_N$, with $f_N = \sum_{j=1}^{N} \alpha_j \varphi_j(x)$; use $S := \nabla$:
  $R_{\text{reg}}(f_N) = \frac{1}{M} \sum_{i=1}^{M} (f_N(x^i) - y^i)^2 + \lambda \|\nabla f_N\|^2_{L^2}$
◮ from $\partial_{\alpha_k} R_{\text{reg}} = 0$ follows (for $k = 1, \dots, N$)
  $\sum_{j=1}^{N} \alpha_j \left[ M \lambda (\nabla \varphi_j, \nabla \varphi_k)_{L^2} + \sum_{i=1}^{M} \varphi_j(x^i) \cdot \varphi_k(x^i) \right] = \sum_{i=1}^{M} y^i \varphi_k(x^i)$
◮ this results in the linear equation system
  $(\lambda \cdot M C + B^t \cdot B)\, \alpha = B^t y$,
  with $C_{j,k} = (\nabla \varphi_j, \nabla \varphi_k)_{L^2}$ and $B_{i,j} = \varphi_j(x^i)$
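A minimal numpy sketch of this system in one dimension on a single regular grid of level n (in the combination technique the same system is solved on each grid $\Omega_l$); all names and the test data are illustrative, and this is not the code behind the experiments:

```python
import numpy as np

def fit(x, y, n, lam):
    """Solve (lam * M * C + B^t B) alpha = B^t y for 1-d hats on level n."""
    N = 2 ** n + 1                      # grid points 0, h, 2h, ..., 1
    h = 1.0 / (N - 1)
    # B[i, j] = phi_j(x_i): every x lies in one cell with two active hats
    B = np.zeros((len(x), N))
    k = np.minimum((x / h).astype(int), N - 2)
    t = x / h - k
    B[np.arange(len(x)), k] = 1 - t
    B[np.arange(len(x)), k + 1] = t
    # C[j, k] = (grad phi_j, grad phi_k)_{L2}: tridiagonal (1/h)[-1 2 -1]
    C = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h
    C[0, 0] = C[-1, -1] = 1 / h
    return np.linalg.solve(lam * len(x) * C + B.T @ B, B.T @ y)

rng = np.random.default_rng(0)
x = rng.random(200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
alpha = fit(x, y, n=4, lam=1e-4)        # coefficients of f_N
```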
Complexities
◮ solve on each grid:
  $(\lambda \cdot M C_l + B_l^t \cdot B_l)\, \alpha_l = B_l^t y, \qquad (B^t \cdot B)_{j,k} = \sum_{i}^{M} \varphi_j(x^i) \cdot \varphi_k(x^i)$

            d-linear basis functions           linear basis on simplices
            $C_l$             $B_l^t \cdot B_l$   $C_l$            $B_l^t \cdot B_l$
  assembly  $O(3^d \cdot N)$  $O(2^{2d} \cdot M)$ $O(2d \cdot N)$  $O(d^2 \cdot M)$
  storage   $O(3^d \cdot N)$  $O(3^d \cdot N)$    $O(2d \cdot N)$  $O(2d \cdot N)$

◮ scales linearly with M, the number of data points ($N \ll M$)
◮ properties of the combination technique using simplices:
  ◮ non-nestedness of the discrete spaces
  ◮ error expansion not with regard to the anisotropic mesh size h
Coarse and Fine Grain Parallelisation
◮ the partial functions on each grid $\Omega_l$ in the collection of grids can be computed independently of each other
◮ the computation of $B^t \cdot B$ depends on the $x^i$; parallelise the loop over the $x^i$:
  $(B^t \cdot B)_{j,k} = \sum_{i}^{M} \varphi_j(x^i) \cdot \varphi_k(x^i)$
◮ parallelisation of the matrix-vector multiplication through splitting of the vector into p parts of size N/p
◮ a combination of coarse and fine grain parallelisation is possible
Ripley Data Set
ndcHard (synthetic) 10 D with 2 Million Data
          # of points   training %   testing %   total time (s)   data matrix time (s)
level 1   20 000        86.2         84.2        6.3              0.9
          200 000       85.1         84.8        16.2             8.7
          2 million     84.9         84.9        114.9            84.9
level 2   20 000        85.1         83.8        134.6            10.3
          200 000       84.5         84.2        252.3            98.2
          2 million     84.3         84.2        1332.2           966.6

◮ linear SVM: 69.8 % / 69.5 % [Fung.Mangasarian:01]
◮ no results with a non-linear SVM (complexity constraints)
Comparison with Other Methods
data set        dim   sparse grid %   best other %   rank   total
breast cancer   9     97.66 ± 1.4     97.72          2      25
pima diabetes   8     76.80 ± 1.9     77.63          5      28
heart           13    80.85 ± 2.4     86.90          18     25
thyroid         5     96.07 ± 2.7     95.60          1      11
titanic         3     77.94 ± 0.7     78.96          8      19
flare-solar     9     66.85 ± 1.8     67.57          2      7

◮ compared with results in (mainly) Meyer, Leisch, Hornik '03 and some others
◮ methods include support vector machines, boosting methods like AdaBoost, linear discriminant analysis, neural networks, nearest neighbour, random forests, and multivariate adaptive regression splines
Combination Technique Can Diverge
Figure: Value of the functional and the least squares error on the data, i.e. $\frac{1}{M} \sum_{i=1}^{M} (f(x^i) - y^i)^2$, for the reconstruction of $e^{-x^2} + e^{-y^2}$ with the combination technique of level $n = 0, \dots, 10$, with $\lambda = 10^{-4}$ (left) and $10^{-6}$ (right).
Combination Technique in Two Space Case
◮ consider two spaces $V_1, V_2$ and their intersection $V_1 \cap V_2$
◮ additive approximation of the orthogonal projection into $V = V_1 + V_2$:
  $T^a = c_1 P_{V_1} + c_2 P_{V_2} + c_{12} P_{V_1 \cap V_2}$
◮ regression is the projection of $\hat f$ using the scalar product
  $\langle f, g \rangle_{\text{PLS}} = \frac{1}{M} \sum_{i=1}^{M} f(x^i) g(x^i) + \lambda \langle Sf, Sg \rangle_2$
◮ the combination technique is the optimal additive approximation in a worst case sense [Hegland.G.Challis:05]
Error of Two Space Combination Technique
$e^2(c_1, c_2, c_{12}) := \|P_{V_1+V_2} u - c_1 P_{V_1} u - c_2 P_{V_2} u - c_{12} P_{V_1 \cap V_2} u\|^2$,
$e_*^2 = \inf_{c_1, c_2, c_{12}} e^2(c_1, c_2, c_{12})$, and $e_c^2 = e^2(1, 1, -1)$
Theorem (Hegland.G.Challis:05)
$e_c^2 = \left( 1 - (\cos^2 \alpha_1 - 2 \cos \alpha_1 \cos \alpha_2 \cos \gamma + \cos^2 \alpha_2) \right) \|P_{U_1+U_2} u\|^2$
and
$e_c^2 = e_*^2 + \cos^2 \gamma \left( \|P_{U_1+U_2} u\|^2 - e_*^2 \right)$,
where $U_1 := V_1 \cap (V_1 \cap V_2)^\perp$, $U_2 := V_2 \cap (V_1 \cap V_2)^\perp$, $\gamma = \angle(P_{U_1} u, P_{U_2} u)$, $\alpha_1 = \angle(P_{U_1+U_2} u, P_{U_1} u)$, $\alpha_2 = \angle(P_{U_1+U_2} u, P_{U_2} u)$
Optimised Combination Technique [Hegland:2003]
◮ minimise $J(c_1, \dots, c_m) = \|Pf - \sum_{i=1}^{m} c_i P_{V_i} f\|^2$
◮ a simple expansion gives
  $J(c_1, \dots, c_m) = \sum_{i,j=1}^{m} c_i c_j \langle P_{V_i} f, P_{V_j} f \rangle - 2 \sum_{i=1}^{m} c_i \|P_{V_i} f\|^2 + \|Pf\|^2$
◮ the location of the minimum of J does not depend on Pf
◮ the best combination coefficients satisfy
  $\begin{pmatrix} \|P_1 f\|^2 & \cdots & \langle P_1 f, P_m f \rangle \\ \langle P_2 f, P_1 f \rangle & \cdots & \langle P_2 f, P_m f \rangle \\ \vdots & \ddots & \vdots \\ \langle P_m f, P_1 f \rangle & \cdots & \|P_m f\|^2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} = \begin{pmatrix} \|P_1 f\|^2 \\ \|P_2 f\|^2 \\ \vdots \\ \|P_m f\|^2 \end{pmatrix}$
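A sketch of solving this system; representing the projections $P_{V_i} f$ as coefficient vectors on a common basis and the scalar products as plain dot products is a simplifying assumption (in the regression setting the scalar product is the penalised one from the previous slides):

```python
import numpy as np

def opticom_coefficients(p):
    """p: list of m vectors representing P_{V_i} f; returns c_1, ..., c_m."""
    m = len(p)
    M = np.array([[np.dot(p[i], p[j]) for j in range(m)] for i in range(m)])
    rhs = np.array([np.dot(p[i], p[i]) for i in range(m)])
    return np.linalg.solve(M, rhs)      # Gram matrix assumed non-singular
```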
Residual and Angle γ for Example Problem
level   γ          $e_c^2$             $e_*^2$
1       0.012924   3.353704 · 10^-4    3.351200 · 10^-4
2       0.025850   2.124744 · 10^-5    2.003528 · 10^-5
3       0.021397   8.209228 · 10^-6    7.372946 · 10^-6
4       0.012931   1.451818 · 10^-5    1.421387 · 10^-5
5       0.003840   2.873697 · 10^-5    2.871036 · 10^-5
6       0.032299   5.479755 · 10^-5    5.293952 · 10^-5
7       0.086570   1.058926 · 10^-4    9.284347 · 10^-5
8       0.168148   1.882191 · 10^-4    1.403320 · 10^-4
9       0.237710   2.646455 · 10^-4    1.706549 · 10^-4
10      0.285065   3.209026 · 10^-4    1.870678 · 10^-4

Table: Residual for the normal combination technique ($e_c^2$) and the optimised combination technique ($e_*^2$), as well as the angle $\gamma = \angle(P_{U_1} u, P_{U_2} u)$.
Example: Optimised Combination Technique
(figure: $L^2$ error / least squares error and (square root of the) functional against level, for the standard and the optimised combination technique)
◮ reconstruction of $e^{-x^2} + e^{-y^2}$ from 5000 data points
◮ $\lambda = 10^{-2}$ (left) and $10^{-6}$ (right)
◮ shown: value of the residual and the least squares error on the data
◮ standard combination technique
  $f_n^c = \sum_{l_1+l_2=n} f_{l_1,l_2} - \sum_{l_1+l_2=n-1} f_{l_1,l_2}$
  versus the optimised combination technique
Comparison with Other Regression Methods
◮ using the optimised combination technique
◮ compared with the benchmark study in [Meyer.Leisch.Hornik:2003]
◮ methods: linear regression, ε-support vector regression, neural networks, regression trees, projection pursuit regression, multivariate adaptive regression splines (MARS), additive spline models by adaptive backfitting (BRUTO), bagging of trees, random forests, multiple additive regression trees (MART)
◮ procedure:
  ◮ ten-times ten-fold cross-validation for real data
  ◮ 100 data sets for synthetic data
  ◮ a good (λ, level) pair is chosen on a subset of the data per 10-fold CV
Benchmark Results
real life data:

                       opticom             best other
                level  mean      algor.    mean      rank (mean/med.)  # algor.
abalone         1      4.20      nnet      4.31      1/1               9
auto-mpg        2      6.21      svm       7.11      1/1               9
boston-housing  1      8.92      svm       9.60      1/3               9
cpu (×10³)      1      1.73      ppr       3.16      1/2               9
cpuSmall        2      8.74      mart      7.55      3/3               10
SLID            3      38.64     rForest   34.13     4/4               9

synthetic data (100,000 for learning):

                        opticom             SVM             MARS
                  level  MSE     time    MSE     time    MSE     time
Friedman1         3      1.340   3214.4  1.148   23604   1.205   10.4
Friedman1 (5dim)  5      1.040   953.2   —       —       —       —
Friedman2 (×10³)  3      15.46   35.2    15.40   3151    15.77   16.9
Friedman3 (×10⁻³) 4      13.33   89.5    27.47   16862   14.45   3.6
Semi-Supervised Learning
Learning from partially labelled data
◮ a set of data points x with labels y:
  $S_L = \{(x^i, y^i)\}_{i=1}^{L}, \quad x^i \in \mathbb{R}^d, \quad y^i \in \{-1, 1\} \text{ or } y^i \in \mathbb{R}$
◮ an additional set of unlabelled data points x:
  $S_U = \{x^i\}_{i=1}^{U}, \quad x^i \in \mathbb{R}^d$
◮ unlabelled data is often much more common than labelled data
◮ chance of better results through use of the to-be-classified data
◮ natural learning uses unlabelled experiences
◮ one way: use the geometric structure of the data distribution
Simplicity is Relative
◮ picture taken from [Belkin.Niyogi.Sindhwani:2004]
◮ geodesic (intrinsic) distance versus ambient distance
◮ cluster assumption for classification:
  ◮ nearby data points have the same class
  ◮ the decision boundary should not cross high density regions
◮ these ideas lead to (sub)manifold learning, $\mathcal{M} \subset \mathbb{R}^d$
Manifold Learning for Dimensionality Reduction
◮ manifold learning for dimensionality reduction: $P : \mathcal{M} \to \mathbb{R}^D$, $D < d$
◮ in recent years, research on Non-Linear Dimensionality Reduction (NLDR) under the name manifold learning
◮ Isomap, Locally Linear Embedding (LLE), Laplacian Eigenmap, LTSA, Hessian LLE, SDE
◮ each algorithm learns a different mapping and tries to preserve a different geometric signature
◮ steps of the algorithms:
  1. compute neighbourhoods in input space
  2. construct a matrix based on the chosen mapping and geometry
  3. spectral embedding via the top or bottom eigenvectors
◮ Isomap, Laplacian Eigenmap and LLE can be interpreted as kernel methods (kernel PCA)
Use Manifold in Semi-Supervised Learning
◮ exploit the geometry of the marginal distribution, i.e. use the manifold
◮ works on manifold learning in a kernel setting:
  ◮ [Belkin.Niyogi.Sindhwani:2004,2005]
  ◮ [Chapelle.Zien:2005]
◮ we follow [Belkin.Niyogi.Sindhwani:2004,2005] using Laplacian Eigenmaps:
  ◮ proximity preserving mapping
  ◮ discrete Laplacian
◮ use the variational problem $\min_{f \in V} R(f)$ with
  $R(f) = \frac{1}{M} \sum_{i=1}^{M} (f(x^i) - y^i)^2 + \lambda_A \|Sf\|^2_{L^2} + \lambda_I \|f\|^2_I$
◮ $\|f\|^2_I$ is a regularisation term for the intrinsic structure
◮ solution in a RKHS
Discrete Laplacian
◮ the case of a compact submanifold $\mathcal{M} \subset \mathbb{R}^d$ implies the choice
  $\|f\|^2_I = \int_{\mathcal{M}} \langle \nabla_{\mathcal{M}} f, \nabla_{\mathcal{M}} f \rangle$
◮ approximate $\int_{\mathcal{M}} \langle \nabla_{\mathcal{M}} f, \nabla_{\mathcal{M}} f \rangle$ from the labelled and unlabelled data:
  $\int_{\mathcal{M}} \langle \nabla_{\mathcal{M}} f, \nabla_{\mathcal{M}} f \rangle \approx \frac{1}{(L+U)^2} \sum_{i,j=1}^{L+U} \left( f(x^i) - f(x^j) \right)^2 W_{ij}$
◮ $W_{ij}$ are the edge weights in the data adjacency graph (binary, distance-based, or heat kernel)
◮ written with the graph Laplacian, the regularisation term becomes $\frac{\lambda_I}{(L+U)^2}\, f^t \mathcal{L} f$
◮ $\mathcal{L}$ is the graph Laplacian $\mathcal{L} = D - W$ with $D_{ii} = \sum_{j=1}^{L+U} W_{ij}$
Discretise Regularisation Problem with Sparse Grids
◮ putting it all together, using $S = \nabla$:
  $R(f) = \frac{1}{L} \sum_{i=1}^{L} (f(x^i) - y^i)^2 + \lambda_A \|\nabla f\|^2_2 + \frac{\lambda_I}{(L+U)^2}\, f^t \mathcal{L} f$
◮ with $f = (f(x^1), \dots, f(x^L), f(x^{L+1}), \dots, f(x^{L+U}))^t$
◮ again minimise in the discrete space $V_N$, $f_N = \sum_{j=1}^{N} \alpha_j \varphi_j(x)$
◮ setting $\partial_\alpha R = 0$ results in the N × N linear equation system
  $\left( B^t B + \lambda_A L \cdot C + \frac{\lambda_I L}{(L+U)^2}\, B^t \mathcal{L} B \right) \alpha = B^t y$
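A numpy sketch of the two ingredients, a k-NN heat-kernel graph Laplacian and the resulting linear system; here B_lab holds the basis values at the labelled points, B_all at all points, and C is the stiffness matrix as in the supervised sketch above. The dense construction and all names are illustrative choices, not the implementation behind the experiments:

```python
import numpy as np

def graph_laplacian(X, k=5, sigma=1.0):
    """Dense k-NN heat-kernel Laplacian L = D - W of the points X (rows)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]   # k nearest neighbours
    for i, js in enumerate(nn):
        W[i, js] = np.exp(-d2[i, js] / sigma)
    W = np.maximum(W, W.T)                    # symmetrise
    return np.diag(W.sum(1)) - W

def ssl_solve(B_lab, y, B_all, C, Lap, lam_A, lam_I):
    """(B^t B + lam_A L C + lam_I L / (L+U)^2 B^t Lap B) alpha = B^t y."""
    n_lab, n_all = len(y), len(B_all)
    A = (B_lab.T @ B_lab + lam_A * n_lab * C
         + lam_I * n_lab / n_all ** 2 * B_all.T @ Lap @ B_all)
    return np.linalg.solve(A, B_lab.T @ y)
```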
Complexities
◮ kernel-based approaches currently scale cubically in L + U
◮ equation for one entry of $B^t \mathcal{L} B$:
  $(B^t \mathcal{L} B)_{k,l} = \sum_{i,j}^{L+U} \varphi_k(x^i) \cdot \varphi_l(x^j) \cdot \mathcal{L}_{i,j}$
◮ full adjacency graph → $(L+U) \cdot (\#\{\varphi_k(x^i) \neq 0\})^2$ operations per $x^i$:
  total complexity quadratic in L + U
◮ k-NN adjacency graph → $k \cdot (\#\{\varphi_k(x^i) \neq 0\})^2$ operations per $x^i$:
  total complexity linear in L + U
◮ iterative solution of the equation system is independent of L + U
Two-Moons Dataset
(figure: three classifications, all with $\lambda_A = 0.01$ and $\gamma_I = 0$, $\gamma_I = 0.1$, $\gamma_I = 0.5$)
◮ $\gamma_I := \frac{\lambda_I L}{(L+U)^2}$
◮ computation on level 8
Data-Mining-Cup 2000
◮ 10,000 training data, 34,820 to be classified, 40 attributes
◮ maximise the profit of a direct-mailing campaign
◮ choose the top cut-off % of the sorted data for the mailing campaign
◮ 185 DM for a responder, -12 DM for a non-responder
◮ winner of the DM-Cup: 67,038 DM
◮ data-mining specialists: 84,995 DM (external knowledge)
◮ sparse grid classification (PCA, 18 attributes): 87,705 DM
◮ use the to-be-classified data in the learning process
◮ split the training data 60:40, fit the parameters $\lambda_A$, $\gamma_I$, cut-off
◮ sparse grids for semi-supervised learning: 93,812 DM
◮ finding good parameters is more difficult
Dimension Adaptive Combination Technique
Remaining problems:
◮ still a high dependence on the dimension
◮ the classical construction is too static
◮ spatial adaptivity is impracticable in high dimensions
◮ however: dimension importance/interaction often varies

Ansatz:
◮ start the hierarchy with a constant
◮ use lower degree formulas in less important dimensions
◮ large reduction in complexity if the important dimensions are few (small effective dimension)
◮ reduction in complexity if the interaction between dimensions is small (Friedman '91: not more than 5 for real data?)
◮ dimension adaptive construction of a generalised sparse grid
Hierarchy with Constant Functions
(figure: the subspace array extended by constant levels, $\hat W_{-1,-1}$, $\hat W_{l,-1}$, $\hat W_{-1,l}$, $\hat W_{l,0}$, $\hat W_{0,l}$, surrounding the standard subspaces $W_{l_1,l_2}$)
Refinement Procedure
◮ start with the coarsest grid (index)
◮ successively add indices
◮ compute the solution on the corresponding grid
Hegland '01; Gerstner, Griebel '03
Dimension Adaptive Algorithm
◮ ensure that in each step the index set remains admissible
◮ need robust and efficient criteria for the adaptive choice of grids; currently: compute the residual
◮ need the optimised combination technique, otherwise a decrease is not guaranteed
◮ active indices: indices currently not in the index set, but which can be added in the next step
◮ $f_I^{(c)}(x) := \sum_{k \in I} \left( \sum_{z=(0,\dots,0)}^{(1,\dots,1)} (-1)^{|z|_1} \cdot \chi^I(k+z) \right) f_k(x)$
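A sketch of the coefficient computation for an arbitrary admissible index set I via this inclusion/exclusion formula; applied to the regular set $I = \{l : |l|_1 \le n\}$ it recovers the classical combination coefficients:

```python
from itertools import product

def combination_coefficients(I):
    """Coefficients c_k = sum_{z in {0,1}^d} (-1)**|z|_1 chi_I(k + z)
    for k in the (admissible, downward closed) index set I."""
    I = set(I)
    d = len(next(iter(I)))
    coeff = {}
    for k in I:
        c = sum((-1) ** sum(z)
                for z in product((0, 1), repeat=d)
                if tuple(ki + zi for ki, zi in zip(k, z)) in I)
        if c != 0:
            coeff[k] = c
    return coeff

# regular level-4 sparse grid in 2d: +1 on |l|_1 = 4, -1 on |l|_1 = 3
I = {l for l in product(range(5), repeat=2) if sum(l) <= 4}
print(combination_coefficients(I))
```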
MLSS 06 - Canberra Sparse Grids Jochen Garcke (Very) Short Course on Finite Elements Hierarchical Basis Sparse Grids Combination Technique Regression / Classification via Function Reconstruction Opticom Semi-supervised Learning Outlook: Dimension Adaptive Combination Technique Sparse Grids in Reinforcement Learning ?