SLIDE 1 Greedy Sparse Linear Approximations of Functionals from Nodal Data
R. Schaback
Göttingen Academy of Sciences and Humanities
Georg-August-Universität Göttingen, Institut für Numerische und Angewandte Mathematik
http://www.num.math.uni-goettingen.de/schaback/research/group.html
MAIA 2013, Erice, Italy, Sept. 2013
SLIDE 2
Overview
◮ Linear Recovery
◮ Optimal Recovery
◮ Greedy Recovery
◮ Examples
SLIDE 3
Linear Recovery
SLIDE 4 Standard Problem of Numerical Analysis
◮ You have data, but you want different data
◮ You have a few data of an otherwise unknown function,
  but you want different data of that function
◮ Examples:
  ◮ Numerical integration
  ◮ Numerical differentiation
  ◮ PDE solving
◮ Given λ_1(u), . . . , λ_N(u), find µ(u)
  for continuous linear functionals µ, λ_1, . . . , λ_N ∈ U∗
◮ Assumption: Linear recovery
  µ(u) ≈ Σ_{j=1}^N a_j λ_j(u) for all u ∈ U
SLIDE 5 Linear Recovery
◮ Linear recovery
  µ(u) ≈ Σ_{j=1}^N a_j λ_j(u) for all u ∈ U
◮ Error functional
  ǫ_a(u) := µ(u) − Σ_{j=1}^N a_j λ_j(u) for all u ∈ U, a ∈ R^N
◮ Sharp error bound
  |ǫ_a(u)| = |µ(u) − Σ_{j=1}^N a_j λ_j(u)| ≤ ‖ǫ_a‖_{U∗} · ‖u‖_U
◮ If ‖ǫ_a‖_{U∗} were known: error known as a percentage of ‖u‖_U
◮ Problem: calculate ‖ǫ_a‖_{U∗} and optimize over a
SLIDE 6
Reproducing Kernel Hilbert Spaces
◮ Assumption: U Hilbert space of functions on Ω
  with continuous and linearly independent point evaluations
◮ Theorem: Then U is a RKHS
  with some positive definite kernel K : Ω × Ω → R:
  u(x) = (u, K(x, ·))_U for all x ∈ Ω, u ∈ U
  µ(u) = (u, µ^x K(x, ·))_U for all µ ∈ U∗, u ∈ U
  (λ, µ)_{U∗} = λ^x µ^y K(x, y) for all λ, µ ∈ U∗
◮ Consequence:
  ‖ǫ_a‖²_{U∗} = ǫ_a^x ǫ_a^y K(x, y)
              = µ^x µ^y K(x, y) − 2 Σ_{j=1}^N a_j µ^x λ_j^y K(x, y)
                + Σ_{j=1}^N Σ_{k=1}^N a_j a_k λ_j^x λ_k^y K(x, y)
◮ This quadratic form in a can be explicitly calculated
◮ Different approximations can be compared
◮ This quadratic form can be optimized
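The quadratic form only needs the scalar c = µ^x µ^y K(x, y), the vector b_j = µ^x λ_j^y K(x, y), and the Gram matrix A_{jk} = λ_j^x λ_k^y K(x, y), so it is easy to evaluate numerically. A minimal Python sketch (function name and array conventions are ours, not from the talk); evaluating it for two weight vectors gives the fair comparison mentioned above:

```python
import numpy as np

def error_norm_squared(a, c, b, A):
    """Squared dual norm ||eps_a||^2_{U*} of the error functional.

    c : scalar  mu^x mu^y K(x, y)
    b : (N,)    b[j] = mu^x lambda_j^y K(x, y)
    A : (N, N)  A[j, k] = lambda_j^x lambda_k^y K(x, y)
    a : (N,)    recovery weights
    """
    a = np.asarray(a, dtype=float)
    return c - 2.0 * (a @ b) + a @ A @ a
```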
SLIDE 7
Sobolev Spaces
◮ Assumption: U Hilbert space of functions on Ω
with continuous and linearly independent point evaluations
◮ Take U := W^m_2(R^d) with m > d/2
◮ Matérn-Sobolev kernel:
  K(x, y) = ‖x − y‖₂^{m−d/2} K_{m−d/2}(‖x − y‖₂)
  with the modified Bessel function of the second kind K_{m−d/2}
◮ For "nice" domains Ω ⊂ R^d:
  W^m_2(Ω) is norm-equivalent to W^m_2(R^d)
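For concreteness, this kernel can be evaluated with SciPy's modified Bessel function of the second kind; a sketch, assuming the normalization constant is dropped as on the slide (it only rescales the equivalent W^m_2 norm):

```python
import numpy as np
from scipy.special import kv, gamma

def matern_sobolev(x, y, m, d):
    """Matern-Sobolev kernel ||x-y||^nu K_nu(||x-y||) with nu = m - d/2 > 0."""
    nu = m - d / 2.0
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    if r < 1e-12:
        # limit of r^nu K_nu(r) as r -> 0, needed on the diagonal
        return 2.0 ** (nu - 1.0) * gamma(nu)
    return r ** nu * kv(nu, r)
```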
SLIDE 8
Example: Numerical Integration in Sobolev Spaces
◮ 5 points in [−1, 1]
◮ Standard weights

  Points       ‖ǫ‖²_{W^1_2(R)∗}  ‖ǫ‖²_{W^2_2(R)∗}  ‖ǫ‖²_{W^3_2(R)∗}
  equidistant  0.103077          0.001034817       0.00016066
  Gauß         0.090650164       0.000409353       0.00000298
SLIDE 9
Optimal Recovery
SLIDE 10 Optimal Recovery in RKHS
◮ Minimize the quadratic form
  ‖ǫ_a‖²_{U∗} = ǫ_a^x ǫ_a^y K(x, y)
              = µ^x µ^y K(x, y) − 2 Σ_{j=1}^N a_j µ^x λ_j^y K(x, y)
                + Σ_{j=1}^N Σ_{k=1}^N a_j a_k λ_j^x λ_k^y K(x, y)
◮ The solution a∗ satisfies the linear system
  Σ_{j=1}^N a∗_j λ_j^x λ_k^y K(x, y) = µ^x λ_k^y K(x, y), 1 ≤ k ≤ N
◮ Then
  ‖ǫ_{a∗}‖²_{U∗} = µ^x µ^y K(x, y) − Σ_{j=1}^N a∗_j µ^x λ_j^y K(x, y)
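In matrix form, with c, b, A as in the sketch after Slide 6, the optimal weights and the optimal error norm follow from one linear solve; a minimal sketch (illustrative names, not from the talk):

```python
import numpy as np

def optimal_recovery(c, b, A):
    """Solve A a* = b and return (a*, ||eps_{a*}||^2 = c - b^T a*)."""
    a_star = np.linalg.solve(A, b)
    return a_star, c - b @ a_star
```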
SLIDE 11 Connection to Interpolation
◮ Theorem: The optimal recovery equals the µ-value of the interpolant
  to the data λ_j(u), 1 ≤ j ≤ N, from the span of the functions
  λ_j^x K(x, ·), 1 ≤ j ≤ N.
◮ Proof steps:
  ◮ There is a Lagrange basis u_1, . . . , u_N of that span with
    λ_j(u_k) = δ_{jk}, 1 ≤ j, k ≤ N
  ◮ The interpolant to the data f_j = λ_j(u) is
    ũ(x) = Σ_{j=1}^N u_j(x) f_j = Σ_{j=1}^N u_j(x) λ_j(u)
    µ(u) ≈ µ(ũ) = Σ_{j=1}^N µ(u_j) f_j = Σ_{j=1}^N µ(u_j) λ_j(u)
  ◮ Then, as a recovery formula,
    a_j = µ(u_j), 1 ≤ j ≤ N
SLIDE 12 Connection to Interpolation II
◮ Optimality?
◮ We need
  Σ_{j=1}^N µ(u_j) λ_j^x λ_k^y K(x, y) = µ^x λ_k^y K(x, y), 1 ≤ k ≤ N
◮ The Lagrange property implies
  Σ_{j=1}^N u_j(·) λ_j^x λ_k^y K(x, y) = λ_k^y K(·, y), 1 ≤ k ≤ N,
  and applying µ to both sides proves optimality.
SLIDE 13 Example: Optimal Integration in Sobolev Spaces
◮ 5 points in [−1, 1]

  Points   Method    ‖ǫ‖²_{W^1_2(R)∗}  ‖ǫ‖²_{W^2_2(R)∗}  ‖ǫ‖²_{W^3_2(R)∗}
  equi     standard  0.103077          0.00103481        0.00016066
  equi     optimal   0.101896          0.00066343        0.00001082
  Gauß     standard  0.090650          0.00040935        0.00000298
  Gauß     optimal   0.085169          0.00040768        0.00000296
  optimal  optimal   0.063935          0.00019882        0.00000106

◮ The optimal points in the last row are chosen to minimize the error
  norm of the optimal recovery in Sobolev space
SLIDE 14
Summary of Recovery
◮ Consider linear recoveries of unknown data from known data
◮ Do this in a RKHS
◮ Evaluate the norm of the error functional
◮ This allows fair comparisons between recoveries
◮ Optimize the recovery weights
◮ Goal: optimize the choice of data points
◮ Goal: do this efficiently
SLIDE 15
Greedy Recovery
SLIDE 16 Greedy Recovery
Specialize to recovery from nodal data:
  µ(u) ≈ Σ_{j=1}^N a∗_j u(x_j) for all u ∈ U
with optimal weights a∗_j, but vary the points x_j.
Optimization of the error norm in some RKHS,
e.g. the Sobolev space W^m_2(R^d), m > d/2.
Note: a∗_j = a∗_j(x_1, . . . , x_N)
Goal: Greedy recursive method, N → N + 1
Goal: Sparsity via greedy selection
SLIDE 17 Interpolation
For a Lagrange basis u^N_1, . . . , u^N_N, the interpolant to u is
  s_u(x) = Σ_{j=1}^N u^N_j(x) u(x_j) for all u ∈ U
The optimal recovery is
  µ(u) ≈ µ(s_u) = Σ_{j=1}^N µ(u^N_j) u(x_j) for all u ∈ U,
  with optimal weights a∗_j := µ(u^N_j)
Drawback: the Lagrange basis is not efficiently recursive
Goal: Use the Newton basis recursively
SLIDE 18 Recursive Newton Basis
Points x_1, x_2, . . . , x_k, x_{k+1}, . . .
Recursive kernels (RS, 1997):
  K_0(x, y) := K(x, y)
  K_{k+1}(x, y) := K_k(x, y) − K_k(x_{k+1}, x) K_k(x_{k+1}, y) / K_k(x_{k+1}, x_{k+1})
Newton basis functions (St. Müller/RS, 2009):
  v_{k+1}(x) := K_k(x_{k+1}, x) / √(K_k(x_{k+1}, x_{k+1}))
  K_{k+1}(x, y) = K(x, y) − Σ_{j=1}^{k+1} v_j(x) v_j(y) = K_k(x, y) − v_{k+1}(x) v_{k+1}(y)
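The recursion lends itself to a column-by-column computation of the Newton basis on a fixed point set. A sketch under the assumption that `kernel(x, y)` evaluates K and `selected` lists the chosen point indices:

```python
import numpy as np

def newton_basis(kernel, points, selected):
    """Values V[j, k] = v_{k+1}(x_j) of the Newton basis on all points.

    Uses K_k(x_{k+1}, x) = K(x_{k+1}, x) - sum_{i<=k} v_i(x_{k+1}) v_i(x)
    and v_{k+1}(x) = K_k(x_{k+1}, x) / sqrt(K_k(x_{k+1}, x_{k+1})).
    """
    N = len(points)
    V = np.zeros((N, len(selected)))
    for k, idx in enumerate(selected):
        col = np.array([kernel(points[idx], p) for p in points])  # K(x_{k+1}, x_j)
        col -= V[:, :k] @ V[idx, :k]   # downdate to K_k(x_{k+1}, x_j)
        V[:, k] = col / np.sqrt(col[idx])
    return V
```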
SLIDE 19
Properties of Newton Basis
Orthonormality: (v_j, v_k)_U = δ_{jk}
Zeros: v_{k+1}(x_j) = 0, 1 ≤ j ≤ k
Power function for interpolation:
  P²_{k+1}(δ_x) := ‖ǫ(δ_x; α∗_1(δ_x), . . . , α∗_{k+1}(δ_x))‖²_{U∗}
                = K_{k+1}(x, x)   (M. Mouattamid/RS, 2009)
                = K_k(x, x) − v²_{k+1}(x)
                = P²_k(δ_x) − v²_{k+1}(x)
SLIDE 20 Recursive Recovery
Functional µ, points x_1, x_2, . . . , x_k, x_{k+1}, . . .
Recursive equations I:
  µ^x K_0(x, y) = µ^x K(x, y)
  µ^x K_{k+1}(x, y) = µ^x K_k(x, y) − µ^x K_k(x_{k+1}, x) K_k(x_{k+1}, y) / K_k(x_{k+1}, x_{k+1})
  µ^y µ^x K_{k+1}(x, y) = µ^y µ^x K_k(x, y) − µ^x K_k(x_{k+1}, x) µ^y K_k(x_{k+1}, y) / K_k(x_{k+1}, x_{k+1})
                        = µ^y µ^x K_k(x, y) − µ(v_{k+1})²
  P²_{k+1}(µ) := ‖ǫ(µ; α∗_1(µ), . . . , α∗_{k+1}(µ))‖²_{U∗}
              = µ^x µ^y K_{k+1}(x, y)
              = P²_k(µ) − µ(v_{k+1})²
              = µ^x µ^y K(x, y) − Σ_{j=1}^{k+1} µ(v_j)²
Goal: Choose x_{k+1} to maximize µ(v_{k+1})²
SLIDE 21 Recursive Recovery II
Functional µ, points x_1, x_2, . . . , x_k, x_{k+1}, . . .
Goal: Choose x_{k+1} to maximize µ(v_{k+1})²
Recursive equations:
  v_{k+1}(x) := K_k(x_{k+1}, x) / √(K_k(x_{k+1}, x_{k+1}))
  µ(v_{k+1}) = µ^x K_k(x_{k+1}, x) / √(K_k(x_{k+1}, x_{k+1}))
Maximize as a function of z:
  R_k(z) := (µ^x K_k(z, x))² / K_k(z, z)
Stop if R_k is small on the available points.
Problems: Efficiency? Stability?
SLIDE 22
Implementation: Overview
Total given points: x_1, . . . , x_N
Goal: Select a small subset of n points greedily
Computational complexity: O(nN) for the update n − 1 → n
Computational complexity: O(n²N) in total
Storage: O(nN) for n Newton basis functions
The same holds for interpolation of n data values with evaluation
at N points via the Newton basis.
SLIDE 23
Implementation I
Total given points: x_1, . . . , x_N
Goal: Select a small subset greedily
Index set I := ∅ to collect the selected point indices
Various N-vectors for steps k = 0, 1, . . . , N:
  K_k := (K_k(x_j, x_j))_{1≤j≤N}
  M_k := (µ^x K_k(x, x_j))_{1≤j≤N}
  R_k := (R_k(x_1), . . . , R_k(x_N))^T = M_k .^2 ./ K_k
Easy initialization for k = 0.
Find the maximum of R_0 and insert its point index into I.
Further N-vectors:
  k_1 := (K_0(x_{I(1)}, x_j))_{1≤j≤N}
  V_1 := k_1 / √(K_0(I(1)))
SLIDE 24 Implementation II
Recursion n ⇒ n + 1:
Given: N-vectors K_n, M_n, V_1, . . . , V_n, index set I with n point indices.
Maximize R_n = M_n .^2 ./ K_n; if the maximum is not too small,
add the new point index to I.
Now I = {y_1, . . . , y_{n+1}}. Use
  K_n(y_{n+1}, x_j) = K(y_{n+1}, x_j) − Σ_{k=1}^n v_k(y_{n+1}) v_k(x_j),
  k_{n+1} = k_{n+1,0} − Σ_{k=1}^n (e^T_{I(n+1)} V_k) V_k
    with k_{n+1,0} := (K(y_{n+1}, x_j))_{1≤j≤N},
  V_{n+1} := k_{n+1} / √(K_n(I(n+1)))
  K_{n+1} = K_n − V_{n+1} .^2
SLIDE 25
Implementation III
Update of M_{n+1} = (µ^x K_{n+1}(x, x_j))_{1≤j≤N}: use
  µ^x K_{n+1}(x, x_j) = µ^x K_n(x, x_j) − µ^x K_n(y_{n+1}, x) K_n(y_{n+1}, x_j) / K_n(y_{n+1}, y_{n+1})
  M_{n+1} = M_n − (M_n(I(n+1)) / K_n(I(n+1))) k_{n+1}
To get the error norm, update
  P²_{n+1}(µ) = P²_n(µ) − R_n(y_{n+1}),
starting from P²_0(µ) := µ^x µ^y K(x, y).
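Putting Slides 23-25 together, the whole greedy loop works on N-vectors only. The following Python sketch is one possible transcription; the input conventions are assumptions: `K` is the full kernel matrix on the offered points, `M0[j] = µ^x K(x, x_j)`, and `mumu = µ^x µ^y K(x, y)`.

```python
import numpy as np

def greedy_select(K, M0, mumu, n_max, tol=1e-12):
    """Greedy point selection via the N-vector recursions above.

    K    : (N, N) kernel matrix K(x_i, x_j) on all offered points
    M0   : (N,)   M0[j] = mu^x K(x, x_j)
    mumu : scalar mu^x mu^y K(x, y)
    Returns the selected indices I, the Newton basis matrix V (N x n),
    and the history of the squared error norms P^2_n(mu).
    """
    N = K.shape[0]
    Kdiag = K.diagonal().copy()      # K_n(x_j, x_j)
    M = M0.astype(float).copy()      # M_n[j] = mu^x K_n(x, x_j)
    V = np.zeros((N, 0))
    P2, history, I = float(mumu), [float(mumu)], []
    for n in range(n_max):
        R = np.zeros(N)              # R_n, with tiny diagonals masked out
        ok = Kdiag > tol
        R[ok] = M[ok] ** 2 / Kdiag[ok]
        i = int(np.argmax(R))
        if R[i] <= tol:              # stop if R_n is small everywhere
            break
        col = K[:, i] - V @ V[i, :]  # k_{n+1}[j] = K_n(y_{n+1}, x_j)
        v = col / np.sqrt(col[i])    # V_{n+1}, the new Newton basis vector
        V = np.column_stack([V, v])
        Kdiag = Kdiag - v ** 2       # K_{n+1}
        M = M - (M[i] / col[i]) * col   # M_{n+1}
        P2 -= R[i]                   # P^2_{n+1}(mu) = P^2_n(mu) - R_n(y_{n+1})
        I.append(i)
        history.append(P2)
    return I, V, history
```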
SLIDE 26
Implementation IV
No recursion for the weights so far. After n steps:
  M̃_0 := truncation of the N-vector M_0 to the selected n points
  Ṽ := truncation of the matrix (V_1, . . . , V_n) to the selected n points
Then solve the two n × n triangular systems
  Ṽ z̃ = M̃_0,   Ṽ^T ã = z̃
for the vector ã of the n optimal weights.
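The two triangular solves map directly to SciPy; Ṽ is lower triangular because v_{k+1} vanishes at the previously selected points. A sketch continuing the conventions of the loop above:

```python
from scipy.linalg import solve_triangular

def optimal_weights(I, V, M0):
    """Weights a of the optimal recovery on the n selected points."""
    Vt = V[I, :]                                 # n x n, lower triangular
    z = solve_triangular(Vt, M0[I], lower=True)  # solve V~ z = M0~
    a = solve_triangular(Vt.T, z, lower=False)   # solve V~^T a = z
    return a
```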
SLIDE 27
Examples
SLIDE 28 Interpolation I
Interpolation at the origin
Offering 75 scattered points, taking 15 optimal points
Gaussian kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 29 Interpolation II
Interpolation at the origin
Offering 75 scattered points, taking 15 optimal points
C² Wendland kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 30 Integration I
Integration over [−1, +1]
Offering 75 scattered points, taking 15 optimal points
Gaussian kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 31 Integration II
Integration over [−1, +1]
Offering 75 scattered points, taking 15 optimal points
C² Wendland kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 32 Laplacian I
Greedy recovery of ∆u at the origin
Offering 75 scattered points, taking 15 optimal points
Gaussian kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 33 Laplacian II
Greedy recovery of ∆u at the origin
Offering 75 scattered points, taking 15 optimal points
C⁴ Wendland kernel
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 34 Laplacian III
Greedy recovery of ∆u at the origin
Offering 75 scattered points, taking 15 optimal points
Sobolev-Matérn kernel r^5 K_5(r)
[Figures: R function after 15 steps; contours of the R function
after 15 steps; error norm as a function of n]
SLIDE 35
Thank You!
For references, see http://www.num.math.uni-goettingen.de/schaback and in particular .../research/papers/GSLAoFfND.pdf