From greedy approximation to greedy optimization. Vladimir Temlyakov. Slides, July 2014.

SLIDE 1

Introduction Greedy approximation in Hilbert spaces Greedy approximation in Banach spaces Greedy algorithms for convex optimization Lebesgue-type inequality

From greedy approximation to greedy optimization

Vladimir Temlyakov July, 2014

SLIDE 2

1. Introduction
2. Greedy approximation in Hilbert spaces
3. Greedy approximation in Banach spaces
4. Greedy algorithms for convex optimization
5. Lebesgue-type inequality

SLIDE 3

Toy example

Let $\Psi := \{\psi_k\}_{k=1}^\infty$ be an orthonormal basis for a Hilbert space $H$. For any $f \in H$ there is a convergent (in $H$) orthogonal expansion
$$f = \sum_{k=1}^\infty \langle f, \psi_k\rangle \psi_k.$$
A classical way of approximating $f$ is to take a partial sum
$$S_n(f, \Psi) := \sum_{k=1}^n \langle f, \psi_k\rangle \psi_k.$$
For the error we have
$$\|f - S_n(f, \Psi)\|^2 = \sum_{k=n+1}^\infty |\langle f, \psi_k\rangle|^2.$$

SLIDE 4

m-term approximation

In nonlinear approximation we use the $m$-term approximation
$$\sum_{k \in \Lambda} \langle f, \psi_k\rangle \psi_k, \qquad |\Lambda| = m.$$
It is clear that the optimal (from the point of view of the error) choice of $\Lambda$ is the set of indices of the $m$ biggest in absolute value coefficients $\langle f, \psi_k\rangle$. We can realize this choice by picking the biggest coefficients one by one. This results in the reordering (greedy reordering) of the orthogonal expansion:
$$f = \sum_{i=1}^\infty \langle f, \psi_{k_i}\rangle \psi_{k_i}, \qquad |\langle f, \psi_{k_1}\rangle| \ge |\langle f, \psi_{k_2}\rangle| \ge \dots.$$
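The greedy reordering above is easy to sketch numerically: in an orthonormal basis, the best $m$-term approximation keeps the $m$ coefficients that are largest in absolute value, and the squared error is the sum of the discarded $|\langle f, \psi_k\rangle|^2$. A minimal NumPy illustration (the coefficient values are made up):

```python
import numpy as np

# Coefficients of f in an orthonormal basis (made-up values for illustration).
coeffs = np.array([0.1, -2.0, 0.5, 1.5, -0.2])
m = 2

# Greedy reordering: indices sorted by decreasing |<f, psi_k>|; keep the first m.
Lam = np.argsort(-np.abs(coeffs))[:m]
approx = np.zeros_like(coeffs)
approx[Lam] = coeffs[Lam]

# By orthonormality, the squared error is the sum of the discarded |<f, psi_k>|^2.
err2 = float(np.sum((coeffs - approx) ** 2))
```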

SLIDE 6

Major questions of greedy approximation

1. Suppose that instead of an orthonormal basis $\Psi$ we have a redundant system $\mathcal D$. How do we approximate with regard to $\mathcal D$?
2. How do we work in a Banach space $X$ instead of a Hilbert space $H$?

SLIDE 7

Notations

We begin with the case where approximation takes place in a Banach space $X$ equipped with a norm $\|\cdot\| := \|\cdot\|_X$. We formulate our approximation problem in the following general way.

Definition (Dictionary). We say that a set of functions $\mathcal D$ from $X$ is a dictionary if each $g \in \mathcal D$ has norm one ($\|g\|_X = 1$) and the closure of $\operatorname{span} \mathcal D$ coincides with $X$.

We let $\Sigma_m(\mathcal D)$ denote the collection of all functions (elements) in $X$ which can be expressed as a linear combination of at most $m$ elements of $\mathcal D$.

SLIDE 8

m-sparse elements

Thus each function $s \in \Sigma_m(\mathcal D)$ can be written in the form
$$s = \sum_{g \in \Lambda} c_g g, \qquad \Lambda \subset \mathcal D, \quad \#\Lambda \le m,$$
where the $c_g$ are real numbers. In some cases, it may be possible to write an element from $\Sigma_m(\mathcal D)$ in this form in more than one way. The space $\Sigma_m(\mathcal D)$ is not linear: the sum of two functions from $\Sigma_m(\mathcal D)$ is generally not in $\Sigma_m(\mathcal D)$.

SLIDE 9

Examples

Perhaps the first example of approximation involving dictionaries was considered by E. Schmidt in 1907, who considered the approximation of functions $f(x,y)$ of two variables in $L_2([0,1]^2)$ by functions of the form
$$B_m(x,y) = \sum_{j=1}^m c_j u_j(x) v_j(y).$$
This approximation problem can be seen as an $m$-term approximation with regard to the dictionary
$$\Pi = \{g : g(x,y) = u(x)v(y);\ u, v \in L_2([0,1]),\ \|u\|_{L_2} = \|v\|_{L_2} = 1\}.$$

SLIDE 10

One more example

Another approximation problem of this type, which is well known in statistics, is the projection pursuit regression problem. The problem is to approximate in $L_2$ a given multivariate function $f \in L_2$ by a sum of ridge functions, i.e. by
$$W_m(x) = \sum_{j=1}^m r_j(\langle \omega_j, x\rangle), \qquad x, \omega_j \in \mathbb R^d, \quad j = 1, \dots, m,$$
where $r_j$, $j = 1, \dots, m$, are univariate functions.

SLIDE 11

More examples

Another example, from signal processing, uses the Gabor functions $g_{a,b}(x) := e^{iax} e^{-bx^2}$ and approximates a univariate function by linear combinations of the elements $\{g_{a,b}(x - c) : a, c \in \mathbb R,\ b > 0\}$.

SLIDE 12

Best m-term approximation

For a function $f \in X$ we define its best $m$-term approximation error
$$\sigma_m(f, \mathcal D)_X := \inf_{s \in \Sigma_m(\mathcal D)} \|f - s\|_X.$$
We concentrate on the important problem of finding good methods of $m$-term approximation in the case of a general dictionary $\mathcal D$ and of studying their efficiency. Let us begin this discussion in the special case of a Hilbert space with the inner product $\langle \cdot, \cdot\rangle$. We first define the Weak Greedy Algorithm (WGA) in a Hilbert space $H$. We describe this algorithm for a general dictionary $\mathcal D$.

SLIDE 15

WGA

Let a sequence $\tau = \{t_k\}_{k=1}^\infty$, $0 \le t_k \le 1$, be given.

WGA. We define $f_0^\tau := f$. Then for each $m \ge 1$, we inductively define:

1. $\varphi_m^\tau \in \mathcal D$ is any element satisfying
$$|\langle f_{m-1}^\tau, \varphi_m^\tau\rangle| \ge t_m \sup_{g \in \mathcal D} |\langle f_{m-1}^\tau, g\rangle|;$$
2. $f_m^\tau := f_{m-1}^\tau - \langle f_{m-1}^\tau, \varphi_m^\tau\rangle \varphi_m^\tau;$
3. $G_m^\tau(f, \mathcal D) := \sum_{j=1}^m \langle f_{j-1}^\tau, \varphi_j^\tau\rangle \varphi_j^\tau.$
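For a finite dictionary in $\mathbb R^n$ the WGA is a short loop. A minimal NumPy sketch (the function name and setup are illustrative, not from the talk); the dictionary is a matrix whose rows are unit-norm elements, and `t = 1` gives the Pure Greedy Algorithm discussed on the next slide:

```python
import numpy as np

def wga(f, D, m, t=1.0):
    """Weak Greedy Algorithm in R^n (sketch).

    f : vector to approximate; D : (N, n) array whose rows are unit-norm
    dictionary elements; t : weakness parameter (t = 1 gives the PGA).
    Returns the approximant G_m and the residual f_m.
    """
    residual = np.array(f, dtype=float)
    G = np.zeros_like(residual)
    for _ in range(m):
        inner = D @ residual                     # <f_{m-1}, g> for every g in D
        threshold = t * np.max(np.abs(inner))
        # any element with |<f_{m-1}, g>| >= t * sup is admissible; take the first
        idx = int(np.argmax(np.abs(inner) >= threshold))
        c = inner[idx]
        residual = residual - c * D[idx]         # f_m := f_{m-1} - <f_{m-1}, phi> phi
        G = G + c * D[idx]                       # G_m accumulates the greedy expansion
    return G, residual
```

For an orthonormal dictionary this reproduces the greedy reordering of the earlier slides: each step removes the largest remaining coefficient.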

SLIDE 16

Historical comment

In the case $t_k = 1$, $k = 1, 2, \dots$, the WGA is called the Pure Greedy Algorithm (PGA). The PGA was proposed by J.H. Friedman and W. Stuetzle in 1981 for the ridge dictionary. We note that in the particular case $t_k = t$, $k = 1, 2, \dots$, the WGA was considered by L. Jones (1987), also for the ridge dictionary. The WGA provides for each $f \in H$ an expansion into a series (greedy expansion)
$$f \sim \sum_{j=1}^\infty c_j(f)\, \varphi_j^\tau, \qquad c_j(f) := \langle f_{j-1}^\tau, \varphi_j^\tau\rangle.$$
In general it is not an expansion into an orthogonal series, but it has some similar properties.

SLIDE 17

Parseval’s formula

The coefficients $c_j(f)$ of the expansion are obtained by the Fourier formulas with $f$ replaced by the residuals $f_{j-1}^\tau$. It is easy to see that
$$\|f_m^\tau\|^2 = \|f_{m-1}^\tau\|^2 - |c_m(f)|^2.$$
There are convergence results for the greedy expansion and, therefore, from the above equality we get for this expansion an analog of the Parseval formula for orthogonal expansions:
$$\|f\|^2 = \sum_{j=1}^\infty |c_j(f)|^2.$$

SLIDE 18

Rate of convergence

For a general dictionary $\mathcal D$ we define the class of functions
$$A_1^o(\mathcal D, M) := \Big\{f \in H : f = \sum_{k \in \Lambda} c_k w_k,\ w_k \in \mathcal D,\ \#\Lambda < \infty,\ \sum_{k \in \Lambda} |c_k| \le M\Big\}$$
and we define $A_1(\mathcal D, M)$ as the closure (in $H$) of $A_1^o(\mathcal D, M)$. Furthermore, we define $A_1(\mathcal D)$ as the union of the classes $A_1(\mathcal D, M)$ over all $M > 0$. For $f \in A_1(\mathcal D)$, we define the norm $|f|_{A_1(\mathcal D)}$ as the smallest $M$ such that $f \in A_1(\mathcal D, M)$.

SLIDE 19

First results

It was proved in [DeVore, T., 1996] that for a general dictionary $\mathcal D$ the Pure Greedy Algorithm provides the following estimate:
$$\|f - G_m(f, \mathcal D)\| \le |f|_{A_1(\mathcal D)}\, m^{-1/6}. \quad (1)$$
(In this and similar estimates we require that the inequality holds for all possible choices of $\{G_m\}$.) That paper also contains an example of a dictionary $\mathcal D$ and an element $f$ such that
$$\|f - G_m(f, \mathcal D)\| > \tfrac12 |f|_{A_1(\mathcal D)}\, m^{-1/2}, \qquad m \ge 4.$$

SLIDE 20

Further results

We proved in [Konyagin, T., 1999] the estimate
$$\|f - G_m(f, \mathcal D)\| \le 4 |f|_{A_1(\mathcal D)}\, m^{-11/62},$$
which slightly improves the original one (see (1)). E. Livshitz and T. (2002) proved the following lower estimate: there exist a dictionary $\mathcal D$ and an element $f \in H$, $f \ne 0$, such that
$$\|f - G_m(f, \mathcal D)\| \ge C m^{-0.27} |f|_{A_1(\mathcal D)}$$
with a positive constant $C$. A. Sil'nichenko improved the exponent $11/62$ to $0.182$ in the upper estimate, and E. Livshitz improved the exponent $0.27$ to $0.1898$ in the lower estimate.

SLIDE 21

Open Problem

Find the right order of the sequence
$$\sup_{f, H, \mathcal D} \|f - G_m(f, \mathcal D)\| / |f|_{A_1(\mathcal D)}.$$

SLIDE 24

WOGA

Let a sequence $\tau = \{t_k\}_{k=1}^\infty$, $0 \le t_k \le 1$, be given. We define the Weak Orthogonal Greedy Algorithm (WOGA).

WOGA. We define $f_0^{o,\tau} := f$. Then for each $m \ge 1$ we inductively define:

1. $\varphi_m^{o,\tau} \in \mathcal D$ is any element satisfying
$$|\langle f_{m-1}^{o,\tau}, \varphi_m^{o,\tau}\rangle| \ge t_m \sup_{g \in \mathcal D} |\langle f_{m-1}^{o,\tau}, g\rangle|;$$
2. $G_m^{o,\tau}(f, \mathcal D) := P_{H_m^\tau}(f)$, where $H_m^\tau := \operatorname{span}(\varphi_1^{o,\tau}, \dots, \varphi_m^{o,\tau})$;
3. $f_m^{o,\tau} := f - G_m^{o,\tau}(f, \mathcal D)$.
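In finite dimensions the WOGA with $t = 1$ is the Orthogonal Greedy Algorithm, known in signal processing as Orthogonal Matching Pursuit: after each selection the approximant is the orthogonal projection of $f$ onto the span of the chosen elements. A minimal NumPy sketch (names and setup are illustrative):

```python
import numpy as np

def woga(f, D, m, t=1.0):
    """Weak Orthogonal Greedy Algorithm (t = 1: Orthogonal Greedy Algorithm / OMP).

    D : (N, n) array of unit-norm dictionary elements (rows).
    Returns G_m = P_{H_m}(f) and the residual f_m = f - G_m.
    """
    f = np.array(f, dtype=float)
    residual = f.copy()
    G = np.zeros_like(f)
    chosen = []
    for _ in range(m):
        inner = D @ residual
        threshold = t * np.max(np.abs(inner))
        chosen.append(int(np.argmax(np.abs(inner) >= threshold)))
        A = D[chosen].T                          # columns span H_m
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        G = A @ coef                             # orthogonal projection of f onto H_m
        residual = f - G
    return G, residual
```

Unlike the WGA, the residual here is orthogonal to all previously chosen elements, not just the last one.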

SLIDE 25

Rate of convergence

Theorem (T., 2000). Let $\mathcal D$ be an arbitrary dictionary in $H$. Then for each $f \in A_1(\mathcal D, M)$ we have
$$\|f - G_m^{o,\tau}(f, \mathcal D)\| \le M \Big(1 + \sum_{k=1}^m t_k^2\Big)^{-1/2}.$$

SLIDE 26

Notations

Let $X$ be a Banach space with norm $\|\cdot\|$.

Definition. We say that a set of elements (functions) $\mathcal D$ from $X$ is a symmetric dictionary if each $g \in \mathcal D$ has norm equal to one ($\|g\| = 1$), $g \in \mathcal D$ implies $-g \in \mathcal D$, and the closure of $\operatorname{span} \mathcal D$ is $X$.

For an element $f \in X$ we denote by $F_f$ a norming (peak) functional for $f$: $\|F_f\| = 1$, $F_f(f) = \|f\|$.
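A concrete example, not from the slides: in $\ell_p$ with $1 < p < \infty$, the norming functional of $f \ne 0$ has coordinates $\operatorname{sign}(f_i)|f_i|^{p-1}/\|f\|_p^{p-1}$, so $F_f(f) = \|f\|_p$ and $\|F_f\|_{\ell_q} = 1$ with the dual exponent $q = p/(p-1)$. A quick numerical check:

```python
import numpy as np

def norming_functional(f, p):
    """Coordinates of the norming (peak) functional F_f in ell_p, 1 < p < inf:
    F_f(g) = sum_i sign(f_i) |f_i|**(p-1) g_i / ||f||_p**(p-1)."""
    f = np.asarray(f, dtype=float)
    norm = np.sum(np.abs(f) ** p) ** (1.0 / p)
    return np.sign(f) * np.abs(f) ** (p - 1) / norm ** (p - 1)

f = np.array([1.0, -2.0, 0.5])
p = 3.0
q = p / (p - 1)                                  # dual exponent
F = norming_functional(f, p)
norm_f = np.sum(np.abs(f) ** p) ** (1.0 / p)
```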

SLIDE 29

Two forms

The greedy step (the first step) of the PGA can be interpreted in two ways. First, at the $m$th step we look for an element $\varphi_m \in \mathcal D$ and a number $\lambda_m$ satisfying
$$\|f_{m-1} - \lambda_m \varphi_m\|_H = \inf_{g \in \mathcal D,\, \lambda} \|f_{m-1} - \lambda g\|_H. \quad (2)$$
Second, we look for an element $\varphi_m \in \mathcal D$ such that
$$\langle f_{m-1}, \varphi_m\rangle = \sup_{g \in \mathcal D} \langle f_{m-1}, g\rangle. \quad (3)$$
In a Hilbert space both versions (2) and (3) result in the same PGA. In a general Banach space the corresponding versions of (2) and (3) lead to different greedy algorithms.
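Why (2) and (3) coincide in a Hilbert space: for $\|g\| = 1$ the optimal coefficient is $\lambda = \langle f, g\rangle$, and $\inf_\lambda \|f - \lambda g\|^2 = \|f\|^2 - \langle f, g\rangle^2$, so minimizing the error over $g$ is the same as maximizing $|\langle f, g\rangle|$. A one-line numerical check of this identity:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=4)
g = rng.normal(size=4)
g /= np.linalg.norm(g)                     # unit-norm dictionary element

lam = f @ g                                # optimal lambda is <f, g>
err2 = (f - lam * g) @ (f - lam * g)       # inf_lambda ||f - lambda*g||^2
```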

SLIDE 32

XGA

The Banach space version of (2) is straightforward: instead of the Hilbert norm $\|\cdot\|_H$ in (2) we use the Banach norm $\|\cdot\|_X$. This results in the following greedy algorithm.

X-Greedy Algorithm (XGA). We define $f_0 := f$, $G_0 := 0$. Then, for each $m \ge 1$, we inductively define:

1. $\varphi_m \in \mathcal D$, $\lambda_m \in \mathbb R$ are such that (we assume existence)
$$\|f_{m-1} - \lambda_m \varphi_m\|_X = \inf_{g \in \mathcal D,\, \lambda} \|f_{m-1} - \lambda g\|_X. \quad (4)$$
2. Denote
$$f_m := f_{m-1} - \lambda_m \varphi_m, \qquad G_m := G_{m-1} + \lambda_m \varphi_m.$$

SLIDE 33

Dual greedy algorithm

The second version of the PGA in a Banach space is based on the concept of a norming (peak) functional. We note that in a Hilbert space a norming functional $F_f$ acts as follows: $F_f(g) = \langle f/\|f\|, g\rangle$. Therefore, (3) can be rewritten in terms of the norming functional $F_{f_{m-1}}$ as
$$F_{f_{m-1}}(\varphi_m) = \sup_{g \in \mathcal D} F_{f_{m-1}}(g). \quad (5)$$
This observation leads to the class of dual greedy algorithms. We define the Weak Dual Greedy Algorithm with weakness $\tau := \{t_k\}_{k=1}^\infty$ (WDGA($\tau$)).

SLIDE 36

WDGA

Weak Dual Greedy Algorithm (WDGA($\tau$)). Let $\tau := \{t_m\}_{m=1}^\infty$, $t_m \in [0,1]$, be a weakness sequence. We define $f_0 := f$. Then, for each $m \ge 1$, we inductively define:

1. $\varphi_m \in \mathcal D$ is any element satisfying
$$F_{f_{m-1}}(\varphi_m) \ge t_m \sup_{g \in \mathcal D} F_{f_{m-1}}(g). \quad (6)$$
2. Define $a_m$ by
$$\|f_{m-1} - a_m \varphi_m\| = \min_{a \in \mathbb R} \|f_{m-1} - a \varphi_m\|.$$
3. Denote
$$f_m := f_{m-1} - a_m \varphi_m.$$

SLIDE 37

Remark

The first results on greedy approximation in Banach spaces were obtained by M. Donahue, L. Gurvits, C. Darken, and E. Sontag in 1997. Let $\tau := \{t_k\}_{k=1}^\infty$ be a given sequence of nonnegative numbers $t_k \le 1$, $k = 1, 2, \dots$. We first define the Weak Chebyshev Greedy Algorithm (WCGA), which is a generalization to Banach spaces of the Weak Orthogonal Greedy Algorithm defined for Hilbert spaces.

SLIDE 40

WCGA

WCGA. We define $f_0^c := f_0^{c,\tau} := f$. Then for each $m \ge 1$ we inductively define:

1. $\varphi_m^c := \varphi_m^{c,\tau} \in \mathcal D$ is any element satisfying
$$F_{f_{m-1}^c}(\varphi_m^c) \ge t_m \sup_{g \in \mathcal D} F_{f_{m-1}^c}(g).$$
2. Define
$$\Phi_m := \Phi_m^\tau := \operatorname{span}\{\varphi_j^c\}_{j=1}^m,$$
and let $G_m^c := G_m^{c,\tau}$ be the best approximant to $f$ from $\Phi_m$.
3. Denote $f_m^c := f_m^{c,\tau} := f - G_m^c$.

SLIDE 41

Modulus of smoothness

We consider here approximation in uniformly smooth Banach spaces.

Definition. For a Banach space $X$ we define the modulus of smoothness
$$\rho(u) := \sup_{\|x\| = \|y\| = 1} \Big(\tfrac12\big(\|x + uy\| + \|x - uy\|\big) - 1\Big).$$
A uniformly smooth Banach space is one with the property
$$\lim_{u \to 0} \rho(u)/u = 0.$$

SLIDE 42

Rate of convergence

We denote the closure of the convex hull of $\mathcal D$ by $A_1(\mathcal D)$.

Theorem (T., 2001). Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Then for a sequence $\tau := \{t_k\}_{k=1}^\infty$, $t_k \le 1$, $k = 1, 2, \dots$, we have for any $f \in A_1(\mathcal D)$ that
$$\|f_m^{c,\tau}\| \le C(q, \gamma) \Big(1 + \sum_{k=1}^m t_k^p\Big)^{-1/p}, \qquad p := \frac{q}{q-1},$$
with a constant $C(q, \gamma)$ which may depend only on $q$ and $\gamma$.

SLIDE 45

WGAFR

Weak Greedy Algorithm with Free Relaxation (WGAFR). Let $\tau := \{t_m\}_{m=1}^\infty$, $t_m \in [0,1]$, be a weakness sequence. We define $f_0 := f$ and $G_0 := 0$. Then for each $m \ge 1$ we define:

1. $\varphi_m \in \mathcal D$ is any element satisfying
$$F_{f_{m-1}}(\varphi_m) \ge t_m \sup_{g \in \mathcal D} F_{f_{m-1}}(g).$$
2. Find $w_m$ and $\lambda_m$ such that
$$\|f - ((1 - w_m) G_{m-1} + \lambda_m \varphi_m)\| = \inf_{\lambda, w} \|f - ((1 - w) G_{m-1} + \lambda \varphi_m)\|$$
and define $G_m := (1 - w_m) G_{m-1} + \lambda_m \varphi_m$.
3. Let $f_m := f - G_m$.

SLIDE 46

Rate of convergence

Theorem (T., 2008). Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Take a number $\epsilon \ge 0$ and two elements $f$, $f^\epsilon$ from $X$ such that $\|f - f^\epsilon\| \le \epsilon$ and $f^\epsilon/B \in A_1(\mathcal D)$, with some number $B = C(f, \epsilon, \mathcal D, X) > 0$. Then, for both the WCGA and the WGAFR we have ($p := q/(q-1)$)
$$\|f_m\| \le \max\Big\{2\epsilon,\ C(q, \gamma)(B + \epsilon)\Big(1 + \sum_{k=1}^m t_k^p\Big)^{-1/p}\Big\}.$$

SLIDE 48

Modulus of smoothness

We assume that the set $D := \{x : E(x) \le E(0)\}$ is bounded. For a bounded set $D$ we define the modulus of smoothness of $E$ on $D$ as follows:
$$\rho(E, u) := \frac12 \sup_{x \in D,\, \|y\| = 1} |E(x + uy) + E(x - uy) - 2E(x)|. \quad (7)$$
A typical assumption in convex optimization is of the form ($\|y\| = 1$)
$$|E(x + uy) - E(x) - \langle E'(x), uy\rangle| \le C u^2,$$
which corresponds to the case of $\rho(E, u)$ of order $u^2$. We assume that $E$ is Fréchet differentiable.
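For the model quadratic $E(x) = \|x\|^2$ the $u^2$ behavior is exact: $E(x+uy) + E(x-uy) - 2E(x) = 2u^2\|y\|^2$, so $\rho(E, u) = u^2$ on any bounded set (a worked check, not from the slides):

```python
import numpy as np

E = lambda x: float(x @ x)                 # model quadratic E(x) = ||x||^2

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)
y /= np.linalg.norm(y)                     # ||y|| = 1
u = 0.3

# Second symmetric difference of E; for the quadratic it equals 2*u^2 exactly.
second_diff = E(x + u * y) + E(x - u * y) - 2 * E(x)
```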

SLIDE 50

The Frank-Wolfe-type algorithm

Let $\tau := \{t_k\}_{k=1}^\infty$ be a given weakness sequence of numbers $t_k \in [0,1]$, $k = 1, 2, \dots$.

Weak Relaxed Greedy Algorithm (WRGA(co)). We define $G_0 := G_0^{r,\tau} := 0$. Then, for each $m \ge 1$ we define:

1. $\varphi_m := \varphi_m^{r,\tau} \in \mathcal D$ is any element satisfying
$$\langle -E'(G_{m-1}), \varphi_m - G_{m-1}\rangle \ge t_m \sup_{g \in \mathcal D} \langle -E'(G_{m-1}), g - G_{m-1}\rangle.$$
2. Find $0 \le \lambda_m \le 1$ such that
$$E((1 - \lambda_m) G_{m-1} + \lambda_m \varphi_m) = \inf_{0 \le \lambda \le 1} E((1 - \lambda) G_{m-1} + \lambda \varphi_m)$$
and define $G_m := G_m^{r,\tau} := (1 - \lambda_m) G_{m-1} + \lambda_m \varphi_m$.
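The two steps above can be sketched in NumPy for a smooth quadratic minimized over the convex hull of a finite symmetric dictionary. The grid line search and all names are illustrative simplifications, not part of the talk:

```python
import numpy as np

def wrga_co(E, grad_E, D, m, t=1.0):
    """Weak Relaxed Greedy Algorithm (a Frank-Wolfe-type method, sketch).

    Minimizes E over A_1(D), the convex hull of the rows of D; t is the
    weakness parameter. Line search over [0, 1] is done on a grid.
    """
    G = np.zeros(D.shape[1])
    for _ in range(m):
        d = -grad_E(G)
        scores = (D - G) @ d                   # <-E'(G_{m-1}), g - G_{m-1}>
        threshold = t * np.max(scores)
        phi = D[int(np.argmax(scores >= threshold))]
        lams = np.linspace(0.0, 1.0, 1001)
        vals = [E((1 - l) * G + l * phi) for l in lams]
        lam = lams[int(np.argmin(vals))]
        G = (1 - lam) * G + lam * phi          # relaxed (convex) update
    return G

# Illustrative target inside the hull of {e_1, -e_1, e_2, -e_2}.
target = np.array([0.3, 0.2])
E = lambda x: float((x - target) @ (x - target))
grad_E = lambda x: 2 * (x - target)
D = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
G = wrga_co(E, grad_E, D, 50)
```

Note that the convex update keeps every iterate inside $A_1(\mathcal D)$, which is the reason no projection step is needed.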

SLIDE 51

Rate of approximation

Theorem (T., 2012). Let $E$ be a uniformly smooth convex function with modulus of smoothness $\rho(E, u) \le \gamma u^q$, $1 < q \le 2$. Then, for a sequence $\tau := \{t_k\}_{k=1}^\infty$, $t_k \le 1$, $k = 1, 2, \dots$, we have for any $f \in A_1(\mathcal D)$ that
$$E(G_m) - E(f) \le \Big(C_1(q, \gamma) + C_2(q, \gamma) \sum_{k=1}^m t_k^p\Big)^{1-q}, \qquad p := \frac{q}{q-1},$$
with positive constants $C_1(q, \gamma)$, $C_2(q, \gamma)$ which may depend only on $q$ and $\gamma$.

SLIDE 53

WGAFR(co)

Weak Greedy Algorithm with Free Relaxation (WGAFR(co)). Let $\tau := \{t_m\}_{m=1}^\infty$, $t_m \in [0,1]$, be a weakness sequence. We define $G_0 := 0$. Then for each $m \ge 1$ we define:

1. $\varphi_m \in \mathcal D$ is any element satisfying
$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m \sup_{g \in \mathcal D} \langle -E'(G_{m-1}), g\rangle.$$
2. Find $w_m$ and $\lambda_m$ such that
$$E((1 - w_m) G_{m-1} + \lambda_m \varphi_m) = \inf_{\lambda, w} E((1 - w) G_{m-1} + \lambda \varphi_m)$$
and define $G_m := (1 - w_m) G_{m-1} + \lambda_m \varphi_m$.

SLIDE 54

Rate of convergence for WGAFR(co)

Theorem (T., 2012). Let $E$ be a uniformly smooth convex function with modulus of smoothness $\rho(E, u) \le \gamma u^q$, $1 < q \le 2$. Take a number $\epsilon \ge 0$ and an element $f^\epsilon$ from $D$ such that
$$E(f^\epsilon) \le \inf_{x \in D} E(x) + \epsilon, \qquad f^\epsilon/B \in A_1(\mathcal D),$$
with some number $B = C(E, \epsilon, \mathcal D) \ge 1$. Then we have ($p := q/(q-1)$)
$$E(G_m) - \inf_{x \in D} E(x) \le \max\Big\{2\epsilon,\ C_1(E, q, \gamma) B^q \Big(C_2(E, q, \gamma) + \sum_{k=1}^m t_k^p\Big)^{1-q}\Big\}.$$

SLIDE 55

Gradient type algorithms

The most difficult part of an algorithm is to find an element $\varphi_m \in \mathcal D$ to be used in the approximation process. We consider greedy methods for finding $\varphi_m \in \mathcal D$. We have two types of greedy steps to find $\varphi_m \in \mathcal D$.

I. Gradient greedy step. At this step we look for an element $\varphi_m \in \mathcal D$ such that
$$\langle -E'(G_{m-1}), \varphi_m\rangle \ge t_m \sup_{g \in \mathcal D} \langle -E'(G_{m-1}), g\rangle.$$
Algorithms that use the first derivative of the objective function $E$ are called first order optimization algorithms.

SLIDE 56

Zero order algorithms

II. E-greedy step. At this step we look for an element ϕ_m ∈ D which satisfies (we assume existence): inf_{c∈R} E(G_{m−1} + cϕ_m) = inf_{g∈D, c∈R} E(G_{m−1} + cg). Algorithms that use only the values of the objective function E are called zero order optimization algorithms.
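The E-greedy step can be sketched as follows. The sketch assumes the quadratic E(x) = ‖x − y‖² and the canonical basis as dictionary (both illustrative choices, not from the talk); for this E the inner one-dimensional minimization has a closed form, whereas in general a derivative-free line search would be used, keeping the method zero order.

```python
import numpy as np

# Illustrative assumptions: E(x) = ||x - y||^2, dictionary = canonical basis.
rng = np.random.default_rng(1)
y = rng.normal(size=6)
G = np.zeros(6)                         # current approximant G_{m-1}
D = np.eye(6)

def line_min(G, g, y):
    """Best step size c and value of E(G + c*g) for the quadratic E."""
    c = float((y - G) @ g / (g @ g))    # argmin_c ||G + c*g - y||^2
    r = G + c * g - y
    return float(r @ r), c

# E-greedy step: pick the dictionary element whose best line search wins.
vals = [line_min(G, g, y) for g in D]
i = int(np.argmin([v for v, _ in vals]))
phi, c = D[i], vals[i][1]
```

With G = 0 and this E, the value after the best step along e_i is ‖y‖² − y_i², so the E-greedy step selects the coordinate with the largest |y_i|, agreeing here with the gradient greedy step.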

SLIDE 57

Approximation step

After we have found ϕ_m ∈ D we can proceed in different ways. We now list some typical steps that are motivated by the corresponding steps in greedy approximation theory. These steps or their variants are used in optimization algorithms such as the gradient method, the reduced gradient method, conjugate gradients, and gradient pursuits.

(A) Best step in the direction ϕ_m ∈ D. We choose c_m such that E(G_{m−1} + c_mϕ_m) = inf_{c∈R} E(G_{m−1} + cϕ_m) and define G_m := G_{m−1} + c_mϕ_m.

SLIDE 58

Other approximation steps

(B) Shortened best step in the direction ϕ_m ∈ D. We choose c_m as in (A) and, for a given parameter b > 0, define G_m^b := G_{m−1}^b + b c_m ϕ_m. Usually b ∈ (0, 1); this is why we call the step shortened.

(C) Chebyshev-type (fully corrective) methods. We choose G_m ∈ span(ϕ_1, . . . , ϕ_m) which satisfies E(G_m) = inf_{c_1,...,c_m} E(c_1ϕ_1 + · · · + c_mϕ_m).

(D) Fixed relaxation. For a given sequence {r_k}_{k=1}^∞ of relaxation parameters, r_k ∈ [0, 1), we choose G_m := (1 − r_m)G_{m−1} + c_mϕ_m with c_m from E((1 − r_m)G_{m−1} + c_mϕ_m) = inf_{c∈R} E((1 − r_m)G_{m−1} + cϕ_m).

SLIDE 59

More approximation steps

(F) Free relaxation. We choose G_m ∈ span(G_{m−1}, ϕ_m) which satisfies E(G_m) = inf_{c_1,c_2} E(c_1G_{m−1} + c_2ϕ_m).

(G) Prescribed coefficients. For a given sequence {c_k}_{k=1}^∞ of positive coefficients, in the case of greedy step I we define G_m := G_{m−1} + c_mϕ_m. (8) In the case of greedy step II we define G_m by formula (8) with the greedy step II modified as follows: ϕ_m ∈ D is an element satisfying E(G_{m−1} + c_mϕ_m) = inf_{g∈D} E(G_{m−1} + c_mg).
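For a quadratic objective, several of the steps above reduce to linear least-squares problems. A minimal sketch of the fully corrective step (C), assuming E(x) = ‖x − y‖² and a set of already-selected dictionary elements (both illustrative assumptions):

```python
import numpy as np

# Step (C), fully corrective, for the illustrative quadratic E(x) = ||x - y||^2:
# re-optimize all coefficients over span(phi_1, ..., phi_m) at once, which for
# this E is a linear least-squares problem. The selected phis are assumed.
rng = np.random.default_rng(2)
y = rng.normal(size=5)
phis = [np.eye(5)[0], np.eye(5)[3]]     # elements picked by earlier greedy steps

A = np.column_stack(phis)               # columns phi_1, ..., phi_m
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
G = A @ coef                            # argmin of E over the whole span
```

Unlike step (A), which only tunes the newest coefficient, the fully corrective step revisits every previously chosen coefficient at each iteration, trading extra per-step cost for a better approximant.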

SLIDE 60


Problem

We are interested in the following fundamental problem of sparse approximation. Problem How to design a practical algorithm that builds sparse approximations comparable to best m-term approximations? In other words: How to choose elements from the dictionary for good m-term approximation?

SLIDE 62


Haar basis

Remark. In the case X = L_p, 1 < p < ∞, D = H_p, the recipe is very simple: for a given f ∈ L_p, f = Σ_I c_I(f) H_{I,p}, choose those H_{I,p} for which the |c_I(f)| are the largest. This discovery led to the actively developing theory of greedy-type bases. The above recipe gives us the Thresholding Greedy Algorithm (TGA).
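The TGA recipe above can be sketched in a few lines. The sketch assumes a finite vector of basis coefficients (standing in for the Haar coefficients c_I(f)); the function name is an illustrative choice.

```python
import numpy as np

# Thresholding Greedy Algorithm sketch: given basis coefficients (in the
# slide's setting, Haar coefficients c_I(f)), keep the m coefficients that
# are largest in absolute value and zero out the rest.
def tga(coeffs, m):
    idx = np.argsort(-np.abs(coeffs))[:m]   # indices of the m largest |c_I(f)|
    out = np.zeros_like(coeffs)
    out[idx] = coeffs[idx]                  # retained terms of the m-term sum
    return out

c = np.array([0.1, -2.0, 0.5, 1.5, -0.2])
print(tga(c, 2))                            # keeps -2.0 and 1.5, zeros the rest
```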

SLIDE 64


Lebesgue-type inequality for the TGA

Theorem (T., 1998). For each f ∈ L_p(T^d) we have ‖f − G_m(f, T)‖_p ≤ (1 + 3m^{h(p)}) σ_m(f, T)_p, 1 ≤ p ≤ ∞, where h(p) := |1/2 − 1/p|.

Remark. There is a positive absolute constant C such that for each m and 1 ≤ p ≤ ∞ there exists a function f ≠ 0 with the property ‖G_m(f, T)‖_p ≥ C m^{h(p)} ‖f‖_p.

SLIDE 66


Problem for the trigonometric system

Thus the recipe that works well for the Haar basis does not work well for the trigonometric system. Problem: How to choose harmonics (frequencies) for good m-term trigonometric approximation? Remark: It turns out that the following recipe works well for 2 ≤ p < ∞. We describe the mth iteration for approximating f. Suppose f_{m−1} is the residual after m − 1 iterations. Then we look for the largest Fourier coefficient of the function f_{m−1}|f_{m−1}|^{p−2} and choose the corresponding harmonic.
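The harmonic-selection recipe can be sketched on a grid. The sketch works in one periodic variable (the talk's setting is d-variate); the grid size, the value p = 4, and the toy residual are illustrative assumptions.

```python
import numpy as np

# Recipe sketch for 2 <= p < infinity: select the harmonic carrying the
# largest Fourier coefficient of f_{m-1} |f_{m-1}|^{p-2}.
p = 4
n = 64
x = 2 * np.pi * np.arange(n) / n
f_res = np.sin(3 * x) + 0.2 * np.cos(7 * x)   # stands in for the residual f_{m-1}

g = f_res * np.abs(f_res) ** (p - 2)          # f_{m-1} |f_{m-1}|^{p-2}
coeffs = np.fft.rfft(g) / n                   # Fourier coefficients of g
k = int(np.argmax(np.abs(coeffs)))            # frequency of the largest coefficient
print(k)                                      # -> 3: the dominant harmonic
```

Note that for p = 2 the weight |f_{m−1}|^{p−2} disappears and the recipe reduces to picking the largest Fourier coefficient of the residual itself, as in the Hilbert-space case.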

SLIDE 68


Lebesgue-type inequality for the WCGA

Theorem (T., 2013). Let D be the real d-variate trigonometric system normalized in L_p, 2 ≤ p < ∞. Then for any f ∈ L_p the WCGA with weakness parameter t gives ‖f_{C(t,p,d) m ln(m+1)}‖_p ≤ C σ_m(f, D)_p. (9)

Open Problem 7.1 (p. 91) from [Temlyakov, 2003] asks whether (9) holds without the extra ln(m + 1) factor. The above theorem is the first result on Lebesgue-type inequalities for the WCGA with respect to the trigonometric system. It provides progress toward solving this open problem, but the problem is still open.

SLIDE 70

THANK YOU!
