Decoding in Compressed Sensing

Ronald DeVore

USC, 2008

Discrete Compressed Sensing

x ∈ ℝ^N with N large

We are able to ask n questions about x

A question means an inner product v · x with v ∈ ℝ^N, called a sample

What are the best questions to ask?

Any such sampling is given by Φx, where Φ is an n × N matrix

We are interested in the good / best matrices Φ

Here "good" means the samples y = Φx contain enough information to approximate x well
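As a concrete illustration of this sampling model, here is a minimal numpy sketch (the Gaussian Φ anticipates the random constructions discussed later; all names and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n = 1000, 100                    # ambient dimension and number of questions
    x = np.zeros(N)
    x[rng.choice(N, size=5, replace=False)] = rng.standard_normal(5)  # a sparse x

    Phi = rng.standard_normal((n, N)) / np.sqrt(n)   # the n x N question matrix
    y = Phi @ x                                      # n samples: y_i = v_i . x
    print(y.shape)                                   # (100,)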

Encoder/Decoder

We view Φ as an encoder

Since Φ : ℝ^N → ℝ^n, many x are encoded with the same y

N := {η : Φη = 0}, the null space of Φ

F(y) := {x : Φx = y} = x_0 + N for any x_0 ∈ F(y)

The hyperplanes F(y) with y ∈ ℝ^n stratify ℝ^N

The sets F(y)

[Figure: the parallel hyperplanes F(y_1), F(y_2), . . . , F(y_k) slicing ℝ^N]

Encoder/Decoder

A decoder is any (possibly nonlinear) mapping ∆ from ℝ^n → ℝ^N

x̄ := ∆(Φ(x)) is our approximation to x from the information extracted

Let A := A_{n,N} := {(Φ, ∆) : Φ is n × N}
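A small sketch of this stratification, assuming scipy is available: compute a basis for the null space N and check that every point x_0 + η with η ∈ N is encoded to the same y:

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    N, n = 12, 5
    Phi = rng.standard_normal((n, N))
    y = Phi @ rng.standard_normal(N)

    basis = null_space(Phi)                          # columns span {eta : Phi eta = 0}
    x0 = np.linalg.lstsq(Phi, y, rcond=None)[0]      # one element of F(y)

    z = x0 + basis @ rng.standard_normal(N - n)      # another element of F(y)
    print(np.allclose(Phi @ z, y))                   # True: same encoding y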

GOAL: Instance-Optimal

How should we measure the performance of the encoding-decoding?

Define Σk := {x ∈ ℝ^N : #supp(x) ≤ k} and

σk(x)_{ℓp} := inf_{z∈Σk} ‖x − z‖_{ℓ_p^N}

Given an encoding-decoding pair (Φ, ∆), we say that this pair is instance-optimal of order k for ℓp if, for an absolute constant C > 0,

‖x − ∆(Φ(x))‖_{ℓp} ≤ C σk(x)_{ℓp}

The encoding-decoding pairs which have the largest k are the best.
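σk(x)_{ℓp} is easy to compute, since the best k-term approximation in any ℓp norm simply keeps the k largest entries in absolute value. A sketch (numpy assumed):

    import numpy as np

    def sigma_k(x, k, p=1.0):
        """Error of best k-term approximation in l_p: the l_p norm of
        everything outside the k largest entries (in absolute value)."""
        if k <= 0:
            return np.sum(np.abs(x) ** p) ** (1.0 / p)
        tail = np.sort(np.abs(x))[:-k]               # all but the k largest
        return np.sum(tail ** p) ** (1.0 / p)

    x = np.array([5.0, -0.1, 3.0, 0.2, -0.05])
    print(sigma_k(x, 2))                             # 0.35, the l1 tail mass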

What are optimal systems

Two issues: good matrices and good decoders

We shall first discuss what are good matrices: everything is controlled by the null space

Theorem (Cohen-Dahmen-DeVore): We have instance optimality of order k in ℓp for some decoder ∆ if and only if Φ has the Null Space Property (NSP) of order 2k: for each η ∈ N and each T with #(T) ≤ 2k we have

‖η_T‖_{ℓp} ≤ C‖η_{T^c}‖_{ℓp}

Notation: x_T is the vector that agrees with x on T and is zero otherwise.

Here the decoder is pushed into the background: the above theorem gives no information about whether practical decoders will work

Restricted Isometry Property

How to check NSP?

A sufficient condition is the Restricted Isometry Property

We say Φ has the Restricted Isometry Property (RIP) of order k with constant δ ∈ (0, 1) if

(1 − δ)‖x‖_{ℓ2} ≤ ‖Φx‖_{ℓ2} ≤ (1 + δ)‖x‖_{ℓ2},   x ∈ Σk

CDD show RIP is sufficient to guarantee NSP, but the exact results depend very strongly on p
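Certifying RIP exactly means checking all (N choose k) supports, which is infeasible; the following Monte Carlo probe (an illustration, not a certificate) only estimates how far the sampled ratios ‖Φx‖/‖x‖ stray from 1:

    import numpy as np

    def rip_probe(Phi, k, trials=2000, seed=0):
        """Sample random k-sparse vectors and record ||Phi x|| / ||x||."""
        rng = np.random.default_rng(seed)
        n, N = Phi.shape
        ratios = np.empty(trials)
        for t in range(trials):
            x = np.zeros(N)
            T = rng.choice(N, size=k, replace=False)   # a random support
            x[T] = rng.standard_normal(k)
            ratios[t] = np.linalg.norm(Phi @ x) / np.linalg.norm(x)
        return ratios.min(), ratios.max()

    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((64, 256)) / np.sqrt(64)
    print(rip_probe(Phi, k=4))          # roughly (1 - delta, 1 + delta), delta small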

Sample Results: ℓ1

If Φ has RIP of order 2k, then Φ is instance-optimal of order k in ℓ1:

‖x − ∆(Φ(x))‖_{ℓ1} ≤ C σk(x)_{ℓ1}

This means that there is some decoder, not necessarily a practical one

We know there are matrices Φ with the RIP property for

k ≤ c_0 n/log(N/n)

This range of k cannot be improved: from now on we refer to this as the largest range of k

Sample Results: ℓp

If an n × N matrix Φ is instance-optimal of order k = 1 in ℓ2 with constant C_0, then n ≥ N/C_0

This shows that instance optimality is not a viable concept for ℓ2

For 1 < p < 2, instance optimality of order k is possible for any

k ≤ c_0 N^{(2−2/p)/(1−2/p)} [n/log(N/n)]^{p/(2−p)}

This bound cannot be improved

Matrices that satisfy instance optimality for this range of k are obtained from matrices which satisfy RIP for

k̄ = k^{2/p−1} N^{2−2/p}

What are good matrices

How can we construct Φ satisfying RIP for the largest range of k?

Choose N vectors at random from the unit sphere in ℝ^n and use these as the columns of Φ

Choose each entry of Φ independently and at random from the Gaussian distribution N(0, 1/√n)

Choose each entry of Φ independently and at random from the Bernoulli (±1) distribution and then normalize the columns to have length one

With high probability on the draw, the resulting matrix will have RIP for the largest range of k

Problem: none of these are constructive. Can we put our hands on matrices?

No constructions are known for the largest range of k
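The three random constructions, as a numpy sketch (sizes illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, N = 100, 1000

    # 1) columns drawn uniformly from the unit sphere in R^n
    G = rng.standard_normal((n, N))
    Phi_sphere = G / np.linalg.norm(G, axis=0)

    # 2) i.i.d. Gaussian entries with standard deviation 1/sqrt(n)
    Phi_gauss = rng.standard_normal((n, N)) / np.sqrt(n)

    # 3) i.i.d. Bernoulli (+1/-1) entries; dividing by sqrt(n) makes
    #    every column have length exactly one
    Phi_bern = rng.choice([-1.0, 1.0], size=(n, N)) / np.sqrt(n)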

Instance-Optimality in Probability

We saw that instance optimality for ℓ_2^N is not viable

Suppose {Φ(ω)} is a collection of random matrices

We say this family satisfies RIP of order k with probability 1 − ǫ if a random draw Φ(ω) will satisfy RIP of order k with probability 1 − ǫ

We say {Φ(ω)} is bounded with probability 1 − ǫ if, given any x ∈ ℝ^N, a random draw Φ(ω) will with probability 1 − ǫ satisfy

‖Φ(ω)(x)‖_{ℓ_2^N} ≤ C_0 ‖x‖_{ℓ_2^N}

with C_0 an absolute constant

Our earlier analysis showed that Gaussian and Bernoulli random matrices have these properties with ǫ = e^{−cn}

Theorem: Cohen-Dahmen-DeVore

If {Φ(ω)} satisfies RIP of order 3k and boundedness, each with probability 1 − ǫ, then there are decoders ∆(ω) such that for any x ∈ ℓ_2^N we have, with probability 1 − 2ǫ,

‖x − ∆(ω)Φ(ω)(x)‖_{ℓ_2^N} ≤ C_0 σk(x)_{ℓ_2^N}

Instance optimality in probability

Range of k is k ≤ c_0 n/log(N/n)

Decoder is impractical

Decoding

By far the most intriguing part of Compressed Sensing is the decoding

There are continuing debates as to which decoding is numerically fastest

Some common decoders:

ℓ1 minimization: long history (Donoho; Candès-Romberg)

Greedy algorithms: find the support of a good approximating vector and then decode using ℓ2 minimization (Gilbert-Tropp; Needell-Vershynin (ROMP); Donoho (StOMP))

Iterative Reweighted Least Squares (Osborne; Daubechies-DeVore-Fornasier-Gunturk)

We shall make some remarks on these decoders, emphasizing the last one

Issues in Decoding

Range of instance optimality: when combined with the encoder, does it give the full range of instance optimality?

Number of computations to decode?

Robustness to noise?

Theorems versus numerical examples

Instance optimality in probability

Given that we can't construct the best encoding matrices, it seems that the best results would correspond to random draws of matrices

ℓ1 minimization

ℓ1 minimization: x* := Argmin_{z∈F(y)} ‖z‖_{ℓ1}

x* = x − η* where η* := Argmin_{η∈N} ‖x − η‖_{ℓ1}

Can be solved by linear programming

Let T := supp(x)

Recall that x is an ℓ1 minimizer if and only if

|∑_{i∈T} sign(x_i) η_i| ≤ ∑_{i∈T^c} |η_i|,   ∀η ∈ N

If N has the following null space property: there is a γ < 1 with

‖η_T‖_{ℓ1} ≤ γ‖η_{T^c}‖_{ℓ1},   ∀η ∈ N, #(T) ≤ k,

then all x ∈ Σk have unique ℓ1 minimizers equal to x
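A sketch of the linear-programming reduction: the standard split z = u − v with u, v ≥ 0 turns min ‖z‖_{ℓ1} subject to Φz = y into an LP (scipy's "highs" solver is assumed):

    import numpy as np
    from scipy.optimize import linprog

    def l1_decoder(Phi, y):
        """Decode by l1 minimization over F(y): minimize sum(u) + sum(v)
        subject to Phi(u - v) = y and u, v >= 0."""
        n, N = Phi.shape
        c = np.ones(2 * N)
        A_eq = np.hstack([Phi, -Phi])
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
        u, v = res.x[:N], res.x[N:]
        return u - v

    rng = np.random.default_rng(0)
    n, N, k = 40, 120, 5
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)
    x = np.zeros(N); x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    x_star = l1_decoder(Phi, Phi @ x)
    print(np.allclose(x_star, x, atol=1e-6))         # recovery of the k-sparse x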

Orthogonal Matching Pursuit (OMP)

We seek approximations to y from the dictionary D := {φ_1, . . . , φ_N} consisting of the columns of Φ

j_1 := Argmax_{j=1,...,N} |⟨y, φ_j⟩|

y^1 := z^1_{j_1} φ_{j_1} with z^1_{j_1} := ⟨y, φ_{j_1}⟩/‖φ_{j_1}‖²

i-th step: given {j_1, . . . , j_i}, let y^i = ∑_{l=1}^{i} z^i_{j_l} φ_{j_l} be the orthogonal projection of y onto Span{φ_{j_1}, . . . , φ_{j_i}}

j_{i+1} := Argmax_{j=1,...,N} |⟨r^i, φ_j⟩|, where r^i := y − y^i is the residual

x^i is (z^i_{j_1}, . . . , z^i_{j_i}) augmented by zeros in the other coordinates: x^i_j = z^i_j if j ∈ {j_1, . . . , j_i}, 0 otherwise
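A direct transcription of these steps (assuming, as in the random constructions above, columns of roughly unit length, so correlation with the residual drives the selection):

    import numpy as np

    def omp(Phi, y, k):
        """Orthogonal Matching Pursuit: greedily pick the column most
        correlated with the residual, then reproject y onto the span so far."""
        n, N = Phi.shape
        S, x = [], np.zeros(N)
        r = y.copy()
        for _ in range(k):
            j = int(np.argmax(np.abs(Phi.T @ r)))    # best-matching column
            if j not in S:
                S.append(j)
            z, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)  # projection coeffs
            x = np.zeros(N); x[S] = z
            r = y - Phi @ x                          # new residual
        return x

    rng = np.random.default_rng(0)
    n, N, k = 40, 120, 5
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)
    x = np.zeros(N); x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    print(np.allclose(omp(Phi, Phi @ x, k), x))      # True with high probability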

Results for Decoding for Random Draws

Gilbert and Tropp proved that for Bernoulli random matrices, OMP captures a k-sparse vector with high probability for the large range of k

Cohen-Dahmen-DeVore extend this to general random families of matrices

Are there practical decoders that give instance optimality in ℓ2?

Wojtaszczyk has shown that for Gaussian matrices, ℓ1 minimization does the job

This result rests on a geometric property of Gaussian matrices

Namely, the image of the unit ℓ1 ball under such a matrix will, with high probability, contain an ℓ2 ball of radius 1/√k

General Random Families

Cohen-Dahmen-DeVore: given an arbitrary ǫ > 0 we can obtain

‖x − ∆(Φ(x))‖_{ℓ2} ≤ C σk(x)_{ℓ2} + ǫ

with high probability

Decoders that attain this:

ℓ1 minimization

Greedy thresholding (sketched below): at each iteration take all coordinates for which the inner product satisfies |⟨r, φ_ν⟩| ≥ δ‖r‖_{ℓ2}/√k, where r is the residual

Here δ > 0 is a fixed threshold parameter
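One thresholding iteration might look as follows (my paraphrase of the rule above; the round count and δ are illustrative, not the paper's tuning):

    import numpy as np

    def threshold_step(Phi, y, support, k, delta=0.5):
        """Grow the support by every coordinate whose correlation with the
        residual exceeds delta * ||r||_2 / sqrt(k), then refit by least squares."""
        N = Phi.shape[1]
        x = np.zeros(N)
        if support:
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            x[support] = coef
        r = y - Phi @ x                              # current residual
        if np.linalg.norm(r) < 1e-12:                # already fit: nothing to add
            return support
        hits = np.abs(Phi.T @ r) >= delta * np.linalg.norm(r) / np.sqrt(k)
        return sorted(set(support) | set(np.flatnonzero(hits)))

    rng = np.random.default_rng(0)
    n, N, k = 40, 120, 5
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)
    x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    y = Phi @ x0

    support = []
    for _ in range(3):                               # a few growth/refit rounds
        support = threshold_step(Phi, y, support, k)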

Other Possible Decoders

We seek other possible decoding algorithms which may be faster

Here we add in the knowledge of our Compressed Sensing setting: the RIP property for Φ

We have also discussed the least squares problem

x̄ := Argmin_{z∈F(y)} ‖z‖_{ℓ2}, i.e. x̄ = x − η̄ with η̄ := Argmin_{η∈N} ‖x − η‖_{ℓ2}

We know this does not work well

However it is easy to compute:

x̄ = Φ^t[ΦΦ^t]^{−1}Φx = Φ^t[ΦΦ^t]^{−1}y

O(Nn²) arithmetic operations
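In numpy (a sketch; it also shows why plain least squares fails: the minimizer spreads energy over all coordinates instead of being sparse):

    import numpy as np

    rng = np.random.default_rng(0)
    n, N = 40, 120
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)
    x = np.zeros(N); x[:3] = 1.0                     # a 3-sparse x
    y = Phi @ x

    xbar = Phi.T @ np.linalg.solve(Phi @ Phi.T, y)   # Phi^t [Phi Phi^t]^{-1} y
    print(np.allclose(Phi @ xbar, y))                # True: xbar lies in F(y)
    print(np.count_nonzero(np.abs(xbar) > 1e-8))     # typically N: not sparse at all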

Weighted Least Squares

Consider weighted ℓ2 minimization

Let w_j > 0, j = 1, . . . , N, be positive weights

‖u‖_{ℓ2(w)} := (∑_{j=1}^N w_j u_j²)^{1/2}

⟨u, v⟩_w := ∑_{j=1}^N w_j u_j v_j

Define x(w) := Argmin_{z∈F(y)} ‖z‖_{ℓ2(w)}

x(w) = x − η(w) where η(w) := Argmin_{η∈N} ‖x − η‖_{ℓ2(w)}

Note that this solution is characterized by the orthogonality conditions

⟨x(w), η⟩_w = 0,   η ∈ N

We can again solve for x(w) in O(Nn²) operations
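A sketch of that O(Nn²) solve via a Lagrange-multiplier argument (my derivation, with D_w := diag(w)): stationarity gives D_w x(w) = Φ^t λ, so x(w) = D_w^{−1} Φ^t λ where (Φ D_w^{−1} Φ^t) λ = y.

    import numpy as np
    from scipy.linalg import null_space

    def weighted_l2_decoder(Phi, y, w):
        """Minimizer of ||z||_{l2(w)} over F(y) = {z : Phi z = y}."""
        Dinv_Phit = Phi.T / w[:, None]               # D_w^{-1} Phi^t
        lam = np.linalg.solve(Phi @ Dinv_Phit, y)    # (Phi D_w^{-1} Phi^t) lam = y
        return Dinv_Phit @ lam

    rng = np.random.default_rng(0)
    n, N = 5, 12
    Phi = rng.standard_normal((n, N))
    y = rng.standard_normal(n)
    w = rng.uniform(0.5, 2.0, size=N)

    xw = weighted_l2_decoder(Phi, y, w)
    print(np.allclose(Phi @ xw, y))                  # feasibility: xw is in F(y)
    # orthogonality conditions <x(w), eta>_w = 0 for eta in the null space
    print(np.allclose(null_space(Phi).T @ (w * xw), 0))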

Connections

Suppose x* ∈ ℝ^N is the ℓ1 minimizer from F(y) and T = supp(x*)

If w_j = |x*_j|^{−1}, j ∈ T, then x(w) = x*

If x* has full support, then ℓ1 minimality means that (sign(x*_i))_{i=1}^N is orthogonal to N

For the weighted least squares we have w_i x*_i = sign(x*_i), and so x* satisfies the weighted ℓ2 orthogonality conditions

Uniqueness of the minimizer for these problems shows x(w) = x*

Iterative Weighted Least Squares

We would like to iteratively choose weights so that the result converges to the ℓ1 minimizer x* ∈ F(y)

In the process, we want to always deal with strictly convex problems and strictly positive weights

To do this we introduce the following functional:

J(z, w, ǫ) := (1/2) [ ∑_{j=1}^N z_j² w_j + ∑_{j=1}^N (ǫ² w_j + w_j^{−1}) ]

We will now describe a recursive algorithm

If z ∈ ℝ^N we let r(z)_K be its K-th largest entry (in absolute value)

The Algorithm

J(z, w, ǫ) := (1/2) [ ∑_{j=1}^N z_j² w_j + ∑_{j=1}^N (ǫ² w_j + w_j^{−1}) ]

Initialize: w^0 := (1, . . . , 1), ǫ_0 := 1

x^{m+1} := Argmin_{z∈F(y)} J(z, w^m, ǫ_m),   m = 0, 1, . . .

ǫ_{m+1} := min(ǫ_m, r(x^{m+1})_K / N), where K is a fixed integer to be described

w^{m+1} := Argmin_{w>0} J(x^{m+1}, w, ǫ_{m+1}), which gives

w_j^{m+1} = [(x_j^{m+1})² + ǫ_{m+1}²]^{−1/2},   j = 1, . . . , N
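A direct transcription into numpy (the x-update reuses the weighted least-squares closed form sketched earlier; K = k + 6 follows the theorem on the next slide):

    import numpy as np

    def irls(Phi, y, K, iters=50):
        """IRLS sketch: alternate the weighted least-squares solve over F(y)
        with the closed-form epsilon and weight updates of the algorithm."""
        n, N = Phi.shape
        w, eps = np.ones(N), 1.0
        x = np.zeros(N)
        for _ in range(iters):
            Dinv_Phit = Phi.T / w[:, None]           # x^{m+1}: l2(w)-min over F(y)
            x = Dinv_Phit @ np.linalg.solve(Phi @ Dinv_Phit, y)
            r = np.sort(np.abs(x))[::-1]             # decreasing rearrangement
            eps = min(eps, r[K - 1] / N)             # eps_{m+1}
            w = 1.0 / np.sqrt(x ** 2 + eps ** 2)     # w^{m+1}
        return x

    rng = np.random.default_rng(0)
    n, N, k = 40, 120, 5
    Phi = rng.standard_normal((n, N)) / np.sqrt(n)
    x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
    xm = irls(Phi, Phi @ x0, K=k + 6)
    print(np.linalg.norm(xm - x0))                   # small: iterates approach x0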

Daubechies-DeVore-Fornasier-Gunturk

These authors prove several results on the convergence of the above algorithm

The main result is the following theorem:

Theorem: Let k ≥ 1 and define K := k + 6. We assume that Φ satisfies the Null Space Property for ℓ1 of order 3K with γ ≤ 1/2. Then, for each x ∈ ℝ^N and y = Φ(x), the algorithm converges and its limit x̄ satisfies

‖x − x̄‖_{ℓ1} ≤ C_1 σk(x)_{ℓ1},   C_1 := 5(1 + γ)/(1 − γ).

In particular, if x is k-sparse then x^m converges to x.

Exponential Convergence

A second theorem shows that the algorithm converges exponentially to x* if x* ∈ Σk and the starting point is close enough.

Theorem: For a given 0 < ρ < 1, assume Φ satisfies NSP of order 3K with constant γ such that

µ := (γ/(1 − ρ)) (1 + 1/(K − k)) < 1.

Let m_0 be such that

‖x^{m_0} − x*‖_{ℓ1} ≤ R* := ρ min_{i∈T} |x*_i| = ρ r(x*)_k, where T = supp(x*).

Then for all m ≥ m_0 we have

‖x^{m+1} − x*‖_{ℓ1} ≤ µ‖x^m − x*‖_{ℓ1}

Consequently x^m converges to x* exponentially.


Exponential Convergence 2

It is an interesting question whether the algorithm actually converges exponentially to x* ∈ Σk from the get go

This is observed in practice but proved only for vectors of support size one or two

Theorem: Assume Φ satisfies NSP of order 3K with constant γ sufficiently small. Then for any vector x* whose support has size k = 1, 2 there is an absolute constant C_0 and a ρ < 1 such that

‖x^m − x*‖_{ℓ1} ≤ C_0 ρ^m ‖x‖_{ℓ1},   m = 1, 2, . . .

Proofs of these Results

The proofs of these results are interesting in how they utilize RIP in the form of the Null Space Property

We shall begin with the proof of the convergence of the algorithm

Before embarking on the proof, we want to bring out a certain geometric result on ℓ1 minimization which we shall utilize

As a preliminary, we consider the operation of rearrangement

We define r(z) as the rearrangement of the sequence |z_j| into decreasing order. In other words, r(z)_j is the j-th largest of the |z_ν|

Rearrangements

Rearrangement is a Lipschitz map in ‖·‖_{ℓ∞}

More precisely, ‖r(z) − r(z′)‖_{ℓ∞} ≤ ‖z − z′‖_{ℓ∞}

Moreover, for any j, we have

|σ_j(z)_{ℓ1} − σ_j(z′)_{ℓ1}| ≤ ‖z − z′‖_{ℓ1}

For any J > j, we have

(J − j) r(z)_J ≤ ‖z − z′‖_{ℓ1} + σ_j(z′)_{ℓ1}
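A quick numerical check of the three inequalities on random data (illustrative only; each prints True since the bounds are theorems):

    import numpy as np

    rng = np.random.default_rng(0)
    z, zp = rng.standard_normal(50), rng.standard_normal(50)

    def r(v):                                        # decreasing rearrangement of |v|
        return np.sort(np.abs(v))[::-1]

    def sigma(v, j):                                 # sigma_j(v)_{l1}: l1 tail past j
        return r(v)[j:].sum()

    j, J = 5, 12
    print(np.max(np.abs(r(z) - r(zp))) <= np.abs(z - zp).max())          # l-inf Lipschitz
    print(abs(sigma(z, j) - sigma(zp, j)) <= np.abs(z - zp).sum())       # l1 stability
    print((J - j) * r(z)[J - 1] <= np.abs(z - zp).sum() + sigma(zp, j))  # third bound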

Proof of Rearrangement Lemma

Given z, z′ and j ∈ {1, . . . , N}, let Λ be a set corresponding to the j − 1 largest entries of z′. Then

r(z)_j ≤ max_{i∈Λ^c} |z_i| ≤ max_{i∈Λ^c} |z′_i| + ‖z − z′‖_{ℓ∞} ≤ r(z′)_j + ‖z − z′‖_{ℓ∞}

Reverse the roles of z and z′

Next we compare z with the best j-term approximation u of z′ in ℓ1:

σ_j(z)_{ℓ1} ≤ ‖z − u‖_{ℓ1} ≤ ‖z − z′‖_{ℓ1} + σ_j(z′)_{ℓ1}

Again reverse the roles of z and z′

Finally, since each of the J − j entries r(z)_{j+1}, . . . , r(z)_J is at least r(z)_J and all appear in the tail sum,

(J − j) r(z)_J ≤ σ_j(z)_{ℓ1}

A Geometric Property

Assume that NSP holds for some k and γ < 1

For any z, z′ ∈ F(y),

‖z′ − z‖_{ℓ1} ≤ ((1 + γ)/(1 − γ)) (‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1}).

Proof of Geometric Property

Let T be the set of indices of the k largest entries of z. Then

‖(z′ − z)_{T^c}‖_{ℓ1} ≤ ‖z′_{T^c}‖_{ℓ1} + ‖z_{T^c}‖_{ℓ1}
= ‖z′‖_{ℓ1} − ‖z′_T‖_{ℓ1} + σk(z)_{ℓ1}
= ‖z‖_{ℓ1} + ‖z′‖_{ℓ1} − ‖z‖_{ℓ1} − ‖z′_T‖_{ℓ1} + σk(z)_{ℓ1}
≤ ‖z_T‖_{ℓ1} − ‖z′_T‖_{ℓ1} + ‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1}
≤ ‖(z′ − z)_T‖_{ℓ1} + ‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1}

Using the NSP,

‖(z′ − z)_T‖_{ℓ1} ≤ γ‖(z′ − z)_{T^c}‖_{ℓ1} ≤ γ(‖(z′ − z)_T‖_{ℓ1} + ‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1})

Proof Continued

In other words,

‖(z′ − z)_T‖_{ℓ1} ≤ (γ/(1 − γ)) (‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1}).

Finally,

‖z′ − z‖_{ℓ1} = ‖(z′ − z)_{T^c}‖_{ℓ1} + ‖(z′ − z)_T‖_{ℓ1} ≤ ((1 + γ)/(1 − γ)) (‖z′‖_{ℓ1} − ‖z‖_{ℓ1} + 2σk(z)_{ℓ1}).