
Sparse Solutions of Underdetermined Linear Equations by Linear Programming

David Donoho & Jared Tanner

Stanford University, Department of Statistics
University of Utah, Department of Mathematics

Arizona State University: March 6th 2006

Underdetermined systems, dictionary perspective

◮ Underdetermined system, infinitely many solutions

Ax = b, A ∈ R^{d×n}, d < n

◮ Least squares solution via the "canonical dual" (pseudoinverse) A^T(AA^T)^{−1}

  • Linear reconstruction, not signal adaptive
  • Solution vector is full: n nonzero elements in x (see the sketch after this list)

◮ Eschew redundancy: find a simple model of the data from A
◮ Seek the sparsest solution, ‖x‖ℓ0 := # nonzero elements

min ‖x‖ℓ0 subject to Ax = b

◮ Combinatorial cost for the naive approach
◮ Efficient nonlinear (signal adaptive) methods

  • Greedy (local) and Basis Pursuit (global)
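
A minimal numerical sketch of the least-squares point (NumPy; the dimensions and seed are illustrative, not from the slides): the minimum-norm least-squares solution is linear in b and generically dense, even when a very sparse solution exists.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 20, 50, 3                      # underdetermined: d < n; k-sparse truth
A = rng.standard_normal((d, n))

x0 = np.zeros(n)                         # sparse ground truth with k nonzeros
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = A @ x0

# Minimum-norm least-squares solution x = A^T (A A^T)^{-1} b:
# a linear, non-adaptive reconstruction.
x_ls = A.T @ np.linalg.solve(A @ A.T, b)

print(np.count_nonzero(np.abs(x_ls) > 1e-10))   # ~n nonzeros, not k
print(np.allclose(A @ x_ls, b))                 # yet it satisfies Ax = b
```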

Greedy [Temlyakov, DeVore, Tropp, ...]

◮ Orthogonal Matching Pursuit: initialize r = b, Ã = [ ]; while r ≠ 0:

  • select the column most correlated with the residual: j achieving max |a_j^T r| (the ℓ∞ entry of A^T r)
  • append it: Ã = [Ã a_j]
  • re-fit: r = b − Ã(Ã^T Ã)^{−1} Ã^T b

◮ Nonlinear selection of basis, x = (Ã^T Ã)^{−1} Ã^T b on the selected columns; ‖x‖ℓ0 ≤ d
◮ Highly redundant dictionaries often give fast decay of the residual
◮ Recovery of the sparsest solution?

  • examples of arbitrary sub-optimality for a fixed dictionary A [Temlyakov, DeVore, S. Chen, Tropp, . . . ]
  • residual nonzero for fewer than d steps, regardless of sparsity [Chen]

◮ Recovers the sparsest solution if sufficiently sparse, O(√d) [Tropp]
◮ More sophisticated variants: weak greedy, swapping, etc.
◮ More about OMP for random sampling later; an OMP sketch follows below
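
A compact OMP sketch in NumPy (the function name, stopping tolerance, and iteration cap are my own; the slides give only the update rule):

```python
import numpy as np

def omp(A, b, max_iter=None, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily select columns of A,
    then re-fit b by least squares on the selected columns."""
    d, n = A.shape
    max_iter = max_iter or d
    support, coef, r = [], np.zeros(0), b.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:                 # r = 0 (numerically)
            break
        j = int(np.argmax(np.abs(A.T @ r)))          # max_j |a_j^T r|
        support.append(j)
        A_sub = A[:, support]                        # Ã = [Ã a_j]
        coef, *_ = np.linalg.lstsq(A_sub, b, rcond=None)  # (Ã^T Ã)^{-1} Ã^T b
        r = b - A_sub @ coef                         # new residual
    x = np.zeros(n)
    x[support] = coef
    return x
```

On the example above, omp(A, b) typically recovers x0 exactly when k is small relative to d.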


Basis Pursuit

◮ Rather than solve ℓ0 (combinatorial), solve ℓ1 (a linear program)

min ‖x‖ℓ1 subject to Ax = b

  • Global basis selection rather than greedy local selection (LP sketch after this list)

◮ Example: A = [A1 A2], two orthonormal bases with coherence µ := max_{i≠j} |⟨a_i, a_j⟩|

  • If ‖x‖ℓ0 < .914(1 + µ^{−1}) then ℓ1 → ℓ0 [Elad, Bruckstein]
  • Coherence µ ≥ 1/√d [Candes, Romberg]

◮ Ensures convergence only for the most sparse, O(√d), signals
◮ Examples of failure: Dirac's comb [Candes, . . .]
◮ Is the story over? Can the O(√d) threshold be overcome? Yes!
◮ Examples of success: partial Fourier and Laplace, ‖x‖ℓ0 ≤ ⌊d/2⌋
◮ More to come for typical (random) matrices
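
The ℓ1 problem becomes a standard-form LP by splitting x = u − v with u, v ≥ 0, so that ‖x‖ℓ1 = Σ(u + v). A sketch using scipy.optimize.linprog (the helper name bp_l1 is mine):

```python
import numpy as np
from scipy.optimize import linprog

def bp_l1(A, b):
    """Basis Pursuit: min ||x||_1 s.t. Ax = b, posed as an LP via x = u - v."""
    d, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])          # constraint: A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v
```

On the running example, bp_l1(A, b) returns x0 exactly whenever the sparsity k falls below the phase transition discussed next.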


Sparsity threshold and the sampling matrix, A

Deterministic:

◮ Uncorrelated frame expansions require ‖x‖ℓ0 < O(√d); that is, success only for highly sparse signals
◮ Some special cases of success: partial Fourier and Laplace, ‖x‖ℓ0 ≤ ⌊d/2⌋ [Donoho, T]

Random:

◮ Overcoming the √d threshold for most A via randomness
◮ Recent order bounds for random ortho-projectors

  • ℓ1 → ℓ0 if ‖x‖ℓ0 ≤ O(d/ log(n/d)) [Candes, Tao, Romberg; Vershynin, Rudelson]
  • OMP → ℓ0 if ‖x‖ℓ0 ≤ O(d/ log(n)) [Tropp]

◮ What is the precise ℓ1 sparsity threshold for random matrices?
◮ Computing random inner products, "correlation with noise" (coherence sketch below)
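
A quick illustration (my own, not from the slides) of why coherence-based guarantees stall at O(√d): the mutual coherence of a random d × n matrix concentrates near √(2 log n / d), so a bound of the form c(1 + µ^{−1}) grows only like √d, while the order bounds above scale like d.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 100, 400
A = rng.standard_normal((d, n))
A /= np.linalg.norm(A, axis=0)          # unit-norm columns (atoms)

G = np.abs(A.T @ A)                     # magnitudes of all pairwise inner products
np.fill_diagonal(G, 0.0)
mu = G.max()                            # mutual coherence

print(mu, np.sqrt(2 * np.log(n) / d))   # empirically close
print(0.914 * (1 + 1 / mu))             # a coherence-style sparsity guarantee:
                                        # only O(sqrt(d)) nonzeros (illustrative)
```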

Why solve this problem? Are there applications?


Motivation for systems with random A

Compressed Sensing [Donoho; Candes, Tao]:

◮ Transform Φ with sparse signal coefficients, x̂ = Φx. Can x̂ be recovered with few measurements of x?
◮ Yes: from nonadaptive measurements, recover the sparse coefficients.
  Sample the signal with AΦ, where A is random d × n, d < n:

min ‖Φx‖ℓ1 subject to the measurements AΦx = b

◮ Coming to a digital camera near you [Baraniuk]

Phase transition as a function of measurements (aspect ratio):

◮ Fix the aspect ratio δ = d/n ∈ (0, 1), where A ∈ R^{d×n};
  sparsity threshold ‖x‖ℓ0 ≤ ρ(δ)d, ρ(δ) ∈ (0, 1)
◮ Phase transition as n → ∞: with overwhelming probability, ℓ1 → ℓ0 (toy sketch below)

Neighborliness and constrained ℓ1 minimization

Theorem

Let A be a d × n matrix, d < n. The following two properties of A are equivalent:

◮ The polytope AT has n vertices and is outwardly k-neighborly;
◮ Whenever y = Ax has a nonnegative solution x0 having at most k nonzeros, x0 is the unique nonnegative solution to y = Ax, and hence the unique solution to the constrained ℓ1 minimization problem.

Lemma (Neighborliness and face numbers)

Suppose the polytope P = AT has n vertices and is outwardly k-neighborly. Then for all ℓ = 0, . . . , k − 1 and all F ∈ Fℓ(T^{n−1}), we have AF ∈ Fℓ(AT). Conversely, if this holds, then P = AT has n vertices and is outwardly k-neighborly.
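
For nonnegative x the constrained ℓ1 problem is a single LP, since ‖x‖ℓ1 = Σx_i when x ≥ 0. A sketch (helper name mine) that numerically exercises the recovery property in the theorem:

```python
import numpy as np
from scipy.optimize import linprog

def bp_nonneg(A, y):
    """min sum(x) s.t. Ax = y, x >= 0 (equals min ||x||_1 over nonnegative x)."""
    n = A.shape[1]
    res = linprog(np.ones(n), A_eq=A, b_eq=y, bounds=(0, None))
    return res.x

rng = np.random.default_rng(3)
d, n, k = 30, 60, 5
A = rng.standard_normal((d, n))
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.random(k)   # nonnegative, k-sparse
print(np.allclose(bp_nonneg(A, A @ x0), x0, atol=1e-8))    # unique recovery when AT is k-neighborly
```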


Strong threshold, random A and all x0

Expected number of faces under a random ortho-projector [Affentranger, Schneider]:

E f_k(AT) = f_k(T) − 2 Σ_{s≥0} Σ_{F∈F_k(T)} Σ_{G∈F_{d+1+2s}(T)} β(F, G) γ(G, T),

where β and γ are the internal and external angles, respectively.

Theorem (Strong threshold)

Let ρ < ρN(δ) and let A = A_{d,n} be a uniformly-distributed random projection from R^n to R^d, with d ≥ δn. Then

Prob{ fℓ(AT^{n−1}) = fℓ(T^{n−1}), ℓ = 0, . . . , ⌊ρd⌋ } → 1 as n → ∞.

⇒ P is k-neighborly for k = ⌊(ρN(δ) − ǫ)d⌋
⇒ With overwhelming probability in A (fℓ(T^{n−1}) − E fℓ(AT^{n−1}) ≤ π_n e^{−ǫn}), for every x0 with ‖x0‖ℓ0 ≤ ⌊(ρN(δ) − ǫ)d⌋, y = Ax0 generates an instance of the constrained ℓ1 minimization problem with x0 as its unique solution.
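
A small Monte Carlo sketch (mine; Gaussian A as a rotation-invariant stand-in for a uniform random projection) of the face-counting viewpoint, in dimension low enough for convex hulls: the vertices of AT^{n−1} are the columns of A, so f_0(AT^{n−1}) = n exactly when every column is an extreme point.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(4)
d, n, trials = 4, 10, 200
survived = 0
for _ in range(trials):
    A = rng.standard_normal((d, n))        # random projection (illustrative stand-in)
    hull = ConvexHull(A.T)                 # AT^{n-1} = conv(a_1, ..., a_n) in R^d
    survived += (len(hull.vertices) == n)  # all n vertices survive <=> f_0 preserved
print(survived / trials)                   # fraction of draws with f_0(AT) = n
```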


Phase Transition, Strong (non-negative)

ℓ1 → ℓ0 if ‖x0‖ℓ0 ≤ ⌊(ρN(δ) − ǫ)d⌋ and x ≥ 0

[Plot: the strong threshold ρN(δ) against δ, both axes running from 0 to 1]

◮ As δ ↑ 1, ρN(δ) ≈ .371
◮ As δ → 0, ρN(δ) ∼ [2e log(1/δ)]^{−1}


Weak threshold, random A and most x0

Theorem (Vershik-Sporyshev)

Let d = d(n) ∼ δn and let A = A_{d,n} be a uniform random projection from R^n to R^d. Then for a sequence k = k(n) with k/d ∼ ρ, ρ < ρVS(δ), we have fk(AT^{n−1}) = fk(T^{n−1})(1 + o_P(1)).

Theorem

Let A be a d × n matrix, d < n, in general position. For 1 ≤ k ≤ d − 1, the following two properties of A are equivalent:

◮ The polytope P = AT has at least (1 − ǫ) times as many zero-free (k − 1)-faces as T;
◮ Among all problem instances (y, A) generated by some nonnegative vector x0 with at most k nonzeros, constrained ℓ1 minimization recovers the sparsest solution except in a fraction ≤ ǫ of instances.


Phase Transition, Weak (non-negative)

ℓ1 → ℓ0 if ‖x‖ℓ0 ≤ ⌊(ρVS(δ) − ǫ)d⌋ and x ≥ 0

[Plot: the weak threshold ρVS(δ) against δ, both axes running from 0 to 1]

◮ Asymptotic limit of empirical tests (example shown later)
◮ As δ → 0, ρVS(δ) ∼ [2 log(1/δ)]^{−1}
◮ Typically an e-times less strict sparsity requirement than the strong threshold as δ → 0


Phase Transitions, ℓ1 → ℓ0 if ‖x‖ℓ0 < ρ(δ)d

Two modalities from the random sampling perspective:

◮ Weak threshold: signal and measurement drawn independently at random
◮ Strong threshold: worst signal for a given measurement

[Plot: weak (typical) and strong (malicious) transitions ρ(δ) against δ; solid for non-negative, dashed for signed signals [Donoho]]


Some precise numbers and implications

         δ = .1     δ = .25    δ = .5     δ = .75    δ = .9
ρ+_N     .060131    .087206    .133457    .198965    .266558
ρ+_W     .240841    .364970    .558121    .765796    .902596
ρ±_N     .048802    .065440    .089416    .117096    .140416
ρ±_W     .188327    .266437    .384803    .532781    .677258

(+: non-negative signals, ±: signed signals; N: strong, W: weak)

◮ For most A, measuring 1/10 of a non-negative signal (δ = .1) recovers every signal that is 6% sparse, and most that are 24% sparse.
◮ Half 'under-sampling', i.e., δ = 1/2: apply ℓ1; if the non-negative solution is < 55% sparse then it is typically the sparsest solution.
◮ Encode (1 − δ)n bits of information in a signal of length n. Can recover with fewer than δρ±_W(δ)n accidental errors, or δρ±_N(δ)n malicious errors.

  • With twice redundancy, tolerate 19% random errors, 4.4% malicious (arithmetic sketch below).
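
The last bullet's arithmetic, read off the signed columns of the table at δ = 1/2 (the error-correction framing treats the corrupted entries as the sparse vector):

```python
# Error-correction arithmetic at delta = 1/2 (signed thresholds from the table)
delta = 0.5
rho_pm_W = 0.384803   # weak threshold: typical (accidental) errors
rho_pm_N = 0.089416   # strong threshold: worst-case (malicious) errors

print(delta * rho_pm_W)   # ~0.192  -> tolerate about 19% random errors
print(delta * rho_pm_N)   # ~0.0447 -> about 4.4% malicious errors
```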

Empirical verification of weak transitions

[Plot: empirical fraction of successful recoveries over a (δ, ρ) grid, with the weak transition overlaid; non-negative and signed signals]

Non-negative and signed signals; n = 200, 40 × 40 mesh with 60 random tests per node.
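
A scaled-down sketch of such an experiment (mine; a tiny grid so it runs quickly, reusing the hypothetical bp_nonneg helper above):

```python
import numpy as np

rng = np.random.default_rng(5)
n, tests = 40, 10                          # far smaller than the slides' n = 200, 60 tests

for delta in (0.3, 0.5, 0.7):              # coarse mesh over (delta, rho)
    d = int(delta * n)
    for rho in (0.2, 0.5, 0.8):
        k = max(1, int(rho * d))
        wins = 0
        for _ in range(tests):
            A = rng.standard_normal((d, n))
            x0 = np.zeros(n)
            x0[rng.choice(n, size=k, replace=False)] = rng.random(k)
            wins += np.allclose(bp_nonneg(A, A @ x0), x0, atol=1e-6)
        print(f"delta={delta:.1f} rho={rho:.1f} success={wins}/{tests}")
```

Success rates drop from 1 to 0 as ρ crosses the weak transition.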


Ingredients of the proofs, non-negative

Proof (main ideas):

◮ Given x0 ≥ 0 with ‖x0‖ℓ1 = 1 and ‖x0‖ℓ0 = k
◮ x0 lies on a (k − 1)-face (say F) of the unit simplex T^{n−1}
◮ b = Ax0 is either on the boundary (AF ∈ Fk(AT)) or inside AT
◮ If on the boundary, then x0 is unique among ‖x‖ℓ1 ≤ 1 with Ax = b, so ℓ1 → ℓ0
◮ If b is in the interior of P = AT, then ∃ x with ‖x‖ℓ1 < 1 and Ax = b
◮ If all faces of T^{n−1} of dimension j ≤ k survive in P, then ℓ1 → ℓ0 for ‖x‖ℓ0 ≤ k

ρN: Prob( fℓ(AT^{n−1}) = fℓ(T^{n−1}), ℓ = 0, . . . , ⌊ρN d⌋ ) → 1;
that is, fℓ(T^{n−1}) − E fℓ(AT^{n−1}) ≤ π_n e^{−ǫn}

ρW: E fℓ(AT^{n−1}) ≥ (1 − ǫ) fℓ(T^{n−1}), ℓ = 0, . . . , ⌊ρW d⌋

Robustness: if there is a nearby sparse solution, ‖Ax0 − b‖2 ≤ ǫ, solve

min ‖x_{1,ǫ}‖ℓ1 such that ‖Ax_{1,ǫ} − b‖2 ≤ ǫ.

Then ‖x0 − x_{1,ǫ}‖2 ≤ C(k, A)ǫ, where k = ‖x0‖ℓ0. (A sketch of this robust variant follows below.)
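
The robust variant is a second-order cone problem rather than an LP; a minimal sketch assuming the cvxpy package (problem sizes and noise level are illustrative):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
d, n, k, eps = 30, 60, 4, 1e-3
A = rng.standard_normal((d, n))
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

noise = rng.standard_normal(d)
b = A @ x0 + 0.9 * eps * noise / np.linalg.norm(noise)   # noise of norm 0.9*eps

# min ||x||_1 subject to ||Ax - b||_2 <= eps
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm2(A @ x - b) <= eps]).solve()
print(np.linalg.norm(x0 - x.value))      # O(eps): ||x0 - x_{1,eps}||_2 <= C(k, A)*eps
```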


Summary

◮ Underdetermined system Ax = b with A ∈ R^{d×n}, d < n
◮ To obtain the sparsest solution (min ‖x‖ℓ0), solve the constrained min ‖x‖ℓ1 instead
◮ Precise sparsity phase transitions ρ(d/n) are available for ℓ1 → ℓ0
◮ That is, if ‖x‖ℓ0 < ρ(d/n) · d then min ‖x‖ℓ1 → min ‖x‖ℓ0
◮ Surprisingly large transition: the effectiveness of Basis Pursuit (ℓ1)

Associated papers for the non-negative case [Donoho, T]:

◮ Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming, Proc. Natl. Acad. Sci.
◮ Neighborliness of Randomly-Projected Simplices in High Dimensions, Proc. Natl. Acad. Sci.

  • See also work by Donoho; Candes, Romberg, Tao; Tropp

Thank you for your time