SLIDE 1

Constrained optimization

DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis

http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda

SLIDE 2

Compressed sensing
Convex constrained problems
Analyzing optimization-based methods

SLIDE 3

Magnetic resonance imaging

[Figure: 2D DFT of an MR image, shown as magnitude and as log of magnitude]

SLIDE 4

Magnetic resonance imaging

Data: samples from the spectrum
Problem: sampling is time consuming (annoying, kids move, ...)
Images are compressible (sparse in a wavelet basis)
Can we recover compressible signals from less data?

SLIDE 5

2D wavelet transform

SLIDE 6

2D wavelet transform

SLIDE 7

Full DFT matrix

SLIDE 8

Full DFT matrix

SLIDE 9

Regular subsampling

SLIDE 10

Regular subsampling

SLIDE 11

Random subsampling

SLIDE 12

Random subsampling

SLIDE 13

Toy example

SLIDE 14

Regular subsampling

SLIDE 15

Random subsampling

SLIDE 16

Linear inverse problems

Linear inverse problem:

A x = y

Linear measurements, A ∈ R^{m×n}:

y[i] = ⟨A_{i,:}, x⟩, 1 ≤ i ≤ m

Aim: recover the signal x ∈ R^n from the data y ∈ R^m

We need m ≥ n, otherwise the problem is underdetermined

If m < n there are infinitely many solutions x + w, where w ∈ null(A)
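
The underdetermined regime is easy to verify numerically. Below is a minimal numpy sketch (my illustration, not part of the slides; all names are made up) showing that when m < n, any null-space direction can be added to a solution without changing the measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 70                       # fewer measurements than unknowns
A = rng.standard_normal((m, n))
x = np.zeros(n)
x[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)
y = A @ x

# The rows of Vt beyond the rank of A span null(A)
_, _, Vt = np.linalg.svd(A)
w = Vt[m:].T @ rng.standard_normal(n - m)   # a random vector in null(A)

print(np.allclose(A @ (x + w), y))          # True: x + w is also consistent
```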

SLIDE 17

Sparse recovery

Aim: recover a sparse x from linear measurements A x = y

When is the problem well posed?

There shouldn't be two distinct sparse vectors x1 and x2 such that A x1 = A x2

SLIDE 18

Spark

The spark of a matrix is the size of the smallest subset of columns that is linearly dependent

SLIDE 19

Spark

The spark of a matrix is the size of the smallest subset of columns that is linearly dependent

Let y := A x*, where A ∈ R^{m×n}, y ∈ R^m, and x* ∈ R^n is a sparse vector with s nonzero entries

For any choice of x*, the vector x* is the only vector with sparsity s consistent with the data, i.e. it is the solution of

min_x ||x||_0 subject to A x = y,

if and only if spark(A) > 2s
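
To make the definition concrete, here is a hedged brute-force sketch (not from the slides): it computes the spark of a tiny matrix by checking every column subset for linear dependence. The cost is exponential in n, so this is purely an illustration; the function name and tolerance are my own choices.

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Size of the smallest linearly dependent subset of columns of A."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            # A subset is linearly dependent iff the submatrix is rank deficient
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return np.inf  # all columns independent

A = np.array([[1., 0., 1.],
              [0., 1., 1.]])
print(spark(A))  # 3: no column is zero and no pair is parallel
```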

SLIDES 20-24

Proof

Equivalent statements:

◮ For any x*, x* is the only vector with sparsity s consistent with the data

◮ For any pair of distinct s-sparse vectors x1 and x2, A (x1 − x2) ≠ 0

◮ For any pair of subsets of s indices T1 and T2, A_{T1∪T2} α ≠ 0 for any nonzero α ∈ R^{|T1∪T2|}

◮ All submatrices with at most 2s columns have no nonzero vectors in their null space

◮ All submatrices with at most 2s columns are full rank

SLIDE 25

Restricted-isometry property

Robust version of the spark condition: if two s-sparse vectors x1, x2 are far apart, then A x1 and A x2 should also be far apart

The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors

SLIDES 26-27

Restricted-isometry property

A satisfies the restricted isometry property (RIP) with constant κs if

(1 − κs) ||x||_2 ≤ ||A x||_2 ≤ (1 + κs) ||x||_2

for any s-sparse vector x

If A satisfies the RIP for sparsity level 2s, then for any s-sparse x1, x2

||y2 − y1||_2 = ||A (x1 − x2)||_2 ≥ (1 − κ_{2s}) ||x2 − x1||_2

SLIDE 28

Regular subsampling

SLIDE 29

Regular subsampling

SLIDE 30

Correlation with column 20

[Plot: correlation of column 20 with the other columns]

SLIDE 31

Random subsampling

SLIDE 32

Random subsampling

SLIDE 33

Correlation with column 20

[Plot: correlation of column 20 with the other columns]

SLIDE 34

Restricted-isometry property

Deterministic matrices tend not to satisfy the RIP

It is NP-hard to check whether the spark or RIP conditions hold

Random matrices satisfy the RIP with high probability

We prove it for Gaussian iid matrices; the ideas in the proof for random Fourier matrices are similar

SLIDE 35

Restricted-isometry property for Gaussian matrices

Let A ∈ R^{m×n} be a random matrix with iid standard Gaussian entries

(1/√m) A satisfies the RIP with constant κs with probability 1 − C2/n as long as

m ≥ (C1 s / κs^2) log(n/s)

for two fixed constants C1, C2 > 0
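
A quick way to get a feel for the theorem is simulation. This sketch (my own, not the course code) draws a Gaussian A, normalizes by √m, and records ||Ax||_2 / ||x||_2 over many random s-sparse vectors; the spread around 1 is an empirical proxy for κs on those vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, s = 200, 1000, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)

ratios = []
for _ in range(2000):
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))

print(f"min ratio: {min(ratios):.3f}, max ratio: {max(ratios):.3f}")
# Both ratios stay close to 1: near-isometry on the sampled sparse vectors
```
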
SLIDE 36

Proof

For a fixed support T of size s, the bounds follow from bounds on the singular values of Gaussian matrices

SLIDE 37

Singular values of an m × s Gaussian matrix, s = 100

[Plot: σi/√m versus i for m/s ∈ {2, 5, 10, 20, 50, 100, 200}]

SLIDE 38

Singular values of an m × s Gaussian matrix, s = 1000

[Plot: σi/√m versus i for m/s ∈ {2, 5, 10, 20, 50, 100, 200}]
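
The two plots above can be approximated in a few lines; this is a hedged sketch of that experiment (mine, not the original plotting code):

```python
import numpy as np

rng = np.random.default_rng(2)
s = 100
for ratio in (2, 10, 100):
    m = ratio * s
    G = rng.standard_normal((m, s))
    sv = np.linalg.svd(G, compute_uv=False) / np.sqrt(m)
    # The spectrum clusters around 1 as m/s grows
    print(f"m/s = {ratio:3d}: singular values in [{sv.min():.3f}, {sv.max():.3f}]")
```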

SLIDE 39

Proof

For a fixed submatrix the singular values are bounded by

√m (1 − κs) ≤ σs ≤ σ1 ≤ √m (1 + κs)

with probability at least

1 − 2 (12/κs)^s exp(−m κs^2 / 32)

For any vector x with support T,

√(1 − κs) ||x||_2 ≤ (1/√m) ||A x||_2 ≤ √(1 + κs) ||x||_2

SLIDE 40

Union bound

For any events S1, S2, ..., Sn in a probability space,

P(∪_i Si) ≤ Σ_{i=1}^n P(Si)

SLIDES 41-43

Proof

Number of different supports of size s:

(n choose s) ≤ (en/s)^s

By the union bound,

√(1 − κs) ||x||_2 ≤ (1/√m) ||A x||_2 ≤ √(1 + κs) ||x||_2

holds for any s-sparse vector x with probability at least

1 − 2 (en/s)^s (12/κs)^s exp(−m κs^2 / 32)
  = 1 − exp( log 2 + s + s log(n/s) + s log(12/κs) − m κs^2 / 32 )
  ≥ 1 − C2/n

as long as m ≥ (C1 s / κs^2) log(n/s)

SLIDE 44

Sparse recovery via ℓ1-norm minimization

ℓ0-"norm" minimization is intractable

(As usual) we can minimize the ℓ1 norm instead; the estimate x_ℓ1 is the solution of

min_x ||x||_1 subject to A x = y

SLIDE 45

Minimum ℓ2-norm solution (regular subsampling)

[Plot: true signal vs. minimum ℓ2-norm solution]

SLIDE 46

Minimum ℓ1-norm solution (regular subsampling)

[Plot: true signal vs. minimum ℓ1-norm solution]

SLIDE 47

Minimum ℓ2-norm solution (random subsampling)

[Plot: true signal vs. minimum ℓ2-norm solution]

SLIDE 48

Minimum ℓ1-norm solution (random subsampling)

[Plot: true signal vs. minimum ℓ1-norm solution]
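
These comparisons can be reproduced qualitatively with the sketch below (my own; it assumes cvxpy is installed and uses an iid Gaussian matrix rather than the slides' subsampled DFT). The minimum ℓ2-norm estimate is the pseudoinverse solution; the minimum ℓ1-norm estimate solves the convex program:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 30, 70, 4
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

# Minimum ℓ2-norm solution (closed form)
x_l2 = np.linalg.pinv(A) @ y

# Minimum ℓ1-norm solution (convex program)
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
x_l1 = x.value

print("l2 error:", np.linalg.norm(x_l2 - x_true))  # typically large
print("l1 error:", np.linalg.norm(x_l1 - x_true))  # typically near zero
```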

SLIDE 49

Geometric intuition

[Figure: the ℓ2-norm ball and the ℓ1-norm ball]

SLIDES 50-51

Sparse recovery via ℓ1-norm minimization

If the signal is sparse in a transform domain W, then solve

min_c ||c||_1 subject to A W c = y

If we want to recover the original coefficients c*, then AW should satisfy the RIP

However, we might be fine with any c′ such that W c′ = x*

SLIDE 52

Regular subsampling

SLIDE 53

Minimum ℓ2-norm solution (regular subsampling)

SLIDE 54

Minimum ℓ1-norm solution (regular subsampling)

SLIDE 55

Random subsampling

SLIDE 56

Minimum ℓ2-norm solution (random subsampling)

SLIDE 57

Minimum ℓ1-norm solution (random subsampling)

SLIDE 58

Compressed sensing
Convex constrained problems
Analyzing optimization-based methods

SLIDE 59

Convex sets

A set S is convex if for any x, y ∈ S and any θ ∈ (0, 1),

θ x + (1 − θ) y ∈ S

The intersection of convex sets is convex

SLIDE 60

Convex vs nonconvex

[Figure: a nonconvex set and a convex set]

SLIDE 61

Epigraph

[Figure: a function f and its epigraph epi(f)]

A function is convex if and only if its epigraph is convex

SLIDE 62

Projection onto a convex set

The projection of any vector x onto a non-empty closed convex set S,

P_S(x) := arg min_{y ∈ S} ||x − y||_2,

exists and is unique
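
For some convex sets the projection has a simple closed form. A minimal sketch (not from the slides) for two such sets, the ℓ2 ball and a coordinate box; both are closed and convex, so the projection exists and is unique as stated above:

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Scale x back to the ball if it lies outside; otherwise keep it."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def project_box(x, low, high):
    """Clip each coordinate independently to [low, high]."""
    return np.clip(x, low, high)

x = np.array([3.0, 4.0])
print(project_l2_ball(x))          # [0.6, 0.8], unit norm
print(project_box(x, -1.0, 1.0))   # [1.0, 1.0]
```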

SLIDES 63-69

Proof

Assume there are two distinct projections y1 ≠ y2, and consider

y′ := (y1 + y2)/2

y′ belongs to S (why? S is convex and y′ is a convex combination of y1, y2 ∈ S)

⟨x − y′, y1 − y′⟩ = ⟨x − (y1 + y2)/2, y1 − (y1 + y2)/2⟩
                  = ⟨(x − y1)/2 + (x − y2)/2, (x − y2)/2 − (x − y1)/2⟩
                  = (1/4) (||x − y2||_2^2 − ||x − y1||_2^2)
                  = 0,

because y1 and y2 are both projections, so ||x − y1||_2 = ||x − y2||_2

By Pythagoras' theorem,

||x − y1||_2^2 = ||x − y′||_2^2 + ||y1 − y′||_2^2
              = ||x − y′||_2^2 + ||(y1 − y2)/2||_2^2
              > ||x − y′||_2^2,

which contradicts y1 being a projection: y′ ∈ S is strictly closer to x

SLIDE 70

Convex combination

Given n vectors x1, x2, ..., xn ∈ R^n,

x := Σ_{i=1}^n θi xi

is a convex combination of x1, x2, ..., xn if

θi ≥ 0 for 1 ≤ i ≤ n, and Σ_{i=1}^n θi = 1

SLIDE 71

Convex hull

The convex hull of S is the set of all convex combinations of points in S

The ℓ1-norm ball is the convex hull of the intersection between the ℓ0-"norm" ball and the ℓ∞-norm ball

SLIDE 72

ℓ1-norm ball

SLIDES 73-74

B_ℓ1 ⊆ C(B_ℓ0 ∩ B_ℓ∞)

Let x ∈ B_ℓ1

Set θi := |x[i]| for 1 ≤ i ≤ n, and θ0 := 1 − Σ_{i=1}^n θi

Σ_{i=0}^n θi = 1 by construction, θi ≥ 0, and

θ0 = 1 − Σ_{i=1}^n θi = 1 − ||x||_1 ≥ 0 because x ∈ B_ℓ1

x ∈ C(B_ℓ0 ∩ B_ℓ∞) because

x = Σ_{i=1}^n θi sign(x[i]) ei + θ0 · 0,

a convex combination of the points sign(x[i]) ei and the zero vector, all of which lie in B_ℓ0 ∩ B_ℓ∞

SLIDES 75-80

C(B_ℓ0 ∩ B_ℓ∞) ⊆ B_ℓ1

Let x ∈ C(B_ℓ0 ∩ B_ℓ∞); then

x = Σ_{i=1}^m θi yi

for some yi ∈ B_ℓ0 ∩ B_ℓ∞ with θi ≥ 0 and Σ_{i=1}^m θi = 1

||x||_1 ≤ Σ_{i=1}^m θi ||yi||_1   by the triangle inequality
        ≤ Σ_{i=1}^m θi ||yi||_∞   because each yi has only one nonzero entry
        ≤ Σ_{i=1}^m θi
        ≤ 1

SLIDE 81

Convex optimization problem

For f0, f1, ..., fm, h1, ..., hp : R^n → R,

minimize   f0(x)
subject to fi(x) ≤ 0, 1 ≤ i ≤ m
           hi(x) = 0, 1 ≤ i ≤ p

SLIDE 82

Definitions

◮ A feasible vector is a vector that satisfies all the constraints

◮ A solution is any feasible vector x* such that f0(x) ≥ f0(x*) for all feasible vectors x

◮ If a solution exists, f0(x*) is the optimal value or optimum of the problem

SLIDE 83

Convex optimization problem

The optimization problem is convex if

◮ f0 is convex
◮ f1, ..., fm are convex
◮ h1, ..., hp are affine, i.e. hi(x) = ai^T x + bi for some ai ∈ R^n and bi ∈ R

SLIDE 84

Linear program

minimize   a^T x
subject to ci^T x ≤ di, 1 ≤ i ≤ m
           A x = b

SLIDE 85

ℓ1-norm minimization as an LP

The optimization problem

minimize ||x||_1 subject to A x = b

can be recast as the LP

minimize   Σ_{i=1}^n t[i]
subject to t[i] ≥ ei^T x
           t[i] ≥ −ei^T x
           A x = b
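
The recast is mechanical enough to implement directly. Below is a hedged sketch (mine, not the course code) that builds this LP with the stacked variable z = [x, t] and solves it with scipy.optimize.linprog; all sizes and names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
m, n, s = 25, 60, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

I = np.eye(n)
c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum of t
A_ub = np.block([[I, -I], [-I, -I]])            # x - t <= 0 and -x - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])         # A x = b (t unconstrained)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * n))

x_hat = res.x[:n]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```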

SLIDES 86-89

Proof

Let x_ℓ1 be a solution to the ℓ1-norm minimization problem and (x_lp, t_lp) a solution to the linear program

Set t_ℓ1[i] := |x_ℓ1[i]|, so that (x_ℓ1, t_ℓ1) is feasible for the linear program

||x_ℓ1||_1 = Σ_{i=1}^n t_ℓ1[i]
           ≥ Σ_{i=1}^n t_lp[i]   by optimality of t_lp
           ≥ ||x_lp||_1          because t_lp[i] ≥ |x_lp[i]|

so x_lp is a solution to the ℓ1-norm minimization problem
SLIDES 90-93

Proof

Set t_ℓ1[i] := |x_ℓ1[i]|

Σ_{i=1}^n t_ℓ1[i] = ||x_ℓ1||_1
                  ≤ ||x_lp||_1          by optimality of x_ℓ1
                  ≤ Σ_{i=1}^n t_lp[i]

so (x_ℓ1, t_ℓ1) is a solution to the linear program

SLIDE 94

Quadratic program

For a positive semidefinite matrix Q ∈ R^{n×n},

minimize   x^T Q x + a^T x
subject to ci^T x ≤ di, 1 ≤ i ≤ m
           A x = b

SLIDE 95

ℓ1-norm regularized least squares as a QP

The optimization problem

minimize ||A x − y||_2^2 + α ||x||_1

can be recast as the QP

minimize   x^T A^T A x − 2 y^T A x + α Σ_{i=1}^n t[i]
subject to t[i] ≥ ei^T x
           t[i] ≥ −ei^T x
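
In practice one rarely builds this QP by hand; a modeling layer performs the recast. A hedged sketch (my own, assuming cvxpy is installed):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
m, n, alpha = 40, 80, 0.5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(m)   # noisy measurements

x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - y) + alpha * cp.norm1(x))
cp.Problem(objective).solve()
print("nonzero entries found:", int(np.sum(np.abs(x.value) > 1e-4)))
```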

SLIDE 96

Lagrangian

The Lagrangian of a canonical optimization problem is

L(x, α, ν) := f0(x) + Σ_{i=1}^m α[i] fi(x) + Σ_{j=1}^p ν[j] hj(x)

α ∈ R^m, ν ∈ R^p are called Lagrange multipliers or dual variables

If x is feasible and α[i] ≥ 0 for 1 ≤ i ≤ m, then

L(x, α, ν) ≤ f0(x)

SLIDE 97

Lagrange dual function

The Lagrange dual function of the problem is

l(α, ν) := inf_{x ∈ R^n} f0(x) + Σ_{i=1}^m α[i] fi(x) + Σ_{j=1}^p ν[j] hj(x)

Let p* be an optimum of the optimization problem; then

l(α, ν) ≤ p* as long as α[i] ≥ 0 for 1 ≤ i ≤ m

SLIDE 98

Dual problem

The dual problem of the (primal) optimization problem is

maximize   l(α, ν)
subject to α[i] ≥ 0, 1 ≤ i ≤ m

The dual problem is always convex, even if the primal isn't!

SLIDE 99

Maximum/supremum of convex functions

The pointwise maximum of m convex functions f1, ..., fm,

fmax(x) := max_{1 ≤ i ≤ m} fi(x),

is convex

The pointwise supremum of a family of convex functions indexed by a set I,

fsup(x) := sup_{i ∈ I} fi(x),

is convex

SLIDES 100-103

Proof

For any 0 ≤ θ ≤ 1 and any x, y,

fsup(θ x + (1 − θ) y) = sup_{i ∈ I} fi(θ x + (1 − θ) y)
                      ≤ sup_{i ∈ I} θ fi(x) + (1 − θ) fi(y)   by convexity of the fi
                      ≤ θ sup_{i ∈ I} fi(x) + (1 − θ) sup_{j ∈ I} fj(y)
                      = θ fsup(x) + (1 − θ) fsup(y)

SLIDE 104

Weak duality

If p* is a primal optimum and d* a dual optimum, then d* ≤ p*

SLIDE 105

Strong duality

For convex problems, d* = p* under very weak conditions

LPs: the primal optimum is finite

General convex programs (Slater's condition): there exists a strictly feasible point, i.e. one with fi(x) < 0 for 1 ≤ i ≤ m

SLIDE 106

ℓ1-norm minimization

The dual problem of

min_x ||x||_1 subject to A x = y

is

max_ν y^T ν subject to ||A^T ν||_∞ ≤ 1
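
Strong duality here can be checked numerically: solving the primal and the dual should give the same optimal value. A hedged sketch of that check (my own, assuming cvxpy):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
m, n = 20, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=3, replace=False)] = rng.standard_normal(3)
y = A @ x_true

x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
primal.solve()

nu = cp.Variable(m)
dual = cp.Problem(cp.Maximize(y @ nu), [cp.norm_inf(A.T @ nu) <= 1])
dual.solve()

print(primal.value, dual.value)   # the two optimal values agree
```
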
SLIDES 107-112

Proof

Lagrangian:

L(x, ν) = ||x||_1 + ν^T (y − A x)

Lagrange dual function:

l(ν) := inf_{x ∈ R^n} ||x||_1 − (A^T ν)^T x + ν^T y

What if |(A^T ν)[i]| > 1? Then we can send x[i] → ±∞ so that l(ν) → −∞

What if ||A^T ν||_∞ ≤ 1? Then by Hölder's inequality

(A^T ν)^T x ≤ ||x||_1 ||A^T ν||_∞ ≤ ||x||_1,

so the infimum is attained at x = 0 and l(ν) = ν^T y

SLIDE 113

Strong duality

The solution ν* to

max_ν y^T ν subject to ||A^T ν||_∞ ≤ 1

satisfies (A^T ν*)[i] = sign(x*[i]) for all i with x*[i] ≠ 0, for all solutions x* to the primal problem

min_x ||x||_1 subject to A x = y

SLIDE 114

Dual solution

[Plot: true signal sign pattern and minimum ℓ1 dual solution]

SLIDE 115

Proof

By strong duality,

||x*||_1 = y^T ν* = (A x*)^T ν* = (x*)^T (A^T ν*) = Σ_{i=1}^n (A^T ν*)[i] x*[i]

By Hölder's inequality,

||x*||_1 ≥ Σ_{i=1}^n (A^T ν*)[i] x*[i]

with equality if and only if (A^T ν*)[i] = sign(x*[i]) for all i with x*[i] ≠ 0

SLIDES 116-120

Another algorithm for sparse recovery

Aim: find the nonzero locations of a sparse vector x from y = A x

Insight: we have access to the inner product of x with A^T w for any w:

y^T w = (A x)^T w = x^T (A^T w)

Idea: maximize y^T w while bounding the magnitude of the entries of A^T w by 1

The entries of A^T w where x is nonzero should saturate to 1 or −1

SLIDE 121

Compressed sensing
Convex constrained problems
Analyzing optimization-based methods

SLIDE 122

Analyzing optimization-based methods

Best-case scenario: the primal solution has a closed form

Otherwise: use the dual solution to characterize the primal solution

SLIDE 123

Minimum ℓ2-norm solution

Let A ∈ R^{m×n} be a full-rank matrix with m < n. For any y ∈ R^m, the solution to the optimization problem

arg min_x ||x||_2 subject to A x = y

is

x* := V S^{-1} U^T y = A^T (A A^T)^{-1} y,

where A = U S V^T is the SVD of A
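
Both closed forms are easy to verify against numpy's pseudoinverse; a small sketch (my illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 20, 50
A = rng.standard_normal((m, n))     # full rank with probability 1
y = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ y) / s)      # V S^{-1} U^T y
x_normal = A.T @ np.linalg.solve(A @ A.T, y)
x_pinv = np.linalg.pinv(A) @ y

print(np.allclose(x_svd, x_normal), np.allclose(x_svd, x_pinv))
print(np.allclose(A @ x_svd, y))    # the solution is feasible
```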

SLIDES 124-127

Proof

x = P_row(A) x + P_row(A)⊥ x

Since A is full rank, the columns of V span row(A), so P_row(A) x = V c for some vector c ∈ R^m

A x = A P_row(A) x = U S V^T V c = U S c

A x = y is therefore equivalent to U S c = y, so c = S^{-1} U^T y
slide-128
SLIDE 128

Proof

For all feasible vectors x Prow(A) x = VS−1UT y By Pythagoras’ theorem, minimizing || x||2 is equivalent to minimizing || x||2

2 =

  • Prow(A)

x

  • 2

2 +

  • Prow(A)⊥

x

  • 2

2

SLIDE 129

Regular subsampling

SLIDE 130

Minimum ℓ2-norm solution (regular subsampling)

[Plot: true signal vs. minimum ℓ2-norm solution]

SLIDE 131

Regular subsampling

A := (1/√2) [F_{m/2}  F_{m/2}]   (two copies of the half-size DFT matrix, side by side)

F*_{m/2} F_{m/2} = I,   F_{m/2} F*_{m/2} = I

x := [x_up ; x_down]
SLIDES 132-138

Regular subsampling

Writing [u ; v] for vertical concatenation,

x_ℓ2 = arg min_{A x = y} ||x||_2
     = A^T (A A^T)^{-1} y
     = (1/√2) [F*_{m/2} ; F*_{m/2}] ( (1/2) (F_{m/2} F*_{m/2} + F_{m/2} F*_{m/2}) )^{-1} (1/√2) (F_{m/2} x_up + F_{m/2} x_down)
     = (1/2) [F*_{m/2} ; F*_{m/2}] I^{-1} (F_{m/2} x_up + F_{m/2} x_down)
     = (1/2) [F*_{m/2} (F_{m/2} x_up + F_{m/2} x_down) ; F*_{m/2} (F_{m/2} x_up + F_{m/2} x_down)]
     = (1/2) [x_up + x_down ; x_up + x_down]

The minimum ℓ2-norm solution averages the two halves of the signal

SLIDE 139

Minimum ℓ1-norm solution

Problem: arg min_{A x = y} ||x||_1 doesn't have a closed form

Instead we can use a dual variable to certify optimality

SLIDE 140

Dual solution

The solution ν* to

max_ν y^T ν subject to ||A^T ν||_∞ ≤ 1

satisfies (A^T ν*)[i] = sign(x*[i]) for all i with x*[i] ≠ 0, where x* is a solution to the primal problem

min_x ||x||_1 subject to A x = y

SLIDE 141

Dual certificate

If there exists a vector ν ∈ R^m such that

(A^T ν)[i] = sign(x*[i])   if x*[i] ≠ 0
|(A^T ν)[i]| < 1           if x*[i] = 0

then x* is the unique solution to the primal problem

min_x ||x||_1 subject to A x = y

as long as the submatrix A_T (the columns of A on the support T of x*) is full rank

SLIDES 142-146

Proof 1

ν is feasible for the dual problem, so for any primal-feasible x,

||x||_1 ≥ y^T ν
        = (A x*)^T ν
        = (x*)^T (A^T ν)
        = Σ_{i ∈ T} x*[i] sign(x*[i])
        = ||x*||_1

so x* must be a solution
SLIDES 147-149

Proof 2

A^T ν is a subgradient of the ℓ1 norm at x*, so for any other feasible vector x,

||x||_1 ≥ ||x*||_1 + (A^T ν)^T (x − x*)
        = ||x*||_1 + ν^T (A x − A x*)
        = ||x*||_1

SLIDE 150

Random subsampling

SLIDE 151

Minimum ℓ1-norm solution (random subsampling)

[Plot: true signal vs. minimum ℓ1-norm solution]

SLIDE 152

Exact sparse recovery via ℓ1-norm minimization

Assumption: there exists a signal x* ∈ R^n with s nonzeros such that A x* = y for a random A ∈ R^{m×n} (random Fourier, Gaussian iid, ...)

Exact recovery: if the number of measurements satisfies

m ≥ C′ s log n,

the solution of the problem

minimize ||x||_1 subject to A x = y

is the original signal with probability at least 1 − 1/n

SLIDE 153

Proof

Show that a dual certificate always exists. We need

A_T^T ν = sign(x*_T)   (s constraints)
||A_{T^c}^T ν||_∞ < 1

Idea: impose A_T^T ν = sign(x*_T) and minimize ||A_{T^c}^T ν||_∞

Problem: no closed-form solution

How about minimizing the ℓ2 norm instead?

SLIDE 154

Proof of exact recovery

Prove that a dual certificate exists for any s-sparse x*

Dual certificate candidate: the solution of

minimize ||v||_2 subject to A_T^T v = sign(x*_T)

Closed-form solution:

ν_ℓ2 := A_T (A_T^T A_T)^{-1} sign(x*_T)

A_T^T A_T is invertible with high probability

We need to prove that A^T ν_ℓ2 satisfies ||(A^T ν_ℓ2)_{T^c}||_∞ < 1
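
The candidate certificate is simple to build numerically. A hedged sketch (my own, not the course code) that constructs ν_ℓ2 for a random Gaussian matrix and checks both certificate conditions:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, s = 120, 400, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, size=s, replace=False)
sign_T = rng.choice([-1.0, 1.0], size=s)

A_T = A[:, T]
nu = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)   # A_T (A_T^T A_T)^{-1} sign(x*_T)

corr = A.T @ nu
print(np.allclose(corr[T], sign_T))              # equality on the support
print(np.max(np.abs(np.delete(corr, T))) < 1)    # strict bound off the support
```
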
SLIDE 155

Dual certificate

[Plot: sign pattern and dual function]

SLIDE 156

Proof of exact recovery

To control (A^T ν_ℓ2)_{T^c} we need to bound

A_i^T (A_T^T A_T)^{-1} sign(x*_T)   for i ∈ T^c

Let w := (A_T^T A_T)^{-1} sign(x*_T)

|A_i^T w| can be bounded using independence

The result then follows from the union bound