SLIDE 1

The Algorithmic Frontiers of Atomic Norm Minimization: Relaxation, Discretization, and Randomization

Benjamin Recht University of California, Berkeley

SLIDE 2

Linear Inverse Problems

  • Find me a solution of y = Φx, where Φ is n × p with n < p
  • Of the infinite collection of solutions, which one should we pick?
  • Leverage structure: sparsity, rank, smoothness, symmetry
  • How do we design algorithms to solve underdetermined systems with such priors?

SLIDE 3
Atomic Decompositions

  • Search for the best linear combination of the fewest atoms
  • “rank” = fewest atoms needed to describe the model

x = \sum_k c_k a_k \quad (a_k:\ \text{atoms},\ x:\ \text{model},\ c_k:\ \text{weights},\ \#\{k\}:\ \text{rank})

SLIDE 4

Atomic Norms

  • Given a basic set of atoms A, define the function

\|x\|_{\mathcal{A}} = \inf\Big\{ \sum_{a\in\mathcal{A}} |c_a| \,:\, x = \sum_{a\in\mathcal{A}} c_a a \Big\} = \inf\{ t > 0 : x \in t\,\mathrm{conv}(\mathcal{A}) \}

  • Under mild conditions, we get a norm
  • When does this work?
  • How do we solve the optimization problem?

IDEA: \min_z \|z\|_{\mathcal{A}} \ \text{subject to}\ \Phi z = y
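Two familiar special cases (facts from the Chandrasekaran, Recht, Parrilo, and Willsky paper, added here for illustration) show how the definition specializes:

\mathcal{A} = \{\pm e_i\} \;\Rightarrow\; \|x\|_{\mathcal{A}} = \|x\|_1 \quad \text{(sparse vectors: the } \ell_1 \text{ norm)}
\mathcal{A} = \{uv^T : \|u\|_2 = \|v\|_2 = 1\} \;\Rightarrow\; \|X\|_{\mathcal{A}} = \|X\|_* \quad \text{(low-rank matrices: the nuclear norm)}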

SLIDE 5

Atomic Norm Minimization

  • Generalizes existing, powerful methods
  • A rigorous formula for developing new analysis algorithms
  • Precise, tight bounds on the number of measurements needed for model recovery
  • One algorithm prototype for a myriad of data-analysis applications

IDEA: \min_z \|z\|_{\mathcal{A}} \ \text{subject to}\ \Phi z = y

Chandrasekaran, Recht, Parrilo, and Willsky

SLIDE 6

Union of Subspaces

  • X has structured sparsity: a linear combination of elements from a set of subspaces {U_g}
  • Atomic set: unit-norm vectors living in one of the U_g

Permutations and Rankings

  • X is a sum of a few permutation matrices
  • Examples: multiobject tracking, ranked elections, BCS
  • Convex hull of the permutation matrices: the doubly stochastic matrices (the Birkhoff polytope)
SLIDE 7
  • Moments: convex hull of [1, t, t^2, t^3, t^4, ...], t ∈ T, some basic set
  • Applications: system identification, image processing, numerical integration, statistical inference
  • Solve with semidefinite programming
  • Cut matrices: sums of rank-one sign matrices
  • Applications: collaborative filtering, clustering in genetic networks, combinatorial approximation algorithms
  • Approximate with semidefinite programming
  • Low-rank tensors: sums of rank-one tensors
  • Applications: computer vision, image processing, hyperspectral imaging, neuroscience
  • Approximate with alternating least squares

SLIDE 8

Algorithms

  • Naturally amenable to a projected gradient algorithm:

\min_z \tfrac{1}{2}\|\Phi z - y\|_2^2 + \mu\|z\|_{\mathcal{A}}

z_{k+1} = \Pi_{\eta\mu}(z_k - \eta\,\Phi^* r_k), \qquad r_k = \Phi z_k - y \ \ \text{(residual)}

  • Similar algorithm for the atomic norm constraint
  • Same basic ingredients for ALM, ADM, Bregman, Mirror Prox, etc. How do we compute the shrinkage?

\Pi_\tau(z) = \arg\min_u \tfrac{1}{2}\|z - u\|^2 + \tau\|u\|_{\mathcal{A}} \ \ \text{(the “shrinkage” operator)}
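To make the iteration concrete, here is a minimal MATLAB sketch (not from the slides) for the simplest atomic set A = {±e_i}, where the atomic norm is the ℓ1 norm and the shrinkage Π_τ reduces to entrywise soft-thresholding; the function name and step size eta are illustrative:

% Proximal gradient for min_z 0.5*||Phi*z - y||^2 + mu*||z||_1
% (atomic set A = {+/- e_i}; the shrinkage is soft-thresholding)
function z = prox_grad_l1(Phi, y, mu, eta, iters)
  z = zeros(size(Phi,2), 1);
  for k = 1:iters
    r = Phi*z - y;                        % residual r_k = Phi*z_k - y
    g = z - eta*(Phi'*r);                 % gradient step z_k - eta*Phi'*r_k
    z = sign(g).*max(abs(g) - eta*mu, 0); % shrinkage Pi_{eta*mu}(.)
  end
end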

SLIDE 9

Shrinkage

  • Dual norm:

\|v\|_{\mathcal{A}}^* = \max_{a\in\mathcal{A}} \langle v, a\rangle

  • The shrinkage and its dual counterpart give a Moreau decomposition z = \Pi_\tau(z) + \Lambda_\tau(z), where

\Pi_\tau(z) = \arg\min_u \tfrac{1}{2}\|z - u\|^2 + \tau\|u\|_{\mathcal{A}}
\Lambda_\tau(z) = \arg\min_{\|v\|_{\mathcal{A}}^* \le \tau} \tfrac{1}{2}\|z - v\|^2
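For the sparse atomic set the dual norm is the ℓ∞ norm, so Λ_τ is projection onto the ℓ∞ ball of radius τ (entrywise clipping). A minimal sketch (illustrative, not from the slides) verifying the decomposition numerically:

% For A = {+/- e_i}: ||v||_A^* = ||v||_inf, so Lambda_tau clips to [-tau, tau]
z   = randn(5,1); tau = 0.3;
Lam = min(max(z, -tau), tau);         % Lambda_tau(z): project onto tau-ball of l_inf
Pi  = sign(z).*max(abs(z) - tau, 0);  % Pi_tau(z): soft-thresholding
norm(z - (Pi + Lam))                  % zero up to rounding: z = Pi_tau(z) + Lambda_tau(z)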

SLIDE 10

Relaxations

  • The dual norm is efficiently computable if the set of atoms is polyhedral or semidefinite representable:

\|v\|_{\mathcal{A}}^* = \max_{a\in\mathcal{A}} \langle v, a\rangle

  • Convex relaxations of the atoms yield approximations to the norm:

\mathcal{A}_1 \subset \mathcal{A}_2 \;\Rightarrow\; \|x\|_{\mathcal{A}_1}^* \le \|x\|_{\mathcal{A}_2}^* \quad\text{and}\quad \|x\|_{\mathcal{A}_2} \le \|x\|_{\mathcal{A}_1}

  • A hierarchy of relaxations based on θ-bodies yields progressively tighter bounds on the atomic norm
  • NB: the sample complexity increases under relaxation

SLIDE 11

Theta Bodies

\|v\|_{\mathcal{A}}^* = \max_{a\in\mathcal{A}} \langle v, a\rangle \le \tau \iff q(a) := \tau - \langle v, a\rangle \ge 0 \ \text{for all}\ a \in \mathcal{A}

  • Suppose \mathcal{A} is an algebraic variety: \mathcal{A} = \{x : f(x) = 0 \ \forall f \in I\}
  • Then it suffices to write q = h + g, where g ∈ I vanishes on the atoms and h(x) ≥ 0 everywhere
  • Relaxation: restrict h to be a sum of squares
  • Gives a lower bound on the atomic norm
  • Solvable by semidefinite programming (Gouveia, Parrilo, and Thomas, 2010)
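As a concrete instance (an illustration added here, not spelled out on the slide): for the cut-matrix atoms A = {uv^T : u ∈ {±1}^m, v ∈ {±1}^n}, the variety is cut out by the polynomials u_i^2 - 1 and v_j^2 - 1, and the first level of the sum-of-squares hierarchy yields the standard semidefinite relaxation

\max_{u \in \{\pm1\}^m,\ v \in \{\pm1\}^n} u^\top V v \;\le\; \max\left\{ \langle V, Y_{12}\rangle \,:\, \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{12}^\top & Y_{22} \end{pmatrix} \succeq 0,\ \mathrm{diag}(Y_{11}) = \mathbf{1},\ \mathrm{diag}(Y_{22}) = \mathbf{1} \right\}

which, by Grothendieck's inequality, overestimates the dual norm by at most the constant K_G < 1.783.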

SLIDE 12
Approximation through discretization

  • Relaxations:

\mathcal{A}_1 \subset \mathcal{A}_2 \;\Rightarrow\; \|x\|_{\mathcal{A}_2} \le \|x\|_{\mathcal{A}_1}

  • Let \mathcal{A}_\epsilon be a finite net, and let Ψ be a matrix whose columns are the atoms in \mathcal{A}_\epsilon; then

\|x\|_{\mathcal{A}_\epsilon} = \inf\Big\{ \sum_k |c_k| \,:\, x = \Psi c \Big\}

is an ℓ1 problem (plus extra equality constraints).

  • Often we can compute explicit bounds λ_ε such that

\lambda_\epsilon \|x\|_{\mathcal{A}_\epsilon} \le \|x\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}_\epsilon}
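A minimal MATLAB sketch of the discretized norm (an illustration with a toy grid of cosine atoms; the atoms, sizes, and the use of linprog from the Optimization Toolbox are assumptions, not from the slides):

% Discretized atomic norm ||x||_{A_eps} = min sum|c| s.t. Psi*c = x,
% posed as a linear program by splitting c = cp - cm with cp, cm >= 0.
n = 32; m = 64;                        % ambient dimension, grid size
omega = linspace(0.1, pi, m);          % parameter grid
Psi = cos((0:n-1)'*omega);             % columns are atoms a(omega_j) (toy choice)
Psi = Psi./vecnorm(Psi);               % normalize atoms to unit norm
x = Psi(:,5) + 0.5*Psi(:,20);          % a two-atom test signal
f = ones(2*m, 1);                      % objective: sum(cp) + sum(cm)
c = linprog(f, [], [], [Psi, -Psi], x, zeros(2*m,1), []);
atomic_norm_eps = f'*c                 % roughly 1.5 for well-separated atoms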

SLIDE 13

Discretization Theory

  • Discretize the parameter space to get a finite number of grid points
  • Enforce a finite number of constraints in the dual:

|\langle \Phi^* z,\, a(\omega_j)\rangle| \le 1, \quad \omega_j \in \Omega_m

  • Equivalently, in the primal, replace the atomic norm with a discrete one
  • What happens to the solutions as the grid is refined?

SLIDE 14

Convergence in Dual

  • Assumption: there exist parameters ω_1, …, ω_n such that the atoms a(ω_1), …, a(ω_n) are linearly independent
  • Enforce finite constraints in the dual:

|\langle \Phi^* z,\, a(\omega_j)\rangle| \le 1, \quad \omega_j \in \Omega_m

Theorem:

  • The discretized optimal objectives converge to the original objective
  • Any solution sequence {ẑ_m} of the discretized problems has a subsequence that converges to the solution set of the original problem
  • For the LASSO dual, the convergence speed is O(ρ_m)

[Figure: discretized dual solutions for m = 32, 128, 512, with log-log plots of |f_m − f*| and ‖ẑ_m − ẑ‖ against m showing the convergence rate.]

SLIDE 15

Single Molecule Imaging

Courtesy of Zhuang Research Lab

SLIDE 16

Single Molecule Imaging

  • Bundles of 8 tubes of 30 nm diameter
  • Sparse density: 81,049 molecules over 12,000 frames
  • Resolution: 64 × 64 pixels
  • Pixel size: 100 nm × 100 nm
  • Field of view: 6400 nm × 6400 nm
  • Target resolution: 10 nm × 10 nm
  • Discretize the FOV into 640 × 640 pixels

I(x, y) = \sum_j c_j\,\mathrm{PSF}(x - x_j,\, y - y_j), \quad (x_j, y_j) \in [0, 6400]^2, \quad (x, y) \in \{50, 150, \ldots, 6350\}^2
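A minimal MATLAB sketch of the resulting grid-based recovery problem (a toy stand-in for the 640 × 640 setup above: the Gaussian PSF, sizes, and the Statistics Toolbox lasso call are illustrative assumptions):

% Sparse deconvolution on a grid: observed image I = Psi*c with sparse c
nobs = 16; ngrid = 32; sigma = 1.5;    % toy sizes and PSF width
[xo, yo] = meshgrid(linspace(0.5, nobs-0.5, nobs));
[xg, yg] = meshgrid(linspace(0, nobs, ngrid));
Psi = exp(-((xo(:) - xg(:)').^2 + (yo(:) - yg(:)').^2)/(2*sigma^2));
c_true = zeros(ngrid^2, 1); c_true([100 600]) = [1; 0.7];  % two molecules
I = Psi*c_true;                        % noiseless observed image (vectorized)
c_hat = lasso(Psi, I, 'Lambda', 1e-4); % l1-regularized recovery on the fine grid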

SLIDE 17

Single Molecule Imaging

SLIDE 18

Single Molecule Imaging

[Figure: precision, recall, Jaccard index, and F-score versus radius (10-50), comparing the sparse atomic-norm method against CoG and quickPALM.]

SLIDE 19

Atomic norms in sparse approximation

  • Greedy approximations
  • Best n-term approximation to a function f in the convex hull of \mathcal{A}
  • Maurey, Jones, and Barron (1980s-90s)
  • DeVore and Temlyakov (1996)

\|f - f_n\|_{L_2} \le \frac{c_0\, \|f\|_{\mathcal{A}}}{\sqrt{n}}
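The rate can be seen from Maurey's probabilistic argument (a one-line sketch added here, not from the slide, assuming every atom satisfies ‖a‖_{L_2} ≤ c_0): write f/‖f‖_A as a convex combination of atoms, sample n atoms i.i.d. from the weights, and bound the variance of their average:

f = \|f\|_{\mathcal{A}} \sum_a p_a\, a, \qquad f_n = \frac{\|f\|_{\mathcal{A}}}{n} \sum_{k=1}^n a_{i_k}, \quad a_{i_k} \sim p \ \text{i.i.d.}

\mathbb{E}\,\|f - f_n\|_{L_2}^2 = \frac{1}{n}\Big( \|f\|_{\mathcal{A}}^2\, \mathbb{E}\|a\|_{L_2}^2 - \|f\|_{L_2}^2 \Big) \le \frac{c_0^2\, \|f\|_{\mathcal{A}}^2}{n}

so some realization of n atoms achieves the stated bound.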

SLIDE 20

If greedy is hard…

  • Training these networks is hard
  • But for fixed θ_k, fitting the remaining weights is feasible (a convex problem)
  • Can we just not optimize the θ_k?
  • What if we randomly sample the parameters?
SLIDE 21
  • Fix parameterized basis functions φ(x; θ)
  • Fix a probability distribution p(θ) over the parameters
  • Our target space will be functions f(x) = \int c(\theta)\,\phi(x;\theta)\, d\theta whose weights satisfy |c(\theta)| \le C\, p(\theta)
  • Example: Fourier basis functions with Gaussian parameters
  • If p is Gaussian, then membership in this class means that the frequency distribution of f has subgaussian tails
SLIDE 22
  • Theorem 1: Let f be in the target class with |c(θ)| ≤ C p(θ). Let ω_1, …, ω_n be sampled i.i.d. from p. Then, with probability at least 1 − δ, there exist weights c_1, …, c_n with

\Big\| f - \sum_{k=1}^n c_k\, \phi(\cdot\,; \omega_k) \Big\|_{L_2} \le \frac{C}{\sqrt{n}} \Big( 1 + \sqrt{2 \log(1/\delta)} \Big)

  • Generalization error = estimation error + approximation error
  • It's a finite-sized basis set!
  • Choosing n on the order of the number of training samples balances the two error terms and gives overall convergence at the same O(1/√n) rate

SLIDE 23

% Approximates Gaussian Process regression
% with Gaussian kernel of variance gamma
% lambda: regularization parameter
% dataset: X is dxN, y is 1xN
% test: xtest is dx1
% D: dimensionality of random feature

% training
w = randn(D, size(X,1));
b = 2*pi*rand(D,1);
Z = cos(sqrt(gamma)*w*X + repmat(b,1,size(X,2)));
% Equivalent to
% alpha = (lambda*eye(D) + Z*Z')\(Z*y(:));
alpha = symmlq(@(v)(lambda*v(:) + Z*(Z'*v(:))), ...
               Z*y(:), 1e-6, 2000);

% testing
ztest = alpha(:)'*cos(sqrt(gamma)*w*xtest(:) + b);
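For completeness, a hypothetical driver for the snippet above (synthetic data; the names match the comments in the code):

% Toy usage: d = 2 inputs, N = 500 training points, D = 200 random features
d = 2; N = 500; D = 200; gamma = 1; lambda = 1e-3;
X = randn(d, N);
y = sin(X(1,:)) + 0.1*randn(1, N);
xtest = randn(d, 1);
% ... then run the training and testing lines above to get ztest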

SLIDE 24
  • Relaxation - hierarchies for approximating complex priors via semidefinite programming
  • Discretization - fast convergence in distribution for models that admit tight discretizations
  • Randomization - efficient, practical algorithms for greedy methods
  • Challenge - integrate these ideas into fast, greedy algorithms