
SLIDE 1

Latent Structure Beyond Sparse Codes

Benjamin Recht Department of EECS and Statistics University of California, Berkeley

SLIDE 2

Sparse Codes

Which mathematical representations can be learned robustly?

[Figure: learned dictionary at 2.5x redundancy, Gabor-like thingies... illustrating robustness and sparsity]

SLIDE 3

Sparse Approximation

Compressed Sensing

  • Use the fact that images are sparse in a wavelet basis to reduce the number of measurements required for signal acquisition.

Lasso

  • n_patients << n_peaks
  • If very few peaks are needed for diagnosis, search for a sparse set of markers.

SLIDE 4

Cardinality Minimization

  • PROBLEM: Find the vector of lowest cardinality that

satisfies/approximates the underdetermined linear system

  • NP-HARD:

– Reduce to EXACT-COVER – Hard to approximate – Known exact algorithms require enumeration

  • HEURISTIC: Replace cardinality with the ℓ1 norm

Φx = y,  Φ : R^p → R^n
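For concreteness (not part of the slides), the ℓ1 heuristic is a linear program; a minimal sketch with NumPy/SciPy, assuming a random Gaussian Φ and splitting x into positive and negative parts:

```python
# Minimal sketch of the l1 heuristic for cardinality minimization,
# posed as a linear program: x = x+ - x-, both parts nonnegative.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, s = 40, 100, 5                     # measurements, dimension, sparsity
Phi = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
y = Phi @ x_true

# minimize sum(x+) + sum(x-)  subject to  Phi (x+ - x-) = y, x+, x- >= 0
c = np.ones(2 * p)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
x_hat = res.x[:p] - res.x[p:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```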

SLIDE 5

  • Rank of the data matrix: Recommender Systems
  • Rank of the density matrix: Quantum Tomography
  • Rank of the Gram matrix: Geometric Structure
  • Rank of the unfolded tensor: Seismic Imaging

SLIDE 6

Affine Rank Minimization

  • PROBLEM: Find the matrix of lowest rank that

satisfies/approximates the underdetermined linear system

  • NP-HARD:

– Reduce to solving polynomial equations – Hard to approximate – Exact algorithms are awful

  • HEURISTIC: Replace rank with the nuclear norm

Φ(X) = y,  Φ : R^{p1×p2} → R^n
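A hedged sketch of the nuclear-norm heuristic, here specialized to matrix completion and assuming CVXPY is available; sizes and the sampling rate are made up:

```python
# Sketch of the nuclear-norm heuristic for affine rank minimization,
# with Phi taken to be entrywise sampling (matrix completion).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p1, p2, r = 20, 20, 2
M = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))  # rank-r target
Omega = (rng.random((p1, p2)) < 0.5).astype(float)               # observed mask

X = cp.Variable((p1, p2))
# minimize ||X||_*  subject to agreeing with the observed entries
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [cp.multiply(Omega, X) == Omega * M])
prob.solve()
print("rank of solution:", np.linalg.matrix_rank(X.value, tol=1e-6))
```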

SLIDE 7

Heuristic: Gradient Descent

Factor M ≈ L R*, with M of size p1 × p2, L of size p1 × r, and R* of size r × p2.

  • Step 1: Pick (i,j) and compute the residual:

e = Li Rjᵀ − Mij

  • Step 2: Take a mixture of the current model and the corrected model (α, β > 0):

Li ← α Li − β e Rj
Rj ← α Rj − β e Li

IDEA: Replace rank with the nuclear norm:

minimize ‖X‖∗ subject to Φ(X) = y

Some guy on livejournal, 2006; Fazel, Parrilo, Recht, 2007; Candès and Recht, 2008

Succeeds when the number of samples is Õ(r(p1 + p2)).
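A minimal sketch of the two-step update, with made-up sizes and step parameters α, β:

```python
# Stochastic-gradient matrix-factorization heuristic (the "mixture" update
# above); alpha slightly below 1 adds shrinkage, alpha = 1 is plain SGD.
import numpy as np

rng = np.random.default_rng(0)
p1, p2, r = 50, 40, 3
M = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))  # rank-r target
obs = [(i, j) for i in range(p1) for j in range(p2) if rng.random() < 0.5]

L = 0.1 * rng.standard_normal((p1, r))
R = 0.1 * rng.standard_normal((p2, r))
alpha, beta = 1.0, 0.01
for _ in range(50):
    for i, j in obs:
        e = L[i] @ R[j] - M[i, j]        # Step 1: residual at entry (i, j)
        # Step 2: simultaneous update of row i of L and row j of R
        L[i], R[j] = alpha * L[i] - beta * e * R[j], alpha * R[j] - beta * e * L[i]

print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))
```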

SLIDE 8

System Identification: find a dynamical model that agrees with time series data

  • All linear systems are combinations of single pole filters.
  • Leverage this structure for new algorithms and analysis.

Observe a time series y1, y2, ..., yT driven by the input u1, u2, ..., uT.

What is a principled way to build a parsimonious model for the input-output responses?

Na et al, 2012 Shah, Bhaskar, Tang, and Recht 2012

SLIDE 9

Linear Inverse Problems

  • Find me a solution of y = Φx
  • Φ is n × p, with n < p
  • Of the infinite collection of solutions, which one should we pick?
  • Leverage structure: sparsity, rank, smoothness, symmetry
  • How do we design algorithms to solve underdetermined systems with priors?

SLIDE 10

Sparsity

‖x‖₁ = Σ_{i=1}^{p} |x_i|

  • 1-sparse vectors of Euclidean norm 1
  • Convex hull is the unit ball of the ℓ1 norm

[Figure: the cross-polytope in R², vertices at ±1 on each axis]

SLIDE 11

minimize ‖x‖₁ subject to Φx = y

[Figure: the ℓ1 ball touching the affine space Φx = y in the (x1, x2) plane]

Compressed Sensing: Candès, Romberg, Tao, Donoho, Tanner, etc.

SLIDE 12
Rank

  • 2×2 symmetric matrices [[x, y], [y, z]], plotted in 3d
  • rank-1 matrices of unit Euclidean norm: x² + z² + 2y² = 1

SLIDE 13
Rank

  • 2×2 symmetric matrices [[x, y], [y, z]], plotted in 3d
  • rank-1 matrices of unit Euclidean norm: x² + z² + 2y² = 1
  • Convex hull: the unit ball of the nuclear norm

‖X‖∗ = Σᵢ σᵢ(X)
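A two-line NumPy check of the nuclear-norm formula, on a made-up matrix:

```python
# The nuclear norm is the sum of singular values.
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 0.5]])
nuc = np.linalg.svd(X, compute_uv=False).sum()
print("||X||_* =", nuc)  # equals np.linalg.norm(X, 'nuc')
```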

SLIDE 14
  • 2×2 matrices plotted in 3d

Nuclear Norm Heuristic

Rank Minimization / Matrix Completion: Fazel 2002; R, Fazel, and Parrilo 2007

‖X‖∗ = Σᵢ σᵢ(X)

SLIDE 15
Integer Programming

  • Integer solutions: all components of x are ±1
  • Convex hull is the unit ball of the ℓ∞ norm

[Figure: the square with vertices (1,1), (1,−1), (−1,1), (−1,−1)]

SLIDE 16

minimize ‖x‖∞ subject to Φx = y

[Figure: the ℓ∞ ball touching the affine space Φx = y in the (x1, x2) plane]

Donoho and Tanner 2008; Mangasarian and Recht 2009.
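A hedged sketch of this ℓ∞ program as a linear program with an epigraph variable (the sizes are made up):

```python
# Minimizing ||x||_inf subject to Phi x = y as an LP:
# add t with constraints -t <= x_i <= t and minimize t.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 40, 60
Phi = rng.standard_normal((n, p))
x_true = rng.choice([-1.0, 1.0], size=p)      # integer (+-1) solution
y = Phi @ x_true

c = np.zeros(p + 1); c[-1] = 1.0              # variables: [x, t]; minimize t
A_eq = np.hstack([Phi, np.zeros((n, 1))])
A_ub = np.block([[np.eye(p), -np.ones((p, 1))],    #  x_i - t <= 0
                 [-np.eye(p), -np.ones((p, 1))]])  # -x_i - t <= 0
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * p), A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * p + [(0, None)])
print("max deviation from +-1 solution:", np.abs(res.x[:p] - x_true).max())
```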

SLIDE 17
Parsimonious Models

  • Search for the best linear combination of the fewest atoms
  • “rank” = fewest atoms needed to describe the model

[Figure: model = Σ weights × atoms; “rank” counts the atoms used]

SLIDE 18

Atomic Norms

  • Given a basic set of atoms A, define the function

‖x‖_A = inf { Σ_{a∈A} |c_a| : x = Σ_{a∈A} c_a a }

equivalently,

‖x‖_A = inf { t > 0 : x ∈ t·conv(A) }

  • When A is centrosymmetric, we get a norm
  • When can we compute this?
  • When does this work?

IDEA:

minimize ‖z‖_A subject to Φz = y
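When A is a finite set, ‖x‖_A is itself a small linear program; a sketch using signed basis vectors as an assumed toy atomic set, so the answer should match the ℓ1 norm:

```python
# For finite centrosymmetric A, ||x||_A = min sum(c) over c >= 0
# with x = sum_a c_a * a; a small LP.
import numpy as np
from scipy.optimize import linprog

atoms = np.hstack([np.eye(3), -np.eye(3)])     # columns are atoms +-e_i
x = np.array([1.0, -2.0, 0.5])

c = np.ones(atoms.shape[1])
res = linprog(c, A_eq=atoms, b_eq=x, bounds=[(0, None)] * atoms.shape[1])
print("||x||_A =", res.fun, " l1 norm =", np.abs(x).sum())  # should agree
```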

SLIDE 19

Union of Subspaces

  • X has structured sparsity: a linear combination of elements from a set of subspaces {Ug}
  • Atomic set: unit-norm vectors living in one of the Ug

Permutations and Rankings

  • X a sum of a few permutation matrices
  • Examples: Multiobject Tracking, Ranked elections, BCS
  • Convex hull of permutation matrices: doubly stochastic matrices.
SLIDE 20
  • Moments: convex hull of [1, t, t², t³, t⁴, ...], t ∈ T, some basic set
  • System Identification, Image Processing, Numerical Integration, Statistical Inference
  • Solve with semidefinite programming

  • Cut matrices: sums of rank-one sign matrices
  • Collaborative Filtering, Clustering in Genetic Networks, Combinatorial Approximation Algorithms
  • Approximate with semidefinite programming

  • Low-rank Tensors: sums of rank-one tensors
  • Computer Vision, Image Processing, Hyperspectral Imaging, Neuroscience
  • Approximate with alternating least-squares

SLIDE 21

Atomic norms in sparse approximation

  • Greedy approximations (a sketch follows below)
  • Best n-term approximation to a function f in the convex hull of A:

‖f − f_n‖_{L2} ≤ c₀ ‖f‖_A / √n

  • Maurey, Jones, and Barron (1980s–90s)
  • DeVore and Temlyakov (1996)
  • Random Feature Heuristics (Rahimi and R, 2007)
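A toy matching-pursuit-style sketch of greedy n-term approximation over a finite dictionary (the dictionary and sizes are assumptions, not from the talk):

```python
# Greedy n-term approximation: repeatedly add the best-correlated atom.
import numpy as np

rng = np.random.default_rng(0)
d, num_atoms, n_terms = 64, 200, 10
A = rng.standard_normal((d, num_atoms))
A /= np.linalg.norm(A, axis=0)            # unit-norm atoms (columns)
f = A[:, :5] @ rng.standard_normal(5)     # target built from a few atoms

residual, f_n = f.copy(), np.zeros(d)
for _ in range(n_terms):
    k = np.argmax(np.abs(A.T @ residual))     # best-correlated atom
    coef = A[:, k] @ residual
    f_n += coef * A[:, k]
    residual -= coef * A[:, k]
print("relative L2 error:", np.linalg.norm(f - f_n) / np.linalg.norm(f))
```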

SLIDE 22
Tangent Cones

  • The set of directions that decrease the norm from x forms a cone:

T_A(x) = { d : ‖x + αd‖_A ≤ ‖x‖_A for some α > 0 }

  • x is the unique minimizer of “minimize ‖z‖_A subject to Φz = y” if the intersection of this cone with the null space of Φ equals {0}

[Figure: the sublevel set {z : ‖z‖_A ≤ ‖x‖_A} touching the affine space Φz = y at x]
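A small numeric probe of the tangent-cone definition, using the ℓ1 norm as the atomic norm (the α grid is an assumption):

```python
# Test whether a direction d lies in the l1-norm tangent cone at x
# by probing small alpha > 0.
import numpy as np

def in_tangent_cone(x, d, alphas=np.logspace(-6, 0, 25)):
    base = np.abs(x).sum()
    return any(np.abs(x + a * d).sum() <= base for a in alphas)

x = np.array([1.0, 0.0])
print(in_tangent_cone(x, np.array([-1.0, 0.5])))   # True: shrinks the norm
print(in_tangent_cone(x, np.array([1.0, 0.0])))    # False: grows the norm
```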

SLIDE 23

Mean Width

Support function: S_C(d) = sup_{x∈C} d′x

S_C(d) + S_C(−d) measures the width of C when projected onto the span of d.

Mean width: w(C) = ∫_{S^{p−1}} S_C(u) du
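A hedged Monte Carlo estimate of the mean width for C = conv(A), where the support function is just a maximum over atoms; the cross-polytope atoms and sample count are toy choices:

```python
# Estimate w(C) by averaging S_C(u) = max_a <a, u> over random unit
# directions, approximating the (normalized) sphere integral.
import numpy as np

rng = np.random.default_rng(0)
p = 3
atoms = np.hstack([np.eye(p), -np.eye(p)])        # cross-polytope atoms
u = rng.standard_normal((100_000, p))
u /= np.linalg.norm(u, axis=1, keepdims=True)     # uniform on S^{p-1}
support = (u @ atoms).max(axis=1)                 # S_C(u)
print("estimated mean width w(C):", support.mean())
```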

SLIDE 24
  • When does a random subspace U in R^p intersect a convex cone C only at the origin?
  • Gordon (1988): with high probability if

codim(U) ≥ p · w(C ∩ S^{p−1})²

where w(C ∩ S^{p−1}) = ∫_{S^{p−1}} S_C(u) du is the mean width.

  • Corollary: for inverse problems, if Φ is a random Gaussian matrix with n rows, exact recovery of x needs

n ≥ p · w(T_A(x) ∩ S^{p−1})²

SLIDE 25
Rates

  • Hypercube: n ≥ p/2
  • Sparse vectors, dimension p, sparsity s: n ≥ 2s log(p/s) + 5s/4
  • Block sparse, M groups (possibly overlapping), maximum group size B, k active groups: n ≥ k(√(2 log(M − k)) + √B)² + kB
  • Low-rank matrices, p1 × p2 (p1 < p2), rank r: n ≥ 3r(p1 + p2 − r)
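Plugging toy sizes into these formulas (the numbers are illustrative only):

```python
# Quick numeric read of the sample-complexity rates above.
import numpy as np

p, s = 10_000, 50
print("sparse:", 2 * s * np.log(p / s) + 1.25 * s)   # a few hundred measurements

p1, p2, r = 1000, 1000, 10
print("low rank:", 3 * r * (p1 + p2 - r))            # ~60,000 measurements
```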

SLIDE 26
Robust Recovery (deterministic)

  • Suppose we observe y = Φx + w with ‖w‖₂ ≤ δ
  • If x̂ is an optimal solution of

minimize ‖z‖_A subject to ‖Φz − y‖ ≤ δ

then ‖x̂ − x‖ ≤ 2δ/ε, provided that

n ≥ p · w(T_A(x) ∩ S^{p−1})² / (1 − ε)²
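A hedged sketch of the constrained recovery problem with the ℓ1 norm standing in for a generic atomic norm, assuming CVXPY is available:

```python
# Noise-constrained recovery: minimize ||z||_1 s.t. ||Phi z - y||_2 <= delta.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s, delta = 80, 200, 5, 0.1
Phi = rng.standard_normal((n, p))
x = np.zeros(p); x[:s] = 1.0
w = rng.standard_normal(n); w *= delta / np.linalg.norm(w)   # ||w||_2 <= delta
y = Phi @ x + w

z = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm(z, 1)),
                  [cp.norm(Phi @ z - y, 2) <= delta])
prob.solve()
print("recovery error:", np.linalg.norm(z.value - x))
```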

SLIDE 27
Robust Recovery (statistical)

  • Suppose we observe y = Φx + w
  • If x̂ is an optimal solution of

minimize ‖Φz − y‖² + μ‖z‖_A

with μ ≥ E_w[‖Φ∗w‖_A∗] (the dual atomic norm), then

‖Φx − Φx̂‖₂ ≤ √(μ‖x‖_A)

  • And under an additional “cone condition” on cone{ u : ‖x + u‖_A ≤ ‖x‖_A + γ‖u‖ },

‖x − x̂‖₂ ≤ η(x, A, Φ, γ)·μ

Bhaskar, Tang, and Recht 2011
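For the special case Φ = I with the ℓ1 norm as the atomic norm, the regularized problem has a closed-form soft-thresholding solution; a sketch (the 0.5 scaling of the loss and the threshold choice are my assumptions):

```python
# min 0.5*||z - y||^2 + mu*||z||_1 is solved coordinatewise by
# soft-thresholding at level mu.
import numpy as np

def denoise_l1(y, mu):
    return np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)

rng = np.random.default_rng(0)
p, s, sigma = 500, 10, 0.5
x = np.zeros(p); x[:s] = 5.0
y = x + sigma * rng.standard_normal(p)
mu = sigma * np.sqrt(2 * np.log(p))      # the usual universal threshold
x_hat = denoise_l1(y, mu)
print("mean squared error:", np.mean((x_hat - x) ** 2))
```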

SLIDE 28
Denoising Rates (re-derivations)

  • Sparse vectors, dimension p, sparsity s:

(1/p)·‖x̂ − x⋆‖₂² = O(σ² s log(p) / p)

  • Low-rank matrices, p1 × p2 (p1 < p2), rank r:

(1/(p1 p2))·‖x̂ − x⋆‖_F² = O(σ² r / p1)

SLIDE 29

Atomic Norm Minimization

IDEA:

minimize ‖z‖_A subject to Φz = y

  • Generalizes existing, powerful methods
  • Rigorous formula for developing new analysis algorithms
  • Tightest bounds on the number of measurements needed for model recovery in all common models
  • One algorithm prototype for many data-mining applications

Chandrasekaran, Recht, Parrilo, and Willsky 2010

SLIDE 30
Learning representations

  • ASSUME: very sparse vectors, s < N^{1/2}/log(N); a very incoherent dictionary (much more than RIP), so that ⟨Φx, Φz⟩ ≈ ⟨x, z⟩; the number of observations is much bigger than N
  • The Gram matrix of the observed y vectors indicates overlapping support
  • Use graph algorithms to identify single dictionary elements, one at a time (see the sketch below)

Arora, Ge, and Moitra; Agarwal, Anandkumar, and Netrapalli
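A toy sketch of the Gram-matrix idea: samples whose sparse codes share a dictionary element have noticeably correlated observations, so thresholding the Gram matrix yields an overlap graph; the sizes, the ±1 coefficients, and the 0.5 threshold are all assumptions:

```python
# Threshold the Gram matrix of observations to estimate support overlap.
import numpy as np

rng = np.random.default_rng(0)
N, p, s, m = 50, 100, 3, 200                 # dict size, dim, sparsity, samples
Phi = rng.standard_normal((p, N)) / np.sqrt(p)   # incoherent-ish dictionary
codes = np.zeros((m, N))
for t in range(m):
    codes[t, rng.choice(N, s, replace=False)] = rng.choice([-1.0, 1.0], s)
Y = codes @ Phi.T                            # observations y = Phi x

G = Y @ Y.T                                  # Gram matrix: <y_i, y_j> ~ <x_i, x_j>
overlap_graph = np.abs(G) > 0.5              # assumed threshold
true_overlap = (codes @ codes.T) != 0
print("edge agreement:", (overlap_graph == true_overlap).mean())
```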

SLIDE 31

Extended representations

C = π(K ∩ L)

convex body = linear map applied to (cone ∩ affine space)

SLIDE 32

C = π(K ∩ L)

  • ℓ1 ball (cross-polytope, vertices (±1, 0), (0, ±1)): K = R^{2d}_+, L = { y : Σ_{i=1}^{2d} y_i = 1 }, π = [I  −I]
  • Hypercube (vertices (±1, ±1)): K = R^{2d}_+, L = { y : y_i + y_{i+d} = 1, 1 ≤ i ≤ d }, π = [I  −I]
  • Nuclear-norm ball: K = S^{d1+d2}_+, L = { Z : trace(Z) = 1 }, π([[A, B], [Bᵀ, C]]) = B
  • Moment body: K = S^{d+1}_+, L = { Z = [[T, x], [xᵀ, u]] : T Toeplitz, T₁₁ = u = 1 }, π([[T, x], [xᵀ, u]]) = x
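A numeric check of the first lift in the list, verifying that the cross-polytope is the image of the simplex under π = [I −I]; the slack-splitting helper is mine, not from the slides:

```python
# Any x with ||x||_1 <= 1 lifts to y >= 0 with sum(y) = 1 and x = y[:d] - y[d:].
import numpy as np

def lift_l1(x):
    # split into positive/negative parts, then pad with slack so sum(y) = 1;
    # adding equal slack to a +/- pair cancels under pi = [I, -I]
    yp, yn = np.maximum(x, 0), np.maximum(-x, 0)
    slack = (1.0 - np.abs(x).sum()) / 2.0      # requires ||x||_1 <= 1
    yp[0] += slack; yn[0] += slack
    return np.concatenate([yp, yn])

x = np.array([0.3, -0.5, 0.1])
y = lift_l1(x)
d = len(x)
assert np.all(y >= 0) and np.isclose(y.sum(), 1.0)   # y in K ∩ L
assert np.allclose(y[:d] - y[d:], x)                 # pi recovers x
print("lift verified")
```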

SLIDE 33

Extended representations

C = π(K ∩ L): linear map π, cone K, affine space L

C has a lift into K if there are maps A : C → K, B : C∗ → K∗ such that for all extreme points x ∈ C and y ∈ C∗ (the polar body, C∗ = { y : ⟨x, y⟩ ≤ 1 for all x ∈ C }):

1 − ⟨x, y⟩ = ⟨A(x), B(y)⟩

Gouveia, Parrilo, and Thomas

Representation learning becomes matrix factorization.

SLIDE 34

Learning extended representations?

C = π(K ∩ L)

convex body, linear map, cone, affine space

  • Learning the representation through NMF?
  • Ties immediately with Gaussian width analysis
  • Could obviate graph-structured arguments
  • What are the right features?