Latent Structure Beyond Sparse Codes
Benjamin Recht Department of EECS and Statistics University of California, Berkeley
Gabor-like thingies... 2.5x redundancy.
Robustness and sparsity.
Which mathematical representations can be learned robustly?
Sparse Approximation
Compressed Sensing: signals are sparse in a wavelet basis; exploit this to reduce the number of measurements required for signal acquisition.
Lasso: for diagnosis, search for a sparse set of markers.
Cardinality Minimization
Find the sparsest x that satisfies/approximates the underdetermined linear system
    Φx = y,  Φ : R^p → R^n
– NP-hard: reduction from EXACT-COVER
– Hard to approximate
– Known exact algorithms require enumeration
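Not from the deck: a minimal brute-force sketch (numpy only, toy sizes, hypothetical helper sparsest_solution) illustrating why known exact algorithms amount to enumeration over supports.

    # Brute-force cardinality minimization: try supports of increasing size
    # until one fits Phi x = y exactly (exponential in p; toy sizes only).
    import itertools
    import numpy as np

    def sparsest_solution(Phi, y, tol=1e-8):
        n, p = Phi.shape
        for s in range(1, p + 1):                    # candidate sparsity level
            for S in itertools.combinations(range(p), s):
                cols = list(S)
                xS, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
                if np.linalg.norm(Phi[:, cols] @ xS - y) <= tol:
                    x = np.zeros(p)
                    x[cols] = xS
                    return x                         # sparsest consistent x
        return None

    # tiny demo: 3 measurements of a 1-sparse vector in R^6
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((3, 6))
    x_true = np.zeros(6); x_true[2] = 1.5
    print(sparsest_solution(Phi, Phi @ x_true))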
Rank of Data Matrix: Recommender Systems
Rank of Gram Matrix: Geometric Structure
Rank of Unfolded Tensor: Seismic Imaging
Rank of Density Matrix: Quantum Tomography
Affine Rank Minimization
Find the lowest-rank X that satisfies/approximates the underdetermined linear system
    Φ(X) = y,  Φ : R^{p1×p2} → R^n
– Reduces to solving polynomial equations
– Hard to approximate
– Exact algorithms are awful
Low-rank model: M = L R*, where M is p1 × p2, L is p1 × r, and R* is r × p2.
IDEA: Replace rank with the nuclear norm:
    minimize ‖X‖_* subject to Φ(X) = y
Some guy on livejournal, 2006; Fazel, Parrilo, Recht, 2007; Candès and Recht, 2008.
Succeeds when the number of samples is Õ(r(p1 + p2)).
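Not from the deck: a hedged cvxpy sketch of the nuclear-norm program (toy sizes, default solver; the Gaussian measurement ensemble is an assumption).

    # Nuclear-norm heuristic: recover a rank-1 M from Gaussian measurements.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    p1, p2, r, n = 8, 6, 1, 40                  # n >= 3 r (p1 + p2 - r) = 39
    M = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))

    As = rng.standard_normal((n, p1, p2))       # measurement matrices A_i
    y = np.array([np.sum(A * M) for A in As])   # y_i = <A_i, M>

    X = cp.Variable((p1, p2))
    constraints = [cp.sum(cp.multiply(A, X)) == yi for A, yi in zip(As, y)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    print(np.linalg.norm(X.value - M))          # near 0: exact recovery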
Stochastic gradient for matrix completion (step sizes α, β > 0): for each observed entry M_ij,
    e = L_i R_j^T − M_ij
    L_i ← α L_i − β e R_j
    R_j ← α R_j − β e L_i
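Not from the deck: a numpy sketch of these updates (step sizes, sampling rate, and epoch count are assumptions).

    # SGD for matrix completion with the updates above (here a = 1):
    #   e = L_i R_j^T - M_ij;  L_i <- a L_i - b e R_j;  R_j <- a R_j - b e L_i
    import numpy as np

    rng = np.random.default_rng(0)
    p1, p2, r = 50, 40, 2
    M = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
    obs = [(i, j) for i in range(p1) for j in range(p2) if rng.random() < 0.5]

    L = 0.1 * rng.standard_normal((p1, r))
    R = 0.1 * rng.standard_normal((p2, r))
    a, b = 1.0, 0.02                        # a < 1 would add shrinkage

    for epoch in range(200):
        rng.shuffle(obs)
        for i, j in obs:
            e = L[i] @ R[j] - M[i, j]       # residual on entry (i, j)
            Li = L[i].copy()
            L[i] = a * L[i] - b * e * R[j]
            R[j] = a * R[j] - b * e * Li

    print(np.mean([(L[i] @ R[j] - M[i, j])**2 for i, j in obs]))  # -> ~0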
System Identification: find a dynamical model that agrees with time series data
Observe a time series y_1, y_2, . . . , y_T driven by the input u_1, u_2, . . . , u_T.
What is a principled way to build a parsimonious model for the input-output responses?
Na et al., 2012; Shah, Bhaskar, Tang, and Recht, 2012.
How do we solve underdetermined inverse problems with priors? Which norm should we pick?
Priors: Sparsity, Rank, Smoothness, Symmetry.
‖x‖_1 = Σ_{i=1}^p |x_i|
[Figure: the unit ball of the ℓ1 norm, compared with the Euclidean norm ball]
minimize ‖x‖_1 subject to Φx = y
[Figure: the ℓ1 ball touching the affine space Φx = y at a sparse point]
Compressed Sensing: Candès, Romberg, Tao; Donoho, Tanner; etc.
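Not from the deck: the ℓ1 program is a linear program; a cvxpy sketch with assumed toy sizes.

    # Basis pursuit: minimize ||x||_1 subject to Phi x = y.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    p, s, n = 200, 5, 60                    # n comfortably above 2 s log(p/s)
    x_true = np.zeros(p)
    x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)

    Phi = rng.standard_normal((n, p)) / np.sqrt(n)
    y = Phi @ x_true

    x = cp.Variable(p)
    cp.Problem(cp.Minimize(cp.norm1(x)), [Phi @ x == y]).solve()
    print(np.linalg.norm(x.value - x_true))     # near 0: exact recovery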
[Figure: the rank-1 matrices [x y; y z] with x² + z² + 2y² = 1, and their convex hull, the nuclear-norm ball]
‖X‖_* = Σ_i σ_i(X)
Nuclear Norm Heuristic for Rank Minimization / Matrix Completion: Fazel, 2002; Recht, Fazel, and Parrilo, 2007.
All components of x are ±1.
[Figure: the unit ball of the ℓ∞ norm, with vertices (1,1), (1,−1), (−1,1), (−1,−1)]
minimize ‖x‖_∞ subject to Φx = y
[Figure: the ℓ∞ ball touching the affine space Φx = y at a vertex]
Donoho and Tanner, 2008; Mangasarian and Recht, 2009.
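Not from the deck: the same sketch with the ℓ∞ ball, which recovers sign vectors from roughly p/2 Gaussian measurements (sizes are assumptions).

    # ell_infinity recovery of a +/-1 vector: n slightly above p/2.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(1)
    p = 100
    x_true = rng.choice([-1.0, 1.0], size=p)    # all components +/-1
    n = p // 2 + 5
    Phi = rng.standard_normal((n, p))
    y = Phi @ x_true

    x = cp.Variable(p)
    cp.Problem(cp.Minimize(cp.norm(x, "inf")), [Phi @ x == y]).solve()
    print(np.max(np.abs(x.value - x_true)))     # near 0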
Atomic models: atoms and weights; the number of atoms generalizes sparsity and rank.
    ‖x‖_A = inf { Σ_{a∈A} |c_a| : x = Σ_{a∈A} c_a a }
    ‖x‖_A = inf { t > 0 : x ∈ t conv(A) }
    minimize ‖z‖_A subject to Φz = y
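Not from the deck: for a finite atomic set, ‖x‖_A is a linear program; a cvxpy sketch with a hypothetical atomic_norm helper.

    # ||x||_A = min sum_a |c_a| subject to x = sum_a c_a a (finite A).
    import cvxpy as cp
    import numpy as np

    def atomic_norm(atoms, x):
        # atoms: p x m matrix whose columns are the atoms of A
        c = cp.Variable(atoms.shape[1])
        prob = cp.Problem(cp.Minimize(cp.norm1(c)), [atoms @ c == x])
        prob.solve()
        return prob.value

    # sanity check: atoms = +/- canonical basis vectors recover the l1 norm
    p = 4
    atoms = np.hstack([np.eye(p), -np.eye(p)])
    x = np.array([1.0, -2.0, 0.0, 0.5])
    print(atomic_norm(atoms, x), np.abs(x).sum())   # both 3.5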
IDEA: choose the atomic set A to match the latent structure.
– atoms drawn from a set of subspaces {U_g}
– atoms indexed by t ∈ T, some basic set
Applications: Numerical Integration, Statistical Inference; Networks, Combinatorial Approximation Algorithms; Hyperspectral Imaging, Neuroscience.
The unit ball of ‖·‖_A is the convex hull of A.
Approximation with n atoms: ‖f − f_n‖_{L2} ≤ c₀ ‖f‖_A / √n.
Descent directions of ‖·‖_A at x form a cone:
    T_A(x) = { d : ‖x + αd‖_A ≤ ‖x‖_A for some α > 0 }
For minimize ‖z‖_A subject to Φz = y, x is the unique optimum when the intersection of this cone with the null space of Φ equals {0}.
[Figure: the sublevel set {z : ‖z‖_A ≤ ‖x‖_A} touching the affine space y = Φz at x]
Support Function:
    S_C(d) = sup_{x∈C} ⟨d, x⟩
S_C(d) + S_C(−d) measures the width of C when projected onto the span of d.
Mean width:
    w(C) = ∫_{S^{p−1}} S_C(u) du
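Not from the deck: mean width is straightforward to estimate by Monte Carlo; a numpy sketch for C the ℓ1 ball, whose support function is S_C(u) = ‖u‖_∞ (the sphere normalization is an assumption).

    # Monte Carlo mean width: w(C) ~ average of S_C(u), u uniform on S^{p-1}.
    import numpy as np

    rng = np.random.default_rng(0)
    p, trials = 50, 20000

    u = rng.standard_normal((trials, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # uniform on the sphere

    S = np.max(np.abs(u), axis=1)       # support function of the l1 ball
    print(S.mean())                     # small: the l1 ball has tiny width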
When does a random subspace U intersect a convex cone C only at the origin?
U ∩ C = {0} with high probability provided
    codim(U) ≥ p · w(C ∩ S^{p−1})²,
where w(C ∩ S^{p−1}) = ∫_{S^{p−1}} S_C(u) du is the mean width.
Consequence: a Gaussian matrix Φ with n rows yields exact recovery of x when
    n ≥ p · w(T_A(x) ∩ S^{p−1})²
Exact recovery with a Gaussian Φ provided that:
– s-sparse vectors in R^p: n ≥ 2s log(p/s) + 5s/4
– sign vectors, all entries ±1: n ≥ p/2
– group-sparse vectors (M groups, maximum group size B, k active groups): n ≥ k(√(2 log(M − k)) + √B)² + kB
– rank-r p1 × p2 matrices: n ≥ 3r(p1 + p2 − r)
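Not from the deck: plugging illustrative sizes into these bounds (parameters are assumptions).

    # Evaluate the exact-recovery sample bounds for illustrative sizes.
    import numpy as np

    p, s = 10000, 100                             # sparse vector
    print(2 * s * np.log(p / s) + 5 * s / 4)      # ~ 1046 measurements

    print(p / 2)                                  # sign vector: 5000

    M, B, k = 1000, 10, 20                        # group sparse
    print(k * (np.sqrt(2 * np.log(M - k)) + np.sqrt(B))**2 + k * B)

    p1, p2, r = 500, 500, 5                       # low-rank matrix
    print(3 * r * (p1 + p2 - r))                  # 14925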
Robust Recovery (deterministic)
Observe y = Φx + w with ‖w‖_2 ≤ δ. Solve
    x̂ = argmin ‖z‖_A subject to ‖Φz − y‖ ≤ δ
Then ‖x − x̂‖_2 ≤ 2δ/ε, provided that
    n ≥ p · w(T_A(x) ∩ S^{p−1})² / (1 − ε)²
[Figure: the sublevel set {z : ‖z‖_A ≤ ‖x‖_A} and the constraint tube ‖Φz − y‖ ≤ δ]
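Not from the deck: a cvxpy sketch of the deterministic robust program with ‖·‖_A = ‖·‖_1 (noise level and sizes are assumptions).

    # Robust recovery: minimize ||z||_1 subject to ||Phi z - y||_2 <= delta.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    p, s, n, delta = 200, 5, 80, 0.1
    x_true = np.zeros(p)
    x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)

    Phi = rng.standard_normal((n, p)) / np.sqrt(n)
    w = rng.standard_normal(n)
    w *= delta / np.linalg.norm(w)              # ||w||_2 = delta exactly
    y = Phi @ x_true + w

    z = cp.Variable(p)
    cp.Problem(cp.Minimize(cp.norm1(z)),
               [cp.norm(Phi @ z - y, 2) <= delta]).solve()
    print(np.linalg.norm(z.value - x_true))     # error of order delta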
Robust Recovery (statistical)
Observe y = Φx + w with stochastic noise w. Estimate
    x̂ = argmin ‖Φz − y‖² + µ ‖z‖_A
Under an additional “cone condition” on cone{u : ‖x + u‖_A ≤ ‖x‖_A + γ‖u‖},
    ‖x − x̂‖_2 ≤ η(x, A, Φ, γ) µ
Bhaskar, Tang, and Recht 2011
Choosing µ ≥ E_w[‖Φ*w‖*_A] gives
    ‖Φx − Φx̂‖_2 ≤ √(µ ‖x‖_A)
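Not from the deck: a cvxpy sketch of the regularized estimator with ‖·‖_A = ‖·‖_1, taking µ at the noise scale E‖Φ*w‖_∞ ≈ σ√(2 log p) (the constant is an assumption).

    # Statistical robust recovery (lasso form):
    #   minimize ||Phi z - y||_2^2 + mu ||z||_1,  mu ~ E ||Phi^T w||_inf.
    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    p, s, n, sigma = 200, 5, 80, 0.1
    x_true = np.zeros(p)
    x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)

    Phi = rng.standard_normal((n, p)) / np.sqrt(n)
    y = Phi @ x_true + sigma * rng.standard_normal(n)

    mu = 2 * sigma * np.sqrt(2 * np.log(p))     # ~ E ||Phi^T w||_inf
    z = cp.Variable(p)
    cp.Problem(cp.Minimize(cp.sum_squares(Phi @ z - y)
                           + mu * cp.norm1(z))).solve()
    print(np.linalg.norm(z.value - x_true)**2 / p)   # ~ sigma^2 s log(p)/p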
Rates:
    (1/p) ‖x̂ − x*‖_2² = O(σ² s log(p) / p)   for sparse vectors
    (1/(p1 p2)) ‖x̂ − x*‖_F² = O(σ² r / p1)   for low-rank matrices
Takeaways: the width calculation gives the number of measurements needed for model recovery in all common models; algorithms and applications follow.
IDEA: minimize ‖z‖_A subject to Φz = y.
Chandrasekaran, Recht, Parrilo, and Willsky, 2010.
Dictionary learning guarantees (Arora, Ge, and Moitra; Agarwal, Anandkumar, and Netrapalli):
– identify single dictionary elements at a time
– require incoherence, much more than RIP: |⟨Φx, Φz⟩| ≈ |⟨x, z⟩| for sparse x and z
– need a number of samples much bigger than N
[Figure: shading indicates overlapping support]
Convex body C = π(cone K ∩ affine space L), for a linear map π.
[Figure: the square with vertices (1,1), (1,−1), (−1,1), (−1,−1)]
Examples of lifts C = π(K ∩ L):
– ℓ1 ball: K = R^{2d}_+, π = [I, −I], L = {y : Σ_{i=1}^{2d} y_i = 1}
– hypercube: K = R^{2d}_+, π = [I, −I], L = {y : y_i + y_{i+d} = 1, 1 ≤ i ≤ d}
– nuclear-norm ball: K = S^{d1+d2}_+, Z = [A B; B^T C], π(Z) = B, L = {Z : trace(Z) = 1}
– moment atoms: K = S^{d+1}_+, Z = [T x; x^T u] with T Toeplitz, L = {Z : T_11 = u = 1}
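Not from the deck: a numpy check of the first lift above (assuming the reconstruction K = R^{2d}_+, π = [I, −I], L = {y : Σ y_i = 1}): projecting K ∩ L lands inside the ℓ1 ball, with vertices hit by extreme points.

    # Check: pi(K cap L) sits in the l1 ball for K = R^{2d}_+,
    # pi = [I, -I], L = {y : sum_i y_i = 1}.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    for _ in range(1000):
        y = rng.random(2 * d)
        y /= y.sum()                          # y in K cap L (a simplex)
        x = y[:d] - y[d:]                     # x = pi(y) = [I, -I] y
        assert np.abs(x).sum() <= 1 + 1e-9    # inside the l1 ball

    y = np.zeros(2 * d); y[3] = 1.0           # an extreme point of K cap L
    print(y[:d] - y[d:])                      # maps to the vertex e_3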
Polar body: C* = {y : ⟨x, y⟩ ≤ 1 for all x ∈ C}.
C has a lift into the cone K if there are maps A : C → K and B : C* → K* such that
    1 − ⟨x, y⟩ = ⟨A(x), B(y)⟩
for all extreme points x ∈ C and y ∈ C*.
Gouveia, Parrilo, and Thomas
Representation learning becomes matrix factorization