A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem
Bharath K. Sriperumbudur, David A. Torres and Gert R. G. Lanckriet
University of California, San Diego
OPT 2009

Generalized Eigenvalue Problem
◮ Given a matrix pair, (A, B), find λ ∈ R and x ≠ 0 such that Ax = λBx, where A ∈ S^n (symmetric) and B ∈ S^n_{++} (symmetric positive definite).
◮ The largest generalized eigenvalue admits the variational formulation
  λ_max(A, B) = max_x { x^T Ax : x^T Bx = 1 }. (1)
◮ Popular in multivariate statistics and machine learning.
◮ Classification: Fisher discriminant analysis
◮ Dimensionality reduction: Principal component analysis, Canonical correlation analysis
◮ Clustering: Spectral clustering
◮ Fisher Discriminant Analysis (FDA)
◮ A = (µ1 − µ2)(µ1 − µ2)^T is the between-cluster variance.
◮ B = Σ1 + Σ2 is the within-cluster variance.
◮ Principal Component Analysis (PCA)
◮ A = Σ is the covariance matrix.
◮ B is the identity matrix.
◮ Canonical Correlation Analysis (CCA)
◮ A = [ 0, Σ_xy ; Σ_yx, 0 ] holds the cross-covariances between the two views (in block form).
◮ B = [ Σ_xx, 0 ; 0, Σ_yy ] is the block-diagonal matrix of within-view covariances. (A small numerical GEV example follows this list.)
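All three examples reduce to computing the leading generalized eigenvector of a pair (A, B). As a minimal sketch (not part of the original slides), the dense problem can be solved with a standard generalized eigensolver; the FDA-style matrices below are random stand-ins for real data.

```python
import numpy as np
from scipy.linalg import eigh

# Toy FDA-style matrices (synthetic stand-ins, for illustration only).
rng = np.random.default_rng(0)
mu1, mu2 = rng.normal(size=5), rng.normal(size=5)
S = rng.normal(size=(5, 5))
A = np.outer(mu1 - mu2, mu1 - mu2)   # between-cluster variance
B = S @ S.T + np.eye(5)              # within-cluster variance, positive definite

# eigh(A, B) solves A x = lambda B x; eigenvalues come back in ascending order.
eigvals, eigvecs = eigh(A, B)
x_star = eigvecs[:, -1]              # eigenvector attaining lambda_max(A, B)
print(eigvals[-1], x_star)
```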
◮ Usually, the solutions of FDA, PCA and CCA are not sparse.
◮ This often makes it difficult to interpret the results.
◮ PCA/CCA: for better interpretability, only a few relevant features should be selected.
◮ Applications: bioinformatics, finance, document translation, etc.
◮ FDA: feature selection aids generalization performance by promoting sparsity.
◮ Sparse representation ⇒ better interpretation, better generalization.
◮ The variational formulation for the sparse generalized eigenvalue problem is
  max_x { x^T Ax − ρ ||x||_0 : x^T Bx ≤ 1 }, (2)
  where ρ > 0 controls sparsity and ||x||_0 = Σ_{i=1}^n 1_{|x_i| ≠ 0} is the cardinality of x.
◮ (2) is non-convex, NP-hard and therefore intractable.
◮ Usually, the ℓ1-norm approximation is used for the cardinality penalty.
◮ The problem is still computationally hard.
◮ (2) can be written as
  max_x { x^T Ax − ρ lim_{ε→0} Σ_{i=1}^n log(1+|x_i|ε^{-1}) / log(1+ε^{-1}) : x^T Bx ≤ 1 }. (3)
◮ Approximate ||x||_0 by ||x||_ε := Σ_{i=1}^n log(1+|x_i|ε^{-1}) / log(1+ε^{-1}), i.e., drop the limit in (3). (4)
◮ The approximation ||x||_ε can be interpreted as a limiting family of penalties: it approaches ||x||_0 as ε → 0, while for ε near 1 it behaves like the ℓ1-norm (see the figure and the numerical check below).
[Figure: log(1+|x|ε^{-1})/log(1+ε^{-1}) plotted against x for ε = 1, 10^{-2}, 10^{-5}, 10^{-10}, compared with ||x||_0 and ||x||_1.]
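A quick numerical sanity check of this limit (my own illustration, not from the slides): evaluating ||x||_ε for a fixed sparse vector and shrinking ε recovers the count of nonzeros.

```python
import numpy as np

def eps_norm(x, eps):
    # ||x||_eps = sum_i log(1 + |x_i|/eps) / log(1 + 1/eps)
    return np.sum(np.log1p(np.abs(x) / eps)) / np.log1p(1.0 / eps)

x = np.array([0.0, 0.5, -2.0, 0.0, 1.0])   # ||x||_0 = 3
for eps in [1.0, 1e-2, 1e-5, 1e-10]:
    print(f"eps={eps:g}: ||x||_eps = {eps_norm(x, eps):.4f}")
# The printed values approach 3.0 (= ||x||_0) as eps -> 0.
```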
◮ (3) then reduces to the approximate program
  max_x { x^T Ax − ρ_ε Σ_{i=1}^n log(1+|x_i|ε^{-1}) : x^T Bx ≤ 1 }, (5)
  where ρ_ε := ρ / log(1+ε^{-1}).
◮ The task reduces to solving the approximate program in (5) with a sufficiently small ε.
◮ (5) can be written as
  min_x { τ||x||_2^2 − x^T(A+τI_n)x + ρ_ε Σ_{i=1}^n log(ε+|x_i|) : x^T Bx ≤ 1 }, (6)
  where τ ≥ max(0, −λ_min(A)) so that A+τI_n ⪰ 0.
◮ The objective in (6) is a difference of two convex functions: writing log(ε+|x_i|) = log ε + |x_i|/ε − (|x_i|/ε − log(1+|x_i|ε^{-1})), it splits (up to an additive constant) as [τ||x||_2^2 + (ρ_ε/ε)||x||_1] − [x^T(A+τI_n)x + ρ_ε Σ_{i=1}^n (|x_i|/ε − log(1+|x_i|ε^{-1}))], and both brackets are convex.
◮ Suppose we want to minimize f over Ω ⊂ R^n. Construct a majorization function g on Ω × Ω such that
  f(x) ≤ g(x, y) for all x, y ∈ Ω, and f(x) = g(x, x).
◮ The majorization algorithm corresponding to g updates x at iteration l via
  x^(l+1) ∈ arg min_{x ∈ Ω} g(x, x^(l)).
◮ Descent property: f(x^(l+1)) ≤ g(x^(l+1), x^(l)) ≤ g(x^(l), x^(l)) = f(x^(l)).
◮ MM algorithms can be thought of as a generalization of the EM algorithm. (A generic code skeleton of the MM loop follows.)
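In code, the scheme is a short loop; this generic skeleton (my own sketch, not from the slides) takes the majorizer's minimizer as a callable and relies only on the descent property above.

```python
def mm_minimize(x0, minimize_majorizer, f, tol=1e-8, max_iter=1000):
    """Generic majorization-minimization loop.

    minimize_majorizer(x_l) must return the minimizer over the feasible
    set of g(x, x_l), where g majorizes f: f(x) <= g(x, y), f(x) = g(x, x).
    The descent property then guarantees f(x_{l+1}) <= f(x_l).
    """
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        x_next = minimize_majorizer(x)
        fx_next = f(x_next)
        if fx - fx_next < tol:          # monotone decrease has stalled
            return x_next
        x, fx = x_next, fx_next
    return x
```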
◮ Majorizing the objective in (6) at y (linearizing the concave pieces −x^T(A+τI_n)x and log(ε+|x_i|)) gives
  g(x, y) = τ||x||_2^2 − 2x^T(A+τI_n)y + y^T(A+τI_n)y + ρ_ε Σ_{i=1}^n [ log(ε+|y_i|) + (|x_i| − |y_i|)/(|y_i|+ε) ].
◮ Dropping terms independent of x, the MM update is
  x^(l+1) = arg min_{x^T Bx ≤ 1} { τ||x||_2^2 − 2x^T(A+τI_n)x^(l) + ρ_ε Σ_{i=1}^n |x_i| / (|x_i^(l)|+ε) }. (9)
◮ For τ > 0, (9) can also be written as
  x^(l+1) = arg min_{x^T Bx ≤ 1} { ||x − τ^{-1}(A+τI_n)x^(l)||_2^2 + (ρ_ε/τ) Σ_{i=1}^n w_i^(l) |x_i| }, (10)
  where w_i^(l) := 1/(|x_i^(l)|+ε) and w^(l) := (w_1^(l), . . . , w_n^(l)).
◮ (10) is very similar to the LASSO [Tibshirani, 1996], except for the iteration-dependent weights w^(l) and the quadratic constraint x^T Bx ≤ 1.
◮ When A ⪰ 0, B = I_n and τ = 0, (9) reduces to a very simple closed-form update: soft-threshold a^(l) := Ax^(l) elementwise and renormalize,
  x_i^(l+1) = sign(a_i^(l)) ( |a_i^(l)| − ρ_ε w_i^(l)/2 )_+ / sqrt( Σ_{j=1}^n ( |a_j^(l)| − ρ_ε w_j^(l)/2 )_+^2 ). (11)
  (A minimal code sketch of this iteration follows.)
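A minimal sketch of this sparse PCA iteration (assuming the reconstruction of (11) above is right; the function and variable names are mine):

```python
import numpy as np

def dc_pca(A, rho_eps, eps=1e-6, max_iter=500, tol=1e-8):
    """Sparse PCA via update (11): assumes A is PSD, B = I_n, tau = 0.

    Each iteration soft-thresholds a = A x^(l) with elementwise thresholds
    rho_eps * w_i^(l) / 2, where w_i^(l) = 1/(|x_i^(l)| + eps), then
    renormalizes to the unit sphere.
    """
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)                      # feasible starting point
    for _ in range(max_iter):
        a = A @ x
        thresh = 0.5 * rho_eps / (np.abs(x) + eps)   # rho_eps * w^(l) / 2
        s = np.sign(a) * np.maximum(np.abs(a) - thresh, 0.0)
        norm_s = np.linalg.norm(s)
        if norm_s == 0.0:            # penalty too large: every coordinate is thresholded away
            return np.zeros(n)
        x_next = s / norm_s
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Usage: pass a covariance matrix as A; larger rho_eps yields a sparser x.
```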
◮ Let {x^(l)}_{l=0}^∞ be any sequence generated by the sparse GEV algorithm in (9). Then every limit point of {x^(l)}_{l=0}^∞ is a stationary point of the approximate program (5).
◮ The objective values converge:
  ρ_ε Σ_{i=1}^n log(ε+|x_i^(l)|) − [x^(l)]^T Ax^(l) → ρ_ε Σ_{i=1}^n log(ε+|x_i^*|) − [x^*]^T Ax^* =: L^*,
  for some stationary point x^* of (5).
◮ The set of limit points of {x^(l)}_{l=0}^∞ is a connected and compact subset of S(L^*), the set of stationary points of (5) at which the objective equals L^*.
◮ If S(L^*) is finite, then any sequence {x^(l)}_{l=0}^∞ generated by (9) converges to some x^* in S(L^*).
◮ Local and global solutions are the same for ρ = 0.
◮ For ρ = 0, any sequence {x^(l)}_{l=0}^∞ generated by (9) follows the update
  x^(l+1) = B^{-1}(A+τI_n)x^(l) / ||B^{-1}(A+τI_n)x^(l)||_B, (12)
  and converges to a generalized eigenvector of (A, B).
◮ With B = I_n, (12) reduces to the power method for computing the largest eigenvector of A + τI_n. (A code sketch of (12) follows.)
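A minimal sketch of the ρ = 0 iteration (12) (my reconstruction; here the B-norm is ||y||_B = sqrt(y^T By), and all names are mine):

```python
import numpy as np

def gev_power_method(A, B, tau=0.0, max_iter=1000, tol=1e-10):
    """Update (12): x <- B^{-1}(A + tau*I) x, renormalized so x^T B x = 1.

    With B = I_n and tau = 0 this is the classical power method for A;
    the iterate converges to the leading generalized eigenvector of (A, B).
    """
    n = A.shape[0]
    M = A + tau * np.eye(n)
    x = np.ones(n)
    x /= np.sqrt(x @ B @ x)                  # make x feasible: x^T B x = 1
    for _ in range(max_iter):
        y = np.linalg.solve(B, M @ x)        # B^{-1} (A + tau*I) x^(l)
        x_next = y / np.sqrt(y @ B @ y)      # renormalize in the B-norm
        if np.linalg.norm(x_next - x) < tol:
            break
        x = x_next
    return x
```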
◮ Sparse PCA algorithms compared: the proposed method (DC-PCA), SDP relaxation (DSPCA) [d'Aspremont et al., 2005], SPCA [Zou et al., 2006], greedy search (GSPCA) [Moghaddam et al., 2007] and the generalized power method (GPowerℓ0) [Journée et al., 2008].
◮ Pit props data [Jeffers, 1967]
◮ A benchmark dataset for testing sparse PCA algorithms.
◮ 180 observations and 13 measured variables.
◮ 6 principal directions are considered, as they capture 87% of the total variance.
[Figure: cumulative variance (%) vs. number of principal components (1–6) for SPCA, DSPCA, GSPCA, GPowerℓ0 and DC-PCA on the pit props data.]
[Figure: cumulative cardinality vs. number of principal components for the same five algorithms.]
[Figure: proportion of explained variance vs. cardinality for the same five algorithms.]
[Figure: proportion of explained variance and cardinality as functions of the sparsity parameter ρ̃.]
[Figure: proportion of explained variance vs. cardinality for SPCA, GPowerℓ0 and DC-PCA on three gene expression datasets (Alon et al., 1999; Golub et al., 1999; Ramaswamy et al., 2001).]
◮ Complexity
◮ DC-PCA, GPowerℓ0: O(mn^2), where m is the number of iterations.
◮ SPCA: O(mn^3)
◮ GSPCA: O(n^4)
◮ DSPCA: O(n^4 √(log n))
◮ Randomly chosen problems of size n ranging from 10 to 10,000.
◮ Linux 3 GHz, 4 GB RAM workstation.
[Figure: computational time (seconds) vs. problem size n on a log-log scale for SPCA, DSPCA, GSPCA, GPowerℓ0 and DC-PCA.]
◮ Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues. Proceedings of the National Academy of Sciences, 96:6745–6750.
◮ d'Aspremont, A., El Ghaoui, L., Jordan, M. I., and Lanckriet, G. R. G. (2005). A direct formulation for sparse PCA using semidefinite programming. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 41–48, Cambridge, MA. MIT Press.
◮ Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537.
◮ Jeffers, J. (1967). Two case studies in the application of principal components. Applied Statistics, 16:225–236.
◮ Journée, M., Nesterov, Y., Richtárik, P., and Sepulchre, R. (2008). Generalized power method for sparse principal component analysis. http://arxiv.org/abs/0811.4724v1.
◮ Moghaddam, B., Weiss, Y., and Avidan, S. (2007). Spectral bounds for sparse PCA: Exact and greedy algorithms. In Schölkopf, B., Platt, J., and Hoffman, T., editors, Advances in Neural Information Processing Systems 19, Cambridge, MA. MIT Press.
◮ Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E., and Golub, T. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences, 98:15149–15154.
◮ Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58(1):267–288.
◮ Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15:265–286.