Center for Evolutionary Medicine and Informatics
Sparse Screening for Exact Data Reduction
Jieping Ye
Arizona State University
Joint work with Jie Wang and Jun Liu
Wide data: feature reduction. Tall data: sample reduction.
(Tibshirani, 1996; Chen, Donoho, and Saunders, 1999)
$y = Ax + z$, where $y$ is $n \times 1$, $A$ is $n \times p$, $x$ is $p \times 1$, and $z$ is $n \times 1$.
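As a minimal sketch of how a sparse $x$ can be estimated in the linear model above, the following pure-Python cyclic coordinate descent solves a Lasso problem. The objective scaling $\tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$ and the names `soft_threshold` and `lasso_cd` are assumptions made for illustration; they are not taken from the slides or from the SLEP package.

```python
def soft_threshold(v, t):
    """Soft-thresholding operator: sign(v) * max(|v| - t, 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cd(A, y, lam, n_iter=500):
    """Cyclic coordinate descent for 0.5*||y - A x||^2 + lam*||x||_1.

    A is a list of n rows (each a list of p floats); y is a list of n floats.
    The objective scaling is an assumption of this sketch.
    """
    n, p = len(A), len(A[0])
    x = [0.0] * p
    r = list(y)  # residual y - A x; x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(A[i][j] * r[i] for i in range(n)) + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


# With an identity design the solution is the soft-thresholded response.
x_hat = lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
print(x_hat)  # [2.0, 0.0]
```

The second coefficient is exactly zero, which is the kind of sparsity that screening tries to detect before the solver runs.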
(Thompson et al. 2013)
Vounou et al. (2010, 2012)
Group Lasso, Tree Lasso, Fused Lasso, Graph Lasso
Lasso, Fused Lasso, Group Lasso, Sparse Group Lasso, Tree Structured Group Lasso, Overlapping Group Lasso, Sparse Inverse Covariance Estimation, Trace Norm Minimization
http://www.public.asu.edu/~jye02/Software/SLEP/
1M → 1K
– Sure screening, random projection/selection
– Resulting model is an approximation of the true model
– Exact data reduction via sparse screening
Without screening: solve the 1M problem directly. With screening: reduce 1M to 1K, then solve. Same solution.
El Ghaoui, Viallon, and Rabbani.
Non-expansiveness:
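For reference, here is a sketch of how non-expansiveness is typically used in dual polytope projection (DPP) screening for the Lasso. The objective scaling $\tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$, the dual feasible set $F$, and the symbols $\theta^*$, $a_i$ (the $i$-th column of $A$), and $\lambda_0$ are assumptions taken from the standard DPP formulation, not from the slide text.

```latex
\begin{align*}
% Projection onto a closed convex set F is non-expansive:
\| P_F(u) - P_F(v) \|_2 &\le \| u - v \|_2 \quad \text{for all } u, v. \\
% The Lasso dual optimum is a projection (assumed scaling):
\theta^*(\lambda) &= P_F\!\left( \frac{y}{\lambda} \right), \qquad
F = \{ \theta : |a_i^\top \theta| \le 1,\ i = 1, \dots, p \}. \\
% Hence, for 0 < \lambda < \lambda_0:
\| \theta^*(\lambda) - \theta^*(\lambda_0) \|_2
  &\le \left( \frac{1}{\lambda} - \frac{1}{\lambda_0} \right) \| y \|_2 . \\
% Feature i can be discarded, i.e. x_i^*(\lambda) = 0, whenever
| a_i^\top \theta^*(\lambda_0) |
  &< 1 - \| a_i \|_2 \, \| y \|_2 \left( \frac{1}{\lambda} - \frac{1}{\lambda_0} \right).
\end{align*}
```

The last line follows because $|a_i^\top \theta^*(\lambda)| \le |a_i^\top \theta^*(\lambda_0)| + \|a_i\|_2 \, \|\theta^*(\lambda) - \theta^*(\lambda_0)\|_2$, and a feature whose dual constraint is strictly inactive must have a zero coefficient.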
Use projections of rays: Define: Enhanced DPP:
Non-expansiveness: Firm non-expansiveness:
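The basic screening rule that non-expansiveness yields can be sketched end to end in pure Python: screen, solve the reduced problem, and compare with the unscreened solution. This is an illustrative sketch of the basic DPP rule anchored at $\lambda_{max}$, not the enhanced rule and not the authors' implementation. The objective scaling $\tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$, the toy data, and the names `lambda_max`, `dpp_screen`, and `lasso_cd` are all assumptions.

```python
import math


def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for 0.5*||y - A x||^2 + lam*||x||_1 (assumed scaling)."""
    n, p = len(A), len(A[0])
    x, r = [0.0] * p, list(y)
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(A[i][j] * r[i] for i in range(n)) + col_sq[j] * x[j]
            shrunk = math.copysign(max(abs(rho) - lam, 0.0), rho)
            delta = shrunk / col_sq[j] - x[j]
            for i in range(n):
                r[i] -= A[i][j] * delta
            x[j] += delta
    return x


def lambda_max(A, y):
    """Smallest lam for which the Lasso solution is all zeros."""
    return max(abs(sum(A[i][j] * y[i] for i in range(len(A))))
               for j in range(len(A[0])))


def dpp_screen(A, y, lam):
    """Indices kept by the basic DPP rule anchored at lam_max (0 < lam < lam_max)."""
    n, p = len(A), len(A[0])
    lam0 = lambda_max(A, y)
    radius = math.sqrt(sum(v * v for v in y)) * (1.0 / lam - 1.0 / lam0)
    kept = []
    for j in range(p):
        corr = abs(sum(A[i][j] * y[i] for i in range(n))) / lam0
        col_norm = math.sqrt(sum(A[i][j] ** 2 for i in range(n)))
        if corr >= 1.0 - col_norm * radius:  # cannot be safely discarded
            kept.append(j)
    return kept


# Hypothetical toy data: 4 samples, 5 features.
A = [[1.0, 0.0, 0.0, 0.6, 0.0],
     [0.0, 1.0, 0.0, 0.8, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.6],
     [0.0, 0.0, 0.0, 0.0, 0.8]]
y = [3.0, 1.0, 0.2, 0.1]
lam = 0.8 * lambda_max(A, y)

kept = dpp_screen(A, y, lam)
x_reduced = lasso_cd([[row[j] for j in kept] for row in A], y, lam)
x_screened = [0.0] * len(A[0])
for j, v in zip(kept, x_reduced):
    x_screened[j] = v  # scatter the reduced solution back to full length

x_full = lasso_cd(A, y, lam)
print(kept, max(abs(a - b) for a, b in zip(x_full, x_screened)))
```

On this toy problem the rule keeps two of the five columns, and the solution of the reduced problem agrees with the solution of the full problem, which is what "exact" data reduction means here.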
Results on MNIST for a sequence of 100 parameter values on the λ/λmax scale from 0.05 to 1. The data matrix is of size 784 × 50,000.
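A parameter sequence like the one described above could be generated as follows. Equal spacing on the λ/λmax scale and the name `lambda_ratio_grid` are assumptions; the slide gives only the endpoints and the number of values.

```python
def lambda_ratio_grid(lo=0.05, hi=1.0, num=100):
    """Return num equally spaced values of lam/lam_max from lo to hi."""
    step = (hi - lo) / (num - 1)
    return [lo + k * step for k in range(num)]


ratios = lambda_ratio_grid()
# Sequential screening usually walks the path from large to small lam,
# so that each solution can anchor the rule for the next parameter value.
path = sorted(ratios, reverse=True)
```

Multiplying each ratio by the problem's λmax gives the actual regularization parameters.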
– The size of the data matrix is 747 by 504095
Method        ROI3      ROI8      ROI30     ROI69     ROI76     ROI83
Lasso Solver  37975.31  37097.25  38258.72  36926.81  38116.29  37251.03
SR            84.06     84.44     84.70     83.09     82.76     85.39
SR+Lasso      217.08    215.90    223.39    214.36    212.04    211.57
EDPP          43.56     45.75     45.70     45.01     44.31     44.16
EDPP+Lasso    183.64    190.43    182.87    170.71    177.41    178.98
Running time (in seconds) of the Lasso solver, strong rule (Tibshirani et al., 2012), and
100 log 0.95 to log 0.95.
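To make the running-time comparison concrete, the speedup of each screened pipeline over the plain solver can be computed directly from the table. The sketch assumes that the "+Lasso" rows report the total time of screening plus solving; the `speedups` helper is introduced only for illustration.

```python
# Running times in seconds, copied from the table (columns ROI3 ... ROI83).
rois = ["ROI3", "ROI8", "ROI30", "ROI69", "ROI76", "ROI83"]
lasso_solver = [37975.31, 37097.25, 38258.72, 36926.81, 38116.29, 37251.03]
with_screening = {
    "SR+Lasso": [217.08, 215.90, 223.39, 214.36, 212.04, 211.57],
    "EDPP+Lasso": [183.64, 190.43, 182.87, 170.71, 177.41, 178.98],
}


def speedups(baseline, accelerated):
    """Element-wise ratio baseline time / accelerated time."""
    return [b / a for b, a in zip(baseline, accelerated)]


for name, times in with_screening.items():
    ratios = speedups(lasso_solver, times)
    row = ", ".join(f"{roi}: {s:.0f}x" for roi, s in zip(rois, ratios))
    print(f"{name:<11} {row}")
```

Both pipelines come out more than two orders of magnitude faster than the unscreened solver on every ROI.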
– J Wang, J Liu, J Ye. Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods. arXiv preprint arXiv:1307.4156.
– J Wang, J Zhou, P Wonka, J Ye. A Safe Screening Rule for Sparse Logistic
– S Huang, J Li, L Sun, J Liu, T Wu, K Chen, A Fleisher, E Reiman, J Ye. Learning brain connectivity of Alzheimer’s disease by exploratory graphical models. NeuroImage 50, 935-949.
– Witten, Friedman and Simon (2011), Mazumder and Hastie (2012)
– S Yang, Z Pan, X Shen, P Wonka, J Ye. Fused Multiple Graphical Lasso. arXiv preprint arXiv:1209.2139.
wide data tall data
(Figure: data points labeled +1 and -1, separated by a margin.)
Support vectors are those data points that the margin pushes up against.
The non-support vectors are irrelevant to the classifier. Can we make use of this?
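The claim that non-support vectors do not affect the classifier can be checked on a toy problem. The sketch below trains a linear SVM without a bias term by dual coordinate descent, drops every point that lies strictly outside the margin, and retrains. Note that it finds the non-support vectors with an oracle (the full solution), which a real screening rule must avoid; the solver, the data, and the names `svm_dual_cd` and `margins` are assumptions made for illustration.

```python
def svm_dual_cd(X, y, C=1.0, n_iter=100):
    """Dual coordinate descent for 0.5*||w||^2 + C*sum(max(0, 1 - y_i*w.x_i)).

    No bias term; returns the primal weight vector w.
    """
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    for _ in range(n_iter):
        for i, (xi, yi) in enumerate(zip(X, y)):
            q = sum(v * v for v in xi)
            if q == 0.0:
                continue
            # Gradient of the dual objective with respect to alpha[i].
            grad = yi * sum(wk * xk for wk, xk in zip(w, xi)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q, 0.0), C)
            step = (new_alpha - alpha[i]) * yi
            w = [wk + step * xk for wk, xk in zip(w, xi)]
            alpha[i] = new_alpha
    return w


def margins(X, y, w):
    """Functional margins y_i * w.x_i for every data point."""
    return [yi * sum(wk * xk for wk, xk in zip(w, xi)) for xi, yi in zip(X, y)]


# Hypothetical toy data: three points per class in the plane.
X = [[1.0, 1.0], [3.0, 3.0], [4.0, 2.0],
     [-1.0, -1.0], [-3.0, -2.0], [-2.0, -4.0]]
y = [1, 1, 1, -1, -1, -1]

w_full = svm_dual_cd(X, y)
# Oracle step (not a screening rule): keep points on or inside the margin.
keep = [i for i, m in enumerate(margins(X, y, w_full)) if m <= 1.0 + 1e-9]
w_reduced = svm_dual_cd([X[i] for i in keep], [y[i] for i in keep])
print(keep, w_full, w_reduced)
```

Only the two points closest to the decision boundary survive, and the retrained weights match the original ones. A safe screening rule aims to identify the discarded points without solving the full problem first.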
Original Problem → Screening → Smaller Problem to Solve
Rejection ratio: the ratio of the number of data instances whose membership can be identified by the rule to the total number of data instances.
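The ratio described above is straightforward to compute once a rule has flagged the instances it can identify. In this small sketch the function name `rejection_ratio` and the boolean-flag input format are assumptions.

```python
def rejection_ratio(identified_flags):
    """Fraction of instances whose membership the rule identified.

    identified_flags holds one boolean per data instance: True if the
    screening rule determined that instance's membership, False otherwise.
    """
    flags = list(identified_flags)
    if not flags:
        raise ValueError("need at least one data instance")
    return sum(1 for f in flags if f) / len(flags)


# Example: a rule that identifies 3 of 4 instances.
ratio = rejection_ratio([True, True, False, True])
print(ratio)  # 0.75
```

A ratio close to 1 means that almost all instances can be removed before the solver runs.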
Comparison of SSNSV (Ogawa et al., ICML’13), ESSNSV and DVIs for SVM on three real data sets.

IJCNN
Method           Screening  Init.   Total     Speedup
Solver           -          -       4669.14   -
Solver + SSNSV   2.08       92.45   2018.55   2.31
Solver + ESSNSV  2.09       91.33   1552.72   3.01
Solver + DVI     0.99       42.67   828.02    5.64

Wine
Method           Screening  Init.   Total     Speedup
Solver           -          -       76.52     -
Solver + SSNSV   0.02       1.56    21.85     3.50
Solver + ESSNSV  0.03       1.60    17.17     4.47
Solver + DVI     0.01       0.67    11.62     6.59

Covertype
Method           Screening  Init.   Total     Speedup
Solver           -          -       1675.46   -
Solver + SSNSV   2.73       35.52   220.58    7.60
Solver + ESSNSV  2.89       36.13   156.23    10.72
Solver + DVI     1.27       12.57   21.26     79.18
Comparison of SSNSV (Ogawa et al., ICML’13), ESSNSV and DVIs for LAD on three real data sets.

Telescope
Method        Screening  Init.  Total   Speedup
Solver        -          -      122.34  -
Solver + DVI  0.28       0.12   12.14   9.86

Computer
Method        Screening  Init.  Total   Speedup
Solver        -          -      5.85    -
Solver + DVI  0.08       0.05   0.28    19.21

Telescope
Method        Screening  Init.  Total   Speedup
Solver        -          -      21.43   -
Solver + DVI  0.06       0.1    0.19    114.91
implementation instructions, illustration materials, etc.
http://www.public.asu.edu/~jwang237/screening.html
Seven-line implementation
The list is growing quickly
formulations
– Sparsity on features, samples, networks etc