2010 SIAM International Conference on Data Mining
Mining Sparse Representations: Formulations, Algorithms, and Applications
Jun Liu, Shuiwang Ji, and Jieping Ye
Computer Science and Engineering The Biodesign Institute Arizona State University
SIAM Data Mining 2007 Tutorial (Yu, Ye, and Liu): “Dimensionality Reduction for Data Mining - Techniques, Applications, and Trends”
n×1 measurements p×1 signal
min loss(x) s.t. ||x||_2 ≤ 1
min 0.5 ||x - v||_2^2 s.t. ||x||_2 ≤ 1
min loss(x) s.t. ||x||_1 ≤ 1
min 0.5 ||x - v||_2^2 s.t. ||x||_1 ≤ 1
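The problems of the form min 0.5 ||x - v||^2 subject to a norm constraint are Euclidean projections onto the L2 ball and the L1 ball. The following is a minimal sketch of both projections, assuming radius 1 by default. The function names are illustrative, and the L1 routine uses the common sort-based thresholding scheme, which is not necessarily the projection algorithm presented in the tutorial.

```python
def project_l2_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_2 <= radius}."""
    norm = sum(t * t for t in v) ** 0.5
    if norm <= radius:
        return list(v)
    scale = radius / norm
    return [scale * t for t in v]


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-based)."""
    if sum(abs(t) for t in v) <= radius:
        return list(v)
    # Find the threshold theta with sum(max(|v_i| - theta, 0)) == radius.
    u = sorted((abs(t) for t in v), reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - radius) / k
        if u_k - candidate > 0:
            theta = candidate
    # Shrink every magnitude by theta and keep the original signs.
    return [(1 if t > 0 else -1) * max(abs(t) - theta, 0.0) for t in v]


v = [0.9, 0.6, -0.1]
x_l2 = project_l2_ball(v)  # rescaled copy of v, all entries stay nonzero
x_l1 = project_l1_ball(v)  # smallest entry is set to zero
```

On this example the L2 projection only rescales v, while the L1 projection zeroes the smallest entry, which illustrates why the L1 constraint tends to produce sparse solutions.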
min loss(x) s.t. ||x||_1 ≤ 1
min loss(x) s.t. ||x||_2 ≤ 1
y = A x, where y is n×1, A is n×p, and x is p×1
P0, P1
Basis Pursuit De-Noising (Chen, Donoho, and Saunders, 1999)
Lasso (Tibshirani, 1996)
Regularized counterpart of Lasso
Dantzig selector (Candes and Tao, 2007)
y = A x + z, where y is n×1, A is n×p, x is p×1, and z is n×1
test image training images … × …
Elucidate a Magnetic Resonance Imaging-Based Neuroanatomic Biomarker for Psychosis
L1, L1/Lq
L1/Lq norm: ||x||_{q,1} = Σ_i ||x_{G_i}||_q (q-norm within each group G_i, 1-norm across groups)
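The L1/Lq norm applies the q-norm inside each group of coefficients and then sums the group norms. A minimal sketch of that computation follows. The function name, the 0-based index lists, and the example vector are illustrative assumptions, not taken from the slides.

```python
def l1_lq_norm(x, groups, q=2.0):
    """Return sum_i ||x_{G_i}||_q, where each G_i is a list of 0-based indices."""
    total = 0.0
    for group in groups:
        if q == float("inf"):
            total += max(abs(x[j]) for j in group)
        else:
            total += sum(abs(x[j]) ** q for j in group) ** (1.0 / q)
    return total


x = [3.0, 4.0, 0.0, 0.0, -2.0]
groups = [[0, 1], [2, 3], [4]]  # hypothetical non-overlapping groups

norm_q2 = l1_lq_norm(x, groups, q=2.0)             # 5 + 0 + 2
norm_q1 = l1_lq_norm(x, groups, q=1.0)             # reduces to the plain L1 norm
norm_qinf = l1_lq_norm(x, groups, q=float("inf"))  # 4 + 0 + 2
```

With q = 1 the expression reduces to the ordinary L1 norm. With q > 1 a group contributes nothing only when all of its entries are zero, which is what encourages whole groups to be selected or discarded together.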
– BDGP (1-3, 4-6, 7-8, 9-10, 11-12, 13-)
– Fly-FISH (1-3, 4-5, 6-7, 8-9, 10-)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
25 40 15 50 40 10 10 30 40 60 120 120 60 60 100 360 720
Group i Group j
The letter 'a' written by 40 different people
Letter data set:
1) The letters are from more than 180 different writers
2) It has 8 tasks for discriminating letter c/e, g/y, g/s, m/n, a/g, i/j, a/o, f/t, and h/n
Samples of the letters s and g for one writer
Fused Lasso (Tibshirani et al., 2005; Tibshirani and Wang, 2008; Friedman et al., 2007)
L1
(Tibshirani and Wang, 2008)
Undirected graphical model (Markov Random Field)
Small λ Large λ
λ3 λ2 λ1
(Banerjee et al., 2008)
1 Frontal_Sup_L            13 Parietal_Sup_L     21 Occipital_Sup_L    27 Temporal_Sup_L
2 Frontal_Sup_R            14 Parietal_Sup_R     22 Occipital_Sup_R    28 Temporal_Sup_R
3 Frontal_Mid_L            15 Parietal_Inf_L     23 Occipital_Mid_L    29 Temporal_Pole_Sup_L
4 Frontal_Mid_R            16 Parietal_Inf_R     24 Occipital_Mid_R    30 Temporal_Pole_Sup_R
5 Frontal_Sup_Medial_L     17 Precuneus_L        25 Occipital_Inf_L    31 Temporal_Mid_L
6 Frontal_Sup_Medial_R     18 Precuneus_R        26 Occipital_Inf_R    32 Temporal_Mid_R
7 Frontal_Mid_Orb_L        19 Cingulum_Post_L                          33 Temporal_Pole_Mid_L
8 Frontal_Mid_Orb_R        20 Cingulum_Post_R                          34 Temporal_Pole_Mid_R
9 Rectus_L                                                             35 Temporal_Inf_L 8301
10 Rectus_R                                                            36 Temporal_Inf_R 8302
11 Cingulum_Ant_L                                                      37 Fusiform_L
12 Cingulum_Ant_R                                                      38 Fusiform_R
                                                                       39 Hippocampus_L
                                                                       40 Hippocampus_R
                                                                       41 ParaHippocampal_L
                                                                       42 ParaHippocampal_R
frontal, parietal, occipital, and temporal lobes in order
[Figure: matrix indexed by Customers and Items, with unknown entries shown as "?"]
[Figure: matrix indexed by Users and Movies, with unknown entries shown as "?"]
rank
min f(x)= loss(x) + λ×penalty(x)
min f(x) = loss(x) + λ×penalty(x)
penalty(x) = ||x||_1
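When penalty(x) = ||x||_1, the subproblem min 0.5 ||x - v||^2 + λ ||x||_1 has a closed-form solution given by componentwise soft-thresholding. This is a standard fact, sketched below. The function names and the example values are illustrative assumptions.

```python
import math


def soft_threshold(v, lam):
    """Minimizer of 0.5 * ||x - v||_2^2 + lam * ||x||_1, computed entrywise."""
    return [math.copysign(max(abs(t) - lam, 0.0), t) for t in v]


def prox_objective(x, v, lam):
    """Value of 0.5 * ||x - v||_2^2 + lam * ||x||_1 at x."""
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x, v))
    return fit + lam * sum(abs(a) for a in x)


v = [1.5, -0.2, 0.7]
lam = 0.5
x = soft_threshold(v, lam)  # entries with |v_i| <= lam become zero
```

Entries whose magnitude is at most λ are set exactly to zero and the rest shrink toward zero by λ, so a larger λ yields a sparser x.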
(Yuan and Lin, 2006; Liu et al., 2009; Argyriou et al., 2008; Meier et al., 2008)
(Banerjee et al., 2008; Friedman et al., 2007)
(Friedman et al., 2007; Hofling, 2010)
Repeat Until “convergence”
(Nesterov, 1983; Nemirovski, 1994; Nesterov, 2004)
GD: O(1/N)
AGD: O(1/N^2)
min f(x) = loss(x) + λ×penalty(x)
GD: O(1/N)
AGD: O(1/N^2)
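The two rates can be made concrete with a small sketch of proximal gradient descent and its accelerated variant for f(x) = 0.5 ||A x - y||^2 + λ ||x||_1. The least-squares loss, the step size 1/L with L = ||A||_F^2 (an upper bound on the Lipschitz constant of the gradient), the toy data, and all function names are assumptions made for illustration. The momentum sequence is the standard one used in Beck and Teboulle's FISTA.

```python
import math


def grad_loss(A, y, x):
    """Gradient of 0.5 * ||A x - y||_2^2, that is A^T (A x - y)."""
    r = [sum(a * b for a, b in zip(row, x)) - yi for row, yi in zip(A, y)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def objective(A, y, x, lam):
    r = [sum(a * b for a, b in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(t * t for t in r) + lam * sum(abs(t) for t in x)


def prox_step(A, y, point, lam, L):
    """Gradient step of size 1/L on the loss, then soft-thresholding by lam/L."""
    g = grad_loss(A, y, point)
    moved = [p - gi / L for p, gi in zip(point, g)]
    return [math.copysign(max(abs(t) - lam / L, 0.0), t) for t in moved]


def proximal_gd(A, y, lam, n_iter):
    L = sum(a * a for row in A for a in row)  # ||A||_F^2
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        x = prox_step(A, y, x, lam, L)
    return x


def accelerated_proximal_gd(A, y, lam, n_iter):
    L = sum(a * a for row in A for a in row)  # ||A||_F^2
    x = [0.0] * len(A[0])
    z, t = list(x), 1.0  # z is the extrapolated search point
    for _ in range(n_iter):
        x_new = prox_step(A, y, z, lam, L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
lam = 0.1
x_gd = proximal_gd(A, y, lam, 200)
x_agd = accelerated_proximal_gd(A, y, lam, 200)
```

Both methods cost one gradient evaluation and one soft-thresholding per iteration; the accelerated version only adds the extrapolation step. The O(1/N) and O(1/N^2) bounds are worst-case guarantees, so on a small well-conditioned problem like this one both runs reach essentially the same solution.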
(Nesterov, 2007; Beck and Teboulle, 2009)
(Liu, Ji, and Ye, 2009; Liu and Ye, 2010)
(Ji and Ye, 2009; Pong et al., 2009; Toh and Yun, 2009; Lu et al., 2009)
(Liu, Yuan, and Ye, 2010)
min f(x) = loss(x) + λ×penalty(x)
Sparse
Fused Lasso
Sparse inverse covariance
min f(x) = loss(x) + λ×penalty(x)
Sparse
Sparse group Lasso
Group Lasso
G1 1:15
G2 1:6, G3 7:11, G4 12:15
G5 1:3, G6 4:6, G7 7:8, G8 9:11
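The eight index ranges form a tree: any two groups are either disjoint or one contains the other. A minimal sketch that checks this property and evaluates a sum of group L2 norms over the tree is given below. The unit weights, the choice of the L2 norm, and the function names are assumptions for illustration; only the index ranges come from the slide.

```python
# 1-based inclusive index ranges, as listed on the slide.
GROUPS = {
    "G1": (1, 15),
    "G2": (1, 6), "G3": (7, 11), "G4": (12, 15),
    "G5": (1, 3), "G6": (4, 6), "G7": (7, 8), "G8": (9, 11),
}


def is_tree_structured(groups):
    """True if every pair of index ranges is either disjoint or nested."""
    spans = list(groups.values())
    for i, (a1, b1) in enumerate(spans):
        for a2, b2 in spans[i + 1:]:
            disjoint = b1 < a2 or b2 < a1
            nested = (a1 <= a2 and b2 <= b1) or (a2 <= a1 and b1 <= b2)
            if not (disjoint or nested):
                return False
    return True


def tree_group_penalty(x, groups, weights=None):
    """Sum over groups of w_i * ||x_{G_i}||_2 (unit weights unless given)."""
    total = 0.0
    for name, (start, end) in groups.items():
        w = 1.0 if weights is None else weights[name]
        block = x[start - 1:end]  # convert the 1-based inclusive range to a slice
        total += w * sum(t * t for t in block) ** 0.5
    return total


x = [0.0] * 15
x[6], x[7] = 3.0, 4.0  # only features 7 and 8 are nonzero

valid = is_tree_structured(GROUPS)
penalty = tree_group_penalty(x, GROUPS)  # G1, G3 and G7 each contribute 5
```

Because features 7 and 8 lie in G7, in its parent G3, and in the root G1, the same nonzero entries are counted once per level of the tree, while every other group contributes zero.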
y = A x + z, where y is n×1, A is n×p, x is p×1, and z is n×1