Sparsity in Learning
Y. Grandvalet
Heudiasyc, CNRS & Université de Technologie de Compiègne
Statistical Learning Parsimony Variable Space Example Space Conclusions
Statistical Learning Regression Classification
Statlearn’11 Sparsity in Learning
i=1 adjust
$f \in \mathcal{F}_\lambda$
Select $\lambda$ by estimating the expected loss of $\hat{f}_\lambda$
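Estimating the expected loss for each candidate $\lambda$ is commonly done by K-fold cross-validation. The sketch below is illustrative only: it assumes a one-variable ridge family $y \approx \beta x$ (no intercept, penalty $\lambda\beta^2$) standing in for $\mathcal{F}_\lambda$, the squared loss, and interleaved folds. The helper names `ridge_1d`, `cv_loss` and `select_lambda` and the simulated data are hypothetical.

```python
import random


def ridge_1d(xs, ys, lam):
    """Ridge estimate for the one-variable model y ~ beta * x (no intercept):
    argmin_beta sum_i (y_i - beta * x_i)^2 + lam * beta^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_loss(xs, ys, lam, k=5):
    """K-fold cross-validation estimate of the expected squared loss
    of the estimator fitted with penalty lam."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test = set(range(fold, n, k))  # interleaved folds
        train = [i for i in range(n) if i not in test]
        beta = ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in test)
    return total / n


def select_lambda(xs, ys, grid, k=5):
    """Return the value of lam in grid with the smallest cross-validated loss."""
    return min(grid, key=lambda lam: cv_loss(xs, ys, lam, k))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # hypothetical data
    grid = [0.0, 0.1, 1.0, 10.0, 100.0]
    best = select_lambda(xs, ys, grid)
    print("selected lambda:", best, "beta:", ridge_1d(xs, ys, best))
```

Any other family indexed by a complexity parameter can be plugged in by replacing `ridge_1d`; repeating the split or refining the grid trades computation for a less noisy estimate of the expected loss.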
❍ Penalize to stabilize
❍ Parsimony is sometimes a “reasonable prior”

❍ Iteratively solve problems of increasing size
❍ Exact regularization paths
❍ Fast evaluation

❍ Understanding the underlying phenomenon
❍ Acceptability
❍ Variables “filtered” by a criterion (Fisher, Wilks, mutual information)
❍ Learning proceeds after the treatment

❍ Heuristic search of subsets of variables
❍ Subset selection is determined by the performance of the learning algorithm
❍ No feedback

❍ Variable selection mechanism incorporated in the learning algorithm
❍ All variables processed during learning, some will not influence the
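As an illustration of the first family above, where variables are filtered by a criterion before any learning takes place, here is a minimal sketch that ranks variables by a Fisher criterion. It assumes binary labels in {−1, +1} and the common form $(\mu_+ - \mu_-)^2 / (\sigma_+^2 + \sigma_-^2)$ computed per variable; the function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher criterion of one variable for labels in {-1, +1}:
    (mean_+ - mean_-)^2 / (var_+ + var_-)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0.0:
        # Constant within each class: perfectly separating if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def filter_select(X, labels, k):
    """Score each column of X (a list of rows) independently and keep the
    indices of the k best-scoring variables; no learning algorithm is involved."""
    d = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(d)]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    X = [[1.0, 0.3], [1.2, -0.1], [-1.0, 0.2], [-0.9, 0.0]]  # hypothetical data
    labels = [1, 1, -1, -1]
    print(filter_select(X, labels, 1))
```

Since each variable is scored on its own, the selection gets no feedback from the learner that is trained afterwards, and redundant variables may all be kept.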
Embedded LASSO Geometric Insights Examples Ball crafting Coop-Lasso
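[Figure: penalized estimates in the $(\beta_1, \beta_2)$ plane]
Ridge, $\sum_{j=1}^{d} |\beta_j|^2$: $\beta_{\mathrm{OLS}}$ and $\beta_{\mathrm{RR}}$
Lasso, $\sum_{j=1}^{d} |\beta_j|$: $\beta_{\mathrm{OLS}}$ and $\beta_{\mathrm{L}}$
$\sum_{j=1}^{d} |\beta_j|^{1/2}$: $\beta_{\mathrm{OLS}}$, $\beta_{\mathrm{L}}$ and $\beta_{\mathrm{L}_{1/2}}$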
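With an orthonormal design ($X^\top X = I$), penalized least squares decouples into one scalar problem per coefficient, which makes the difference between the quadratic and the $\ell_1$ penalties concrete. The quadratic penalty rescales every OLS coefficient. The $\ell_1$ penalty soft-thresholds them and sets the small ones exactly to zero. The sketch below assumes that orthonormal setting, with penalty scalings $\tfrac{\lambda}{2}\beta_j^2$ and $\lambda|\beta_j|$ chosen for convenience. The non-convex $|\beta_j|^{1/2}$ case is not covered, and the coefficient values are hypothetical.

```python
def ridge_shrink(z, lam):
    """argmin_b 0.5 * (b - z)**2 + 0.5 * lam * b**2: every coefficient is rescaled."""
    return z / (1.0 + lam)


def lasso_shrink(z, lam):
    """argmin_b 0.5 * (b - z)**2 + lam * |b|: soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # |z| <= lam: the coefficient is set exactly to zero


ols = [3.0, 0.8, -0.4, -2.5]  # hypothetical OLS coefficients
print([ridge_shrink(z, 1.0) for z in ols])  # [1.5, 0.4, -0.2, -1.25]
print([lasso_shrink(z, 1.0) for z in ols])  # [2.0, 0.0, 0.0, -1.5]
```

Ridge shrinkage never produces an exact zero for a nonzero OLS coefficient. The lasso zeroes every coefficient whose magnitude is below $\lambda$, which is the one-dimensional counterpart of the corners of the $\ell_1$ ball.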
$\min_{\beta}\ \cdots$
$\min_{\beta, s}\ \cdots$ subject to $\sum_{j=1}^{d} s_j \le 1,\quad s_j \ge 0,\quad j = 1, \ldots, d$
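One standard identity connects the constraint set $\sum_{j=1}^{d} s_j \le 1$, $s_j \ge 0$ to the $\ell_1$ penalty. Suppose the $s_j$ act as per-variable scalings of a quadratic penalty $\sum_j \beta_j^2 / s_j$. This is an assumption, since the objective on this slide did not survive extraction. By Cauchy-Schwarz, $(\sum_j |\beta_j|)^2 \le (\sum_j \beta_j^2 / s_j)(\sum_j s_j) \le \sum_j \beta_j^2 / s_j$, with equality at $s_j = |\beta_j| / \sum_k |\beta_k|$. Optimizing over $s$ therefore yields a squared $\ell_1$ penalty. The sketch checks this numerically. The function names and the coefficients are hypothetical.

```python
import random


def quad_penalty(beta, s):
    """sum_j beta_j**2 / s_j; s_j = 0 is allowed only when beta_j = 0."""
    total = 0.0
    for b, sj in zip(beta, s):
        if sj > 0.0:
            total += b * b / sj
        elif b != 0.0:
            return float("inf")
    return total


def optimal_scales(beta):
    """Minimizer of quad_penalty over {s : sum_j s_j <= 1, s_j >= 0}
    for beta not identically zero: s_j = |beta_j| / sum_k |beta_k|."""
    l1 = sum(abs(b) for b in beta)
    return [abs(b) / l1 for b in beta]


if __name__ == "__main__":
    beta = [1.0, -2.0, 0.0, 3.0]  # hypothetical coefficients
    best = quad_penalty(beta, optimal_scales(beta))
    print(best, sum(abs(b) for b in beta) ** 2)  # equal up to rounding: (sum_j |beta_j|)^2
    rng = random.Random(1)
    for _ in range(1000):
        w = [rng.random() for _ in beta]
        s = [x / sum(w) for x in w]  # random point with sum_j s_j = 1
        assert quad_penalty(beta, s) >= best - 1e-9
```

The optimal scales vanish exactly where $\beta_j = 0$, which is how a quadratic penalty with adaptive scalings can reproduce the sparsity of the $\ell_1$ penalty.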
Criterion over $(\beta_1, \beta_2)$: $L(\beta_1, \beta_2) - \lambda\,\Omega(\beta_1, \beta_2)$
[Figure: facial-expression examples labeled Surprise, Anger, Sadness, Happiness, Fear, Disgust]
❍ Adaptive metric ⇒ 1 or 2 hyper-parameters (compared to d)
❍ Ease of implementation, interpretability

❍ Adaptive metric: “learn the kernel” ⇒ 1 hyper-parameter
❍ CKL takes into account a group structure on kernels

❍ Multi-task learning for pathway inference (Chiquet et al., 2010)
❍ Prediction from cooperative features (Chiquet et al., 2011)
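$\min_{\beta}\ \sum_{i=1}^{n} \ell(x_i^\top \beta, y_i) + \lambda \sum_{k=1}^{K} \Big(\sum_{j \in G_k} \beta_j^2\Big)^{1/2}$, where $\{G_k\}_{k=1}^{K}$ forms a partition of $\{1, \ldots, d\}$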
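Assume the group penalty takes the usual form $\lambda \sum_{k=1}^{K} (\sum_{j \in G_k} \beta_j^2)^{1/2}$ with unit group weights. Proximal (iterative shrinkage) algorithms then handle it through its proximal operator. That operator shrinks each group $G_k$ as a block and zeroes the whole group when its Euclidean norm falls below the threshold. The sketch below assumes groups are given as index lists that partition the variables. `group_soft_threshold` is a hypothetical name.

```python
import math


def group_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_k ||beta_{G_k}||_2 evaluated at v.

    groups is a list of index lists forming a partition of range(len(v)).
    Each group is shrunk as a block and is set exactly to zero when its
    Euclidean norm is at most lam."""
    out = [0.0] * len(v)
    for g in groups:
        norm = math.sqrt(sum(v[j] ** 2 for j in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for j in g:
            out[j] = scale * v[j]
    return out


if __name__ == "__main__":
    v = [3.0, 4.0, 0.3, -0.4]  # hypothetical input
    print(group_soft_threshold(v, [[0, 1], [2, 3]], 1.0))
    # first group (norm 5) is scaled by 0.8, second group (norm 0.5) is zeroed
```

Used in place of coordinate-wise soft thresholding inside a proximal gradient loop, this operator selects or discards whole groups of variables at once.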
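$\min_{\beta}\ \sum_{i=1}^{n} \ell(x_i^\top \beta, y_i) + \lambda \sum_{k=1}^{K} \Big[\Big(\sum_{j \in G_k} [\beta_j]_+^2\Big)^{1/2} + \Big(\sum_{j \in G_k} [\beta_j]_-^2\Big)^{1/2}\Big]$, where $[u]_+ = \max(u, 0)$, $[u]_- = \max(-u, 0)$, and $\{G_k\}_{k=1}^{K}$ forms a partition of $\{1, \ldots, d\}$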
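Assume the Coop-Lasso penalty takes the form $\lambda \sum_k (\|[\beta_{G_k}]_+\|_2 + \|[\beta_{G_k}]_-\|_2)$ with unit weights, so the positive and negative parts of each group are penalized separately. Its proximal operator can then be sketched by shrinking the positive and negative parts of each group as two independent blocks. This decoupling holds because an optimal coordinate never takes the sign opposite to the input: setting it to zero lowers both the quadratic term and the penalty. Small coefficients whose sign disagrees with the rest of their group can thus be zeroed without switching the whole group off. The function names are hypothetical.

```python
import math


def _block_shrink(u, lam):
    """Scale the vector u by max(0, 1 - lam / ||u||_2)."""
    norm = math.sqrt(sum(x * x for x in u))
    scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
    return [scale * x for x in u]


def coop_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_k (||[beta_Gk]_+||_2 + ||[beta_Gk]_-||_2) at v."""
    out = [0.0] * len(v)
    for g in groups:
        pos = _block_shrink([max(v[j], 0.0) for j in g], lam)   # positive part of the group
        neg = _block_shrink([max(-v[j], 0.0) for j in g], lam)  # negative part of the group
        for idx, j in enumerate(g):
            out[j] = pos[idx] - neg[idx]
    return out


if __name__ == "__main__":
    v = [2.0, 0.1, -0.3]  # hypothetical input, one group of three variables
    print(coop_soft_threshold(v, [[0, 1, 2]], 0.5))
    # the negative coefficient is zeroed, the two positive ones are kept (shrunk)
```

On the same input, group-wise shrinkage would scale all three coefficients by the same factor and keep the negative one. This is the sign-coherence behavior the cooperative penalty is designed for.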
[Figure: ball of the group-Lasso penalty $\sum_{k} \big(\sum_{j \in G_k} \beta_j^2\big)^{1/2}$, shown in the $(\beta_1, \beta_3)$ plane (axes from −1 to 1) for $\beta_2 = 0$, $\beta_2 = 0.3$, $\beta_4 = 0$ and $\beta_4 = 0.3$]
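[Figure: ball of the Coop-Lasso penalty $\sum_{k} \big[\big(\sum_{j \in G_k} [\beta_j]_+^2\big)^{1/2} + \big(\sum_{j \in G_k} [\beta_j]_-^2\big)^{1/2}\big]$, shown in the $(\beta_1, \beta_3)$ plane (axes from −1 to 1) for $\beta_2 = 0$, $\beta_2 = 0.3$, $\beta_4 = 0$ and $\beta_4 = 0.3$]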
2.
❍ A gradient step is O(nd)
❍ A second-order step is O(nd² + d³) and requires O(d²) memory
❍ For kernel methods n = d . . .
❍ For kernel methods O(n) per test example
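The sketch below makes these costs concrete for least squares, $\tfrac12\|y - X\beta\|^2$ with $n$ examples and $d$ variables. Both updates are written with explicit loops so that the $O(nd)$, $O(nd^2)$ and $O(d^3)$ terms are visible. It is illustrative pure Python (no BLAS), and the function names are hypothetical.

```python
def residuals(X, y, beta):
    """r = X beta - y: O(n d) operations."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]


def gradient(X, y, beta):
    """Gradient X^T (X beta - y) of 0.5 * ||y - X beta||^2: O(n d) operations."""
    r = residuals(X, y, beta)
    return [sum(row[j] * ri for row, ri in zip(X, r)) for j in range(len(beta))]


def gradient_step(X, y, beta, step):
    """First-order update: O(n d) time, O(d) extra memory."""
    g = gradient(X, y, beta)
    return [bj - step * gj for bj, gj in zip(beta, g)]


def solve(A, b):
    """Gaussian elimination with partial pivoting: O(d^3) time."""
    d = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented copy
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (M[r][d] - sum(M[r][k] * x[k] for k in range(r + 1, d))) / M[r][r]
    return x


def newton_step(X, y, beta):
    """Second-order update: forming the Hessian X^T X costs O(n d^2) time and
    O(d^2) memory; solving the d x d system costs O(d^3)."""
    d = len(beta)
    H = [[sum(row[a] * row[c] for row in X) for c in range(d)] for a in range(d)]
    delta = solve(H, gradient(X, y, beta))
    return [bj - dj for bj, dj in zip(beta, delta)]
```

On a quadratic criterion the Newton step reaches the minimizer in one iteration, whereas gradient steps only decrease the loss. For kernel methods, where the number of parameters equals $n$, the same second-order step costs $O(n^3)$.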
SVMs Hinge loss Loss crafting
[Figure: class +1 and class −1 examples in the $(x_1, x_2)$ plane]
[Figure: $(x_1, x_2)$ plane]
$\min_{f,b}\ \|f\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} \cdots$
[Figure: $(x_1, x_2)$ plane]
$\min_{f,b}\ \|f\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} \cdots$
f,b,ξ
H+C n
f,b,ξ
H + C n
i
[Figure: $(x_1, x_2)$ plane]
f,b
H
n
… for $\ell(f(x), y) = [1 - y(f(x) + b)]_+$
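The hinge loss is zero for every example with margin $y(f(x) + b) \ge 1$. Only examples inside or beyond the margin therefore enter the subgradient of the data-fit term, so the solution depends on a subset of the examples. The sketch below illustrates this with a linear $f(x) = w^\top x$ and plain subgradient descent on $\tfrac12\|w\|^2 + C\sum_i [1 - y_i(w^\top x_i + b)]_+$. This solver is a deliberately simple swap-in, since SVMs are usually trained in the dual. The step-size schedule, the function names and the toy data are assumptions.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def hinge(score, y):
    """Hinge loss [1 - y * score]_+ with score = f(x) + b."""
    return max(0.0, 1.0 - y * score)


def active_set(X, y, w, b):
    """Indices i with y_i (w.x_i + b) < 1: the only examples with non-zero
    hinge loss, hence the only ones in the subgradient of the data-fit term."""
    return [i for i, (x, yi) in enumerate(zip(X, y)) if yi * (dot(w, x) + b) < 1.0]


def linear_svm_subgradient(X, y, C=1.0, steps=2000, lr=0.01):
    """Subgradient descent on 0.5 * ||w||^2 + C * sum_i hinge(w.x_i + b, y_i)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(steps):
        gw, gb = list(w), 0.0  # gradient of 0.5 * ||w||^2
        for i in active_set(X, y, w, b):  # inactive examples contribute nothing
            gw = [g - C * y[i] * xij for g, xij in zip(gw, X[i])]
            gb -= C * y[i]
        eta = lr / (t + 1) ** 0.5  # decreasing step size
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]  # hypothetical data
    y = [1, 1, -1, -1]
    w, b = linear_svm_subgradient(X, y)
    print(w, b, active_set(X, y, w, b))
```

Printing `active_set(X, y, w, b)` after training shows which examples still influence the solution. The others can be removed without changing the subgradient.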
❍ Estimates P(Y = 1|X = x) in [π−, π+]
❍ ⇒ Estimation of gray zones
❍ ⇒ Binary classifiers for multi-class classification

❍ P(Y = 1|X = x) at {π0}
❍ ⇒ Unbalanced classification losses

❍ P(Y = 1|X = x) at {π−, π+}
❍ ⇒ Reject option
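Once $P(Y = 1 \mid X = x)$ is estimated reliably around the thresholds, the decision rule is simple. It predicts +1 above $\pi_+$, −1 below $\pi_-$, and rejects in between. Taking $\pi_- = \pi_+ = \pi_0$ gives a single threshold. As one standard example of where the thresholds come from, consider a 0-1 misclassification cost and a rejection cost $r < 1/2$. The Bayes rule then rejects exactly when $r < \min(p, 1 - p)$, so $\pi_- = r$ and $\pi_+ = 1 - r$. A minimal sketch follows, with hypothetical function names.

```python
def decide(p, pi_minus, pi_plus):
    """Decision from an estimate p of P(Y = 1 | X = x):
    +1 if p > pi_plus, -1 if p < pi_minus, None (reject) otherwise."""
    if not 0.0 <= pi_minus <= pi_plus <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= pi_minus <= pi_plus <= 1")
    if p > pi_plus:
        return 1
    if p < pi_minus:
        return -1
    return None


def chow_thresholds(r):
    """Bayes thresholds under 0-1 misclassification cost and rejection cost r < 1/2:
    reject exactly when r < min(p, 1 - p), i.e. pi_minus = r and pi_plus = 1 - r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("rejection cost must lie in [0, 0.5)")
    return r, 1.0 - r


if __name__ == "__main__":
    lo, hi = chow_thresholds(0.2)
    print([decide(p, lo, hi) for p in (0.05, 0.5, 0.95)])
```

Because the rule only compares $p$ with $\pi_-$ and $\pi_+$, a probability estimate that is accurate near these two values is sufficient.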
❍ Start from an NP-hard problem
❍ Relax to the convexity limit

❍ Convex
❍ Non-smooth
❍ Piecewise linear

❍ Active sets
❍ Fast Iterative Shrinkage/Thresholding Algorithm
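Here is a minimal sketch of the Fast Iterative Shrinkage/Thresholding Algorithm (FISTA) for the Lasso criterion $\tfrac12\|y - X\beta\|^2 + \lambda\sum_j |\beta_j|$. Each iteration takes a gradient step on the smooth part, applies soft thresholding for the $\ell_1$ part, and then takes a momentum (extrapolation) step. The step size uses $L = \|X\|_F^2$, an upper bound on the largest eigenvalue of $X^\top X$, which is safe but conservative. A fixed iteration count stands in for a proper stopping rule. The function names are hypothetical, and active-set methods are not sketched.

```python
def soft_threshold(z, t):
    """argmin_b 0.5 * (b - z)**2 + t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fista_lasso(X, y, lam, iters=500):
    """FISTA for 0.5 * ||y - X beta||^2 + lam * ||beta||_1 (X given as a list of rows)."""
    n, d = len(X), len(X[0])
    L = sum(x * x for row in X for x in row)  # ||X||_F^2 >= largest eigenvalue of X^T X
    beta = [0.0] * d
    z = list(beta)  # extrapolated point
    t = 1.0
    for _ in range(iters):
        resid = [sum(X[i][j] * z[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(d)]
        new_beta = [soft_threshold(z[j] - grad[j] / L, lam / L) for j in range(d)]  # proximal step
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [nb + ((t - 1.0) / t_new) * (nb - b) for nb, b in zip(new_beta, beta)]  # momentum
        beta, t = new_beta, t_new
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical orthonormal design
    print(fista_lasso(X, [3.0, 0.5, -2.0], 1.0))
```

With an orthonormal design the result matches coordinate-wise soft thresholding of $X^\top y$. For general designs, the same loop applies the shrinkage at every iteration, which is why exact zeros appear in the iterates.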
❍ Lagrange parameter ($C$, $\lambda$)
❍ Number of non-zero “slack variables” ($\xi_i$, $\beta_j$)
❍ Magnitude of parameters ($\|f\|_{\mathcal{H}}$, $\sum_j |\beta_j|$)
❍ Fit ($\sum_i \xi_i$, $\sum_i \ell(f(x_i), y_i)$)
❍ Prevailing consensus: convex methods are stable, combinatoric are
❍ What about non-convex losses/penalties such as Ψ-learning, adaptive