High-dimensional statistics: Some progress and challenges ahead
Martin Wainwright
UC Berkeley Departments of Statistics, and EECS
University College, London, Master Class: Lecture 3
Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh
Sample-size scaling for linear regression in dimension p:
◮ without any structure: sample size n ≍ p
◮ with sparsity s ≪ p: sample size n ≍ s log p
Martin Wainwright (UC Berkeley), High-dimensional statistics, February 2013
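A small numerical caricature of why sparsity changes the scaling (not from the slides; a Gaussian sequence-model sketch with illustrative constants): without structure, the naive estimate pays for all p coordinates, while hard thresholding at the universal level pays roughly on the order of s log p / n.

```python
import numpy as np

rng = np.random.default_rng(0)
p, s, sigma = 1000, 5, 1.0
theta = np.zeros(p)
theta[:s] = 5.0                     # s-sparse signal

# Sequence-model caricature: observe theta plus noise of scale sigma/sqrt(n).
# Raw observation has total MSE ~ p/n; hard thresholding at the universal
# level sigma * sqrt(2 log p / n) reduces it to ~ s log p / n.
n = 200
y = theta + sigma / np.sqrt(n) * rng.standard_normal(p)
t = sigma * np.sqrt(2 * np.log(p) / n)
theta_hat = np.where(np.abs(y) > t, y, 0.0)

mse_raw = float(np.sum((y - theta) ** 2))
mse_thr = float(np.sum((theta_hat - theta) ** 2))
print(round(mse_raw, 2), round(mse_thr, 2))
```

Here mse_raw concentrates near p/n = 5, while the thresholded error is over an order of magnitude smaller.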
Outline:
1 Sparse additive models
◮ formulation, applications
◮ families of estimators
◮ efficient implementation as an SOCP
2 Statistical rates
◮ kernel complexity
◮ subset selection plus univariate function estimation
3 Minimax lower bounds
◮ statistics as channel coding
◮ metric entropy and lower bounds
Sparse additive models: f(x) = Σ_{j=1}^p f_j(x_j)   (Stone, 1985; Hastie & Tibshirani, 1990)

Some past work:
◮ Lin & Zhang, 2006: COSSO relaxation
◮ Ravikumar et al., 2007: SpAM back-fitting procedure, consistency
◮ Bach et al., 2008: multiple kernel learning (MKL), consistency in the classical regime
◮ Meier et al., 2007: L2(Pn) regularization
◮ Koltchinskii & Yuan, 2008, 2010
◮ Raskutti, W. & Yu, 2009: minimax lower bounds
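Classical back-fitting for an additive model can be sketched in a few lines of numpy. This is a hedged illustration, not the slides' estimator: the synthetic data, the cubic-polynomial smoother, and the number of sweeps are all stand-in choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

def smooth(x, r, deg=3):
    # Univariate smoother: cubic-polynomial fit (a stand-in for any smoother).
    coefs = np.polyfit(x, r, deg)
    fit = np.polyval(coefs, x)
    return fit - fit.mean()          # center to keep the decomposition identifiable

# Back-fitting: cycle through coordinates, smoothing the partial residual.
F = np.zeros((n, p))                 # current component fits f_j(x_ij)
mu = y.mean()
for _ in range(20):                  # a few sweeps suffice for independent designs
    for j in range(p):
        partial = y - mu - F.sum(axis=1) + F[:, j]
        F[:, j] = smooth(X[:, j], partial)

resid = y - mu - F.sum(axis=1)
print(round(float(np.var(resid)), 3))
```

The residual variance drops to roughly the noise level, since the two active components are well captured by the smoother.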
Motivating application: structure estimation in graphical models (Besag, 1974; Meinshausen & Bühlmann, 2006) and its nonparametric extensions (Liu, Lafferty & Wasserman, 2009).
Function class: each coordinate j has a univariate Hilbert space H_j, and we consider s-sparse additive functions

    f = Σ_{j∈S} f_j,   f_j ∈ H_j,   |S| ≤ s,

with unknown target f* = Σ_{j∈S} f*_j supported on a subset S ⊂ {1, …, p} of size s ≪ p. Error is measured in the empirical norm

    ‖f_j‖²_{L2(Pn)} := (1/n) Σ_{i=1}^n f_j²(x_ij).
Family of estimators: given samples {(y_i, x_i)}_{i=1}^n, solve

    min over f = Σ_{j=1}^p f_j, f_j ∈ H_j:
      (1/2n) Σ_{i=1}^n ( y_i − Σ_{j=1}^p f_j(x_ij) )²  +  ρ_n Σ_{j=1}^p ‖f_j‖_{H_j}  +  µ_n Σ_{j=1}^p ‖f_j‖_{L2(Pn)},

a composite regularizer: the Hilbert-norm term enforces smoothness within each coordinate, and the empirical-norm term enforces sparsity across coordinates.
[Figure: function value versus design value x, fit with a 2nd-order polynomial kernel.]
[Figure: function value versus design value x, fit with a 1st-order Sobolev kernel.]
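For intuition on the first-order Sobolev fit, here is a plain univariate kernel ridge regression with the first-order Sobolev kernel K(s, t) = min(s, t) on [0, 1]. This is an illustrative sketch: the tent-shaped target and the regularization level are assumptions, and this baseline omits the sparsity penalty entirely.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 1, n))
f_star = np.minimum(x, 1 - x)            # tent target, lies in the Sobolev space
y = f_star + 0.1 * rng.standard_normal(n)

# First-order Sobolev kernel on [0, 1]: K(s, t) = min(s, t).
K = np.minimum.outer(x, x)

# Kernel ridge regression: fitted values K @ alpha with
# alpha solving (K + n * lam * I) alpha = y.
lam = 1e-3
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
fhat = K @ alpha
mse = float(np.mean((fhat - f_star) ** 2))
print(round(mse, 4))
```

With 200 samples and noise level 0.1, the in-sample MSE lands well below the noise variance, since the target sits inside the kernel's Hilbert space.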
Efficient implementation: by the representer theorem (Kimeldorf & Wahba, 1971), each component can be written as f_j(·) = Σ_{i=1}^n α_ij K_j(·, x_ij). With Gram matrices K_j ∈ R^{n×n}, the problem reduces to the finite-dimensional convex program

    min over α ∈ R^{n×p}:
      (1/2n) ‖y − Σ_{j=1}^p K_j α_j‖₂²  +  ρ_n Σ_{j=1}^p √(α_jᵀ K_j α_j)  +  µ_n Σ_{j=1}^p √((1/n) α_jᵀ K_j² α_j),

a second-order cone program (SOCP), since ‖f_j‖_{H_j} = √(α_jᵀ K_j α_j) and ‖f_j‖_{L2(Pn)} = √((1/n) α_jᵀ K_j² α_j).
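Rather than calling an SOCP solver, one can approximate this kind of fit with sparse back-fitting in the spirit of the SpAM procedure (Ravikumar et al.): smooth each partial residual, then soft-threshold its empirical norm. Everything below (kernel choice, threshold level, synthetic data) is an illustrative assumption, not the slides' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 8
X = rng.uniform(0, 1, size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + 4.0 * X[:, 1] * (1 - X[:, 1]) \
    + 0.1 * rng.standard_normal(n)
y = y - y.mean()

def smoother(x, ridge=1e-2):
    # Kernel-ridge smoother matrix for the first-order Sobolev kernel min(s, t).
    K = np.minimum.outer(x, x)
    return K @ np.linalg.solve(K + len(x) * ridge * np.eye(len(x)), np.eye(len(x)))

S = [smoother(X[:, j]) for j in range(p)]

lam = 0.08                                   # soft-threshold level (tuning choice)
F = np.zeros((p, n))                         # fitted component values f_j(x_ij)
for _ in range(30):                          # back-fitting sweeps
    for j in range(p):
        R = y - F.sum(axis=0) + F[j]         # partial residual for coordinate j
        P = S[j] @ R                         # smoothed partial residual
        P = P - P.mean()
        norm = np.sqrt(np.mean(P ** 2))      # empirical L2(Pn) norm
        F[j] = max(0.0, 1.0 - lam / norm) * P if norm > 0 else P

active = sorted(j for j in range(p) if np.sqrt(np.mean(F[j] ** 2)) > 1e-8)
print(active)
```

The soft-thresholding step zeroes out the six pure-noise coordinates while keeping the two true components, mirroring the sparsity-inducing role of the empirical-norm penalty.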
[Simulation: mean-squared error versus raw sample size n for p = 64, 128, 256, and the same MSE curves versus a rescaled sample size.]
Kernel complexity: for a kernel with eigenvalues µ_1 ≥ µ_2 ≥ · · · ≥ 0, define the complexity function

    Q_n(δ) := √( (1/n) Σ_{i=1}^∞ min{δ², µ_i} )   (Mendelson, 2002),

and let the critical rate δ_n be the smallest δ with Q_n(δ) ≤ δ².
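The critical radius defined by the kernel complexity, i.e. the fixed point of Q_n(δ) = δ² with Q_n(δ) = sqrt((1/n) Σ_i min{δ², µ_i}), is easy to compute by bisection. The eigendecay below (µ_k = k^{−2}, a Lipschitz-type Sobolev kernel) and the truncation at 10^5 eigenvalues are illustrative choices.

```python
import numpy as np

def critical_radius(mu, n):
    # Smallest delta with Q_n(delta) <= delta^2, found by bisection;
    # Q_n(delta) = sqrt(sum_i min(delta^2, mu_i) / n) is the kernel complexity.
    lo, hi = 1e-8, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if np.sqrt(np.sum(np.minimum(mid ** 2, mu)) / n) > mid ** 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sobolev-type eigendecay mu_k = k^(-2a) with smoothness a = 1:
# theory predicts delta_n^2 ~ n^(-2a/(2a+1)) = n^(-2/3).
a = 1.0
mu = np.arange(1, 10 ** 5 + 1, dtype=float) ** (-2 * a)
ratios = []
for n in (100, 1000, 10000):
    d2 = critical_radius(mu, n) ** 2
    ratios.append(d2 * n ** (2 * a / (2 * a + 1)))
print([round(r, 2) for r in ratios])
```

Rescaling δ_n² by n^{2/3} produces a roughly constant ratio across two decades of n, confirming the predicted scaling.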
Univariate rates: suppose each f*_j lies in the same univariate Hilbert space H. For α-polynomial eigendecay µ_k ≍ k^{−α} (Sobolev smoothness β = α/2), the critical L2(Pn) rate is

    δ_n² ≍ n^{−2β/(2β+1)} = n^{−α/(α+1)},

◮ α = 2: Lipschitz functions, δ_n² ≍ n^{−2/3}
◮ α = 4: twice-differentiable functions, δ_n² ≍ n^{−4/5}

For logarithmic (finite-rank) classes:
◮ linear function classes in R^m
◮ polynomials of degree d = m − 1 in R
the L2(Pn) rate for a rank-m kernel is δ_n² ≍ m/n.
Comparison with past work:
◮ Ravikumar et al.: “back-fitting” method for sparse additive models; establishes consistency but does not track the sparsity s
◮ Meier et al.: regularize with ‖f‖_{n,1}; establish a rate of the order s (log p / n)^{2α/(2α+1)} for α-smooth Sobolev spaces
◮ Koltchinskii & Yuan: regularize with ‖f‖_{H,1}; establish rates involving terms at least s³ log p / n
◮ later analysis of the same estimator under a global boundedness condition; the resulting rates are not minimax-optimal
Achievable rate: suppose that ‖f*_j‖_∞ ≤ B for each j ∈ S. Then, with high probability,

    ‖f̂ − f*‖²_{L2(Pn)} ≾ φ(s, n) := s n^{−2α/(2α+1)} + s log(p/s) / n.
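The two terms of φ(s, n) trade off against each other: estimation dominates for moderate p, selection for exponentially large p. A quick numerical check (the parameter values are illustrative, with smoothness a = 1):

```python
import numpy as np

def phi(s, n, p, a=1.0):
    # phi(s, n) = s * n^(-2a/(2a+1)) + s * log(p/s) / n
    # (s-variate estimation term + subset-selection term).
    est = s * n ** (-2 * a / (2 * a + 1))
    sel = s * np.log(p / s) / n
    return est, sel

s, n = 10, 1000
est, sel_lo = phi(s, n, p=100)        # moderate dimension
_, sel_hi = phi(s, n, p=10 ** 9)      # extreme dimension
print(round(est, 3), round(sel_lo, 4), round(sel_hi, 3))
```

At p = 100 the estimation term (0.1) dominates the selection term; at p = 10^9 the selection term takes over, even though s and n are unchanged.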
Matching minimax lower bound: over the class F_{s,p,α},

    inf over f̂, sup over f* ∈ F_{s,p,α} of E ‖f̂ − f*‖² ≿ s n^{−2α/(2α+1)} + s log(p/s) / n.
Statistics as channel coding (Hasminskii, 1978; Birgé, 1981; Yang & Barron, 1999):
1 Nature chooses a function f* from a class F.
2 User makes n observations of f* through a noisy channel.
3 Function estimation as decoding: return an estimate f̂ based on the data {(x_i, y_i)}_{i=1}^n.

The key quantity is the metric entropy log N(δ; F) of the function class (Kolmogorov & Tikhomirov, 1960).
Two regimes of metric entropy:
1 Logarithmic metric entropy, log N(δ; F) ≍ m log(1/δ):
◮ parametric classes
◮ finite-rank kernels
◮ any function class with finite VC dimension
2 Polynomial metric entropy, log N(δ; F) ≍ (1/δ)^{1/α}:
◮ various smoothness classes
◮ Sobolev classes

Resulting minimax rates:
1 For a function class F with m-logarithmic metric entropy: rate ≍ m/n.
2 For a function class F with α-polynomial metric entropy: rate ≍ n^{−2α/(2α+1)}.
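Both rates can be recovered numerically from the entropy balance log N(δ) ≍ n δ², a Yang–Barron-style fixed point. The constants below are illustrative; note that in the logarithmic case the balance picks up an extra slowly growing log factor relative to the exact m/n rate.

```python
import numpy as np

def entropy_rate(log_N, n):
    # Bisection on the balance log N(delta) = n * delta^2; returns delta^2.
    lo, hi = 1e-10, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_N(mid) > n * mid ** 2:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

alpha, m = 1.0, 20
poly = lambda d: (1.0 / d) ** (1.0 / alpha)   # alpha-polynomial metric entropy
logm = lambda d: m * np.log(1.0 / d)          # m-logarithmic metric entropy

poly_ratios, logm_ratios = [], []
for n in (10 ** 3, 10 ** 5):
    poly_ratios.append(entropy_rate(poly, n) * n ** (2 * alpha / (2 * alpha + 1)))
    logm_ratios.append(entropy_rate(logm, n) * n / m)
print([round(r, 3) for r in poly_ratios], [round(r, 2) for r in logm_ratios])
```

The polynomial balance reproduces n^{−2α/(2α+1)} exactly (ratio 1), while the logarithmic balance tracks m/n up to a logarithmic factor.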
Summary:
◮ convex relaxation based on a composite regularizer
◮ attains minimax-optimal rates for kernel classes:
⋆ cost of subset selection: s log(p/s) / n
⋆ cost of s-variate function estimation: s δ_n²

Future directions:
◮ functional ANOVA decompositions: allowing groupings of variables
◮ other types of high-dimensional non-parametric models:
⋆ kernel PCA/CCA with structural constraints
⋆ structured density estimation
◮ trade-offs between computational and statistical efficiency