SLIDE 1
When does the Tukey Median work?
Banghua Zhu, with Jiantao Jiao and Jacob Steinhardt
Department of EECS and Statistics, University of California, Berkeley
ISIT 2020, June 21, 2020
SLIDE 2
SLIDE 3
Robust mean estimation - mean and median in 1d
Mean estimation in the presence of additive corruption (outlier) (Huber, 1973)
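The contrast between the mean and the median in 1d can be checked numerically. The sketch below is only an illustration: the Gaussian sample, the corruption level 0.1, the outlier location 1000, and the helper name `additive_corruption` are all assumptions chosen for the example, not taken from the talk.

```python
import numpy as np

def additive_corruption(clean, eps, outlier_value):
    """Append outliers so that they form an eps-fraction of the returned sample."""
    n_out = int(round(eps * len(clean) / (1.0 - eps)))
    return np.concatenate([clean, np.full(n_out, outlier_value)])

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=9000)  # true mean and median are 0
corrupted = additive_corruption(clean, eps=0.1, outlier_value=1000.0)

# How far each estimator moves once the outliers are added.
mean_shift = abs(np.mean(corrupted) - np.mean(clean))
median_shift = abs(np.median(corrupted) - np.median(clean))
print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
```

The mean moves in proportion to the outlier location, while the median only moves to a nearby quantile of the clean sample.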
SLIDE 4
Median in high dimension?
Tukey depth: $D_{\mathrm{Tukey}}(\mu, p) = \inf_{v \in \mathbb{R}^d} p(v^\top (X - \mu) \ge 0)$.
Tukey median (Tukey, 1975): the point(s) with the largest Tukey depth: $T(p) = \operatorname{argmax}_{\mu \in \mathbb{R}^d} D_{\mathrm{Tukey}}(\mu, p)$.
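A minimal sketch of how the depth of a candidate point might be approximated on a sample. It replaces the infimum over all directions v with a minimum over random unit directions, so it returns an upper bound on the exact empirical depth. The function name `approx_tukey_depth` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def approx_tukey_depth(mu, X, n_dirs=2000, rng=None):
    """Approximate the empirical Tukey depth of mu with respect to the rows of X.

    The infimum over v is taken over n_dirs random unit directions only,
    so the result can overestimate the exact empirical depth.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    V = rng.normal(size=(n_dirs, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit directions
    proj = (X - np.asarray(mu)) @ V.T               # shape (n_samples, n_dirs)
    frac = np.mean(proj >= 0, axis=0)               # empirical p(v^T (X - mu) >= 0)
    return float(frac.min())

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))                      # sample from a centered Gaussian
depth_center = approx_tukey_depth(np.zeros(2), X, rng=rng)
depth_far = approx_tukey_depth(np.array([5.0, 5.0]), X, rng=rng)
print(depth_center, depth_far)
```

The center of a symmetric sample has depth close to 1/2, while a point far outside the data has depth close to 0. A Tukey median is any maximizer of this depth over candidate points.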
SLIDE 5
Preliminaries - corruption model
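Two corruption models:
Additive corruption: $p = (1 - \varepsilon) p^* + \varepsilon r$ for an arbitrary distribution $r$; the set of such $p$ is written $C_{\mathrm{add}}(p^*, \varepsilon)$.
Total variation (TV) corruption: $\mathrm{TV}(p^*, p) \le \varepsilon$.
TV corruption is stronger than additive corruption: every level-$\varepsilon$ additive corruption also satisfies $\mathrm{TV}(p^*, p) \le \varepsilon$.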
SLIDE 6
Preliminaries - assumption on the true distribution p∗
Halfspace-symmetric distributions (Zuo and Serfling, 2000; Chen, Tyler, et al., 2002): there exists a point $\mu \in \mathbb{R}^d$ such that for $X \sim p^*$,
$$\forall v \in \mathbb{R}^d, \quad v^\top (X - \mu) \overset{d}{=} -v^\top (X - \mu).$$
Example: Gaussian
SLIDE 7
Preliminaries - performance metric
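Maximum bias for the Tukey median: the maximum distance between $T(p)$ and $T(p^*)$, where $p$ ranges over the set of all possible level-$\varepsilon$ corruptions:
$$b_{\mathrm{add}}(p^*, \varepsilon) = \sup_{p \in C_{\mathrm{add}}(p^*, \varepsilon),\, x \in T(p),\, y \in T(p^*)} \|x - y\|,$$
$$b_{\mathrm{TV}}(p^*, \varepsilon) = \sup_{\mathrm{TV}(p^*, p) \le \varepsilon,\, x \in T(p),\, y \in T(p^*)} \|x - y\|.$$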
SLIDE 8
Preliminaries - performance metric
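Breakdown point: the minimum corruption level that can drive the maximum bias to infinity:
$$\varepsilon^*_{\mathrm{add}}(p^*) = \inf\{\varepsilon \mid b_{\mathrm{add}}(p^*, \varepsilon) = \infty\}, \qquad \varepsilon^*_{\mathrm{TV}}(p^*) = \inf\{\varepsilon \mid b_{\mathrm{TV}}(p^*, \varepsilon) = \infty\}.$$
Breakdown point for a family of distributions $G$:
$$\varepsilon^*_{\mathrm{add}}(G) = \inf_{q \in G} \varepsilon^*_{\mathrm{add}}(q), \qquad \varepsilon^*_{\mathrm{TV}}(G) = \inf_{q \in G} \varepsilon^*_{\mathrm{TV}}(q).$$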
SLIDE 9
Previous Results
Breakdown point under additive corruption (Donoho, 1982; Donoho and Gasko, 1992):
[Figure: breakdown point (vertical axis) versus dimension d = 1, ..., 7 (horizontal axis). Curves: Tukey+additive+symmetric, labeled 1/3; Tukey+additive+general, labeled 1/(d+1).]
SLIDE 10
Our contribution
Breakdown point under TV corruption:
[Figure: breakdown point (vertical axis) versus dimension d = 1, ..., 7 (horizontal axis). Curves: projection+TV+symmetric, Tukey+TV+symmetric, Tukey+additive+symmetric, Tukey+additive+general. Level labels: 1/2, 1/3, 1/4, 1/(d+1).]
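The values behind the four curves can be tabulated directly. The sketch below takes the symmetric Tukey values from the breakdown point theorem for the Tukey median (Zhu, Jiao, and Steinhardt, 2020, Theorem 1), the 1/2 value for the projection algorithm from its own theorem, and 1/(d+1) from the curve label for general distributions. The function name `breakdown_points` is hypothetical.

```python
from fractions import Fraction

def breakdown_points(d):
    """Breakdown points quoted in the slides for dimension d >= 1."""
    if d < 1:
        raise ValueError("dimension must be at least 1")
    if d == 1:
        tukey_tv = Fraction(1, 2)
    elif d == 2:
        tukey_tv = Fraction(1, 3)
    else:
        tukey_tv = Fraction(1, 4)
    return {
        "projection+TV+symmetric": Fraction(1, 2),
        "Tukey+TV+symmetric": tukey_tv,
        "Tukey+additive+symmetric": Fraction(1, 2) if d == 1 else Fraction(1, 3),
        "Tukey+additive+general": Fraction(1, d + 1),
    }

for d in range(1, 8):
    row = {name: str(value) for name, value in breakdown_points(d).items()}
    print(d, row)
```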
Characterization of the maximum bias in the population and finite-sample cases: both the Tukey median and the projection algorithm achieve near-optimal maximum bias Θ(ε) under TV corruption when ε < 0.249 for Gaussian distributions.
SLIDE 11
Main results - Breakdown point
Theorem (Breakdown point for Tukey median (Zhu, Jiao, and Steinhardt, 2020, Theorem 1))
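Denote by $G$ the set of all halfspace-symmetric distributions. Then the breakdown points for $G$ are
$$\varepsilon^*_{\mathrm{add}}(G) = \begin{cases} 1/2, & d = 1 \\ 1/3, & d \ge 2 \end{cases} \qquad \varepsilon^*_{\mathrm{TV}}(G) = \begin{cases} 1/2, & d = 1 \\ 1/3, & d = 2 \\ 1/4, & d \ge 3 \end{cases}$$
Proof of upper bound via figures: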
SLIDE 12
Main results - Maxbias
Theorem (Maximum bias under finite-sample TV corruption model (Zhu, Jiao, and Steinhardt, 2020, Theorem 3))
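Assume $p^*$ is halfspace-symmetric centered at $\mu^*$ with decay function $h(t) = \sup_{v \in \mathbb{R}^d,\, \|v\|_* \le 1} p^*(v^\top (X - \mu^*) > t)$. Denote by $\hat{p}_n$ the empirical distribution of $n$ samples drawn from an $\varepsilon$-TV-corrupted distribution $p$. When $d \ge 3$, there exists a universal constant $C > 0$ such that, with probability at least $1 - \delta$, any $\hat{\mu} \in T(\hat{p}_n)$ satisfies
$$\|\hat{\mu} - \mu^*\| \le h^{-1}(1 - h(0) - 2\tilde{\varepsilon}) \qquad (1)$$
when $2\tilde{\varepsilon} < 1 - h(0)$, where $\tilde{\varepsilon} = \varepsilon + C \sqrt{\frac{d + 1 + \log(1/\delta)}{n}}$ and $h^{-1}$ is the generalized inverse function of $h$.
As $n \to \infty$, this recovers the population result.
The result can be generalized to other cases.
Since $h(0) \le 1/2$, this implies a lower bound of 1/4 on the breakdown point.
For Gaussian $p^*$, $h(t) = 1/2 - \Theta(t)$ for small $t$, so the Tukey median achieves maximum bias $O(\varepsilon)$ when $n = \Omega(d/\varepsilon^2)$.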
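To make bound (1) concrete, the sketch below evaluates it for an isotropic Gaussian, assuming p* = N(µ*, I_d), the Euclidean norm, and ε̃ = ε + C·sqrt((d + 1 + log(1/δ))/n). Under those assumptions h(t) = 1 − Φ(t) for t ≥ 0, so h(0) = 1/2 and the bound becomes Φ⁻¹(1/2 + 2ε̃). The universal constant C is not specified in the slides, so `C=1.0` is only a placeholder, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def tukey_bias_bound_gaussian(eps, d, n, delta, C=1.0):
    """Evaluate h^{-1}(1 - h(0) - 2*eps_tilde) for p* = N(mu*, I_d).

    Assumes h(t) = 1 - Phi(t), hence h(0) = 1/2 and h^{-1}(a) = Phi^{-1}(1 - a).
    C stands in for the unspecified universal constant (placeholder value).
    """
    eps_tilde = eps + C * math.sqrt((d + 1 + math.log(1.0 / delta)) / n)
    if 2.0 * eps_tilde >= 0.5:
        # Condition 2*eps_tilde < 1 - h(0) fails: the theorem gives no finite bound.
        return math.inf
    return NormalDist().inv_cdf(0.5 + 2.0 * eps_tilde)

for n in (10**3, 10**4, 10**6):
    print(n, tukey_bias_bound_gaussian(eps=0.05, d=10, n=n, delta=0.01))
```

As n grows, ε̃ approaches ε and the value approaches the population bound Φ⁻¹(1/2 + 2ε), which is finite only for ε < 1/4.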
SLIDE 13
Main results - Maxbias (proof sketch)
Proof sketch of population case:
Lemma (Zhu, Jiao, and Steinhardt (2020, Lemma 1))
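If $D_{\mathrm{Tukey}}(T(p), p^*) \ge \alpha$, we have
$$\|T(p) - \mu^*\| \le h^{-1}(\alpha). \qquad (2)$$
For the TV corruption model, we have
$$D_{\mathrm{Tukey}}(T(p), p^*) \ge D_{\mathrm{Tukey}}(T(p), p) - \varepsilon \ge D_{\mathrm{Tukey}}(\mu^*, p) - \varepsilon \ge D_{\mathrm{Tukey}}(\mu^*, p^*) - 2\varepsilon = 1 - h(0) - 2\varepsilon.$$
For the finite-sample case, it suffices to lower bound $D_{\mathrm{Tukey}}(\hat{\mu}, p^*)$ for $\hat{\mu} \in T(\hat{p}_n)$ using a standard concentration argument.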
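The steps of the population chain can be annotated as follows. This is a sketch that assumes the standard definition $\mathrm{TV}(p, q) = \sup_A |p(A) - q(A)|$ over measurable sets $A$, so that the probability of any halfspace changes by at most $\varepsilon$ between $p$ and $p^*$.

```latex
\begin{align*}
D_{\mathrm{Tukey}}(T(p), p^*)
  &\ge D_{\mathrm{Tukey}}(T(p), p) - \varepsilon
    && \text{halfspace probabilities differ by at most } \mathrm{TV}(p^*, p) \le \varepsilon \\
  &\ge D_{\mathrm{Tukey}}(\mu^*, p) - \varepsilon
    && T(p) \text{ maximizes } D_{\mathrm{Tukey}}(\cdot, p) \\
  &\ge D_{\mathrm{Tukey}}(\mu^*, p^*) - 2\varepsilon
    && \text{the same TV bound, applied at } \mu^* \\
  &= 1 - h(0) - 2\varepsilon
    && \inf_v p^*(v^\top (X - \mu^*) \ge 0) = 1 - \sup_v p^*(v^\top (X - \mu^*) > 0).
\end{align*}
```

Applying the lemma with $\alpha = 1 - h(0) - 2\varepsilon$ then gives $\|T(p) - \mu^*\| \le h^{-1}(1 - h(0) - 2\varepsilon)$.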
SLIDE 14
Main results - Projection algorithm
Consider the halfspace metric defined in Donoho and Liu (1988) as
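$$\widetilde{\mathrm{TV}}(p, q) = \sup_{v \in \mathbb{R}^d,\, t \in \mathbb{R}} |p(v^\top X \ge t) - q(v^\top X \ge t)|. \qquad (3)$$
Let $G(h)$ be the set of halfspace-symmetric distributions:
$$G(h) = \Big\{ p \;\Big|\; \exists \mu \in \mathbb{R}^d : X \sim p \text{ is halfspace-symmetric around } \mu \text{ and } \sup_{v \in \mathbb{R}^d,\, \|v\|_* \le 1} p(v^\top (X - \mu) > t) \le h(t) \Big\}. \qquad (4)$$
The projection algorithm outputs $\hat{\mu}(p) = T(q)$:
[Figure: the true distribution $p^*$ lies in the set $G$; the corrupted distribution $\hat{p}_n$ is at distance $\varepsilon$ from $p^*$; projecting $\hat{p}_n$ onto $G$ under $\widetilde{\mathrm{TV}}$ gives $q \in G$.]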
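The halfspace metric in (3) can be approximated between two samples by projecting onto random directions and taking the largest one-dimensional gap between the two empirical tail functions. The sketch below is an illustration only: the function names are hypothetical, and restricting the supremum to finitely many random directions gives a lower bound on the exact empirical value.

```python
import numpy as np

def tail_gap_1d(a, b):
    """sup over t of |P_a(x >= t) - P_b(x >= t)| for two 1-d samples."""
    grid = np.sort(np.concatenate([a, b]))
    # searchsorted(..., side="left") counts the elements strictly below t.
    ge_a = 1.0 - np.searchsorted(np.sort(a), grid, side="left") / len(a)
    ge_b = 1.0 - np.searchsorted(np.sort(b), grid, side="left") / len(b)
    return float(np.max(np.abs(ge_a - ge_b)))

def approx_halfspace_distance(X, Y, n_dirs=500, rng=None):
    """Lower bound on the empirical halfspace distance between samples X and Y."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    V = rng.normal(size=(n_dirs, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit directions
    return max(tail_gap_1d(X @ v, Y @ v) for v in V)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
Y = rng.normal(size=(1000, 3)) + np.array([3.0, 0.0, 0.0])  # shifted copy
print(approx_halfspace_distance(X, Y, rng=rng))
```

A full projection step would additionally minimize this distance over candidate distributions in $G$, which this sketch does not attempt.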
SLIDE 15
Main results - Projection algorithm
Theorem (Maximum bias and breakdown point for projection algorithm (Zhu, Jiao, and Steinhardt, 2020, Theorem 3))
Assume the true distribution $p^*$ is halfspace-symmetric centered at $\mu^*$ with decay function $h(t) = \sup_{v \in \mathbb{R}^d,\, \|v\|_* \le 1} p^*(v^\top (X - \mu^*) > t)$. Then for any $p$ with $\mathrm{TV}(p^*, p) \le \varepsilon$, the projection estimator $\hat{\mu}(p)$ satisfies
$$\|\hat{\mu} - \mu^*\| \le 2 h^{-1}(1/2 - \varepsilon) \qquad (5)$$
when $\varepsilon < 1/2$. Here $h^{-1}$ is the generalized inverse function of $h$.
This improves the breakdown point from 1/4 for the Tukey median in high dimension under TV corruption to 1/2, which is optimal among all translation-equivariant estimators (Rousseeuw and Leroy, 2005, Equation 1.38).
The result can be extended to the finite-sample case using a similar argument.
The projection algorithm achieves $O(\varepsilon)$ maximum bias for Gaussians.
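The two population bounds can be compared side by side for an isotropic Gaussian. The sketch assumes p* = N(µ*, I_d) with the Euclidean norm, so that h(t) = 1 − Φ(t) for t ≥ 0. Under that assumption the Tukey median bound h⁻¹(1 − h(0) − 2ε) becomes Φ⁻¹(1/2 + 2ε) and the projection bound (5) becomes 2Φ⁻¹(1/2 + ε). Function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # inverse standard normal CDF

def tukey_population_bound(eps):
    """Phi^{-1}(1/2 + 2*eps); finite only for eps < 1/4."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return PHI_INV(0.5 + 2.0 * eps) if eps < 0.25 else math.inf

def projection_bound(eps):
    """2 * Phi^{-1}(1/2 + eps); finite only for eps < 1/2."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return 2.0 * PHI_INV(0.5 + eps) if eps < 0.5 else math.inf

for eps in (0.01, 0.1, 0.2, 0.3, 0.45):
    print(eps, tukey_population_bound(eps), projection_bound(eps))
```

For corruption levels between 1/4 and 1/2 only the projection bound stays finite, which reflects the improvement in breakdown point from 1/4 to 1/2.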
SLIDE 16
Main results - Projection algorithm
Intuition on improving the breakdown point:
SLIDE 17
Conclusion
Tukey median: affine-equivariant, breakdown point 1/4 under TV corruption in high dimensions, good finite-sample error.
$\widetilde{\mathrm{TV}}$ projection algorithm: not affine-equivariant, breakdown point 1/2 and good finite-sample error.
Open problem: find an estimator that is affine-equivariant, with breakdown point 1/2 and good finite-sample error.
SLIDE 18
References I
Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1(5), 799–821.
Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975.
Donoho, D. L. (1982). Breakdown properties of multivariate location estimators (Technical report). Harvard University, Boston.
Donoho, D. L., & Liu, R. C. (1988). The "automatic" robustness of minimum distance functionals. The Annals of Statistics, 16(2), 552–586.
Donoho, D. L., & Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, 20(4), 1803–1827.
Zuo, Y., & Serfling, R. (2000). General notions of statistical depth function. The Annals of Statistics, 461–482.
Chen, Z., Tyler, D. E., et al. (2002). The influence function and maximum bias of Tukey's median. The Annals of Statistics, 30(6), 1737–1759.
Rousseeuw, P. J., & Leroy, A. M. (2005). Robust regression and outlier detection (Vol. 589). John Wiley & Sons.
SLIDE 19