Risk bounds for some classification and regression models that interpolate
Daniel Hsu Columbia University Joint work with: Misha Belkin (The Ohio State University) Partha Mitra (Cold Spring Harbor Laboratory)
2
Why don't heavily parameterized models overfit -- i.e., why does a model that fits its training data still predict accurately on new examples? (Breiman, 1995)
3
4
Neural networks can fit training data perfectly, given enough time and a large enough network -- even when the training data has a substantial amount of label noise. (Zhang, Bengio, Hardt, Recht, & Vinyals, 2017)
5
Kernel machines can fit training data perfectly, given enough time and a rich enough feature space -- even when the training data has a substantial amount of label noise. (Belkin, Ma, & Mandal, 2018)
6
MNIST experiment: train a kernel classifier f̂ that perfectly fits noisy training data.
f̂ is likely a very complex function!
Yet the test error of f̂ is non-trivial: e.g., noise rate + 5%.
7
Generalization: true error rate ≤ training error rate + deviation bound.
But such bounds are vacuous here: an interpolating model perfectly fits the data even when noise has arbitrarily labeled Ω(n) training examples.
8
Yet some interpolating methods (e.g., AdaBoost, random forests) are robust to label noise: fitting a noisy example perturbs the prediction locally, but not in other parts of data space. (Wyner, Olson, Bleich, & Mease, 2017)
9
Nearest neighbor (Cover & Hart, 1967): predict the label of the nearest training example.
Hilbert kernel (Devroye, Györfi, & Krzyżak, 1998): smoothing kernel regression with weights w(x, x_i) = ‖x − x_i‖^{−d} = 1/‖x − x_i‖^{d}.
10
Show interpolation methods can be consistent (or almost consistent) for classification & regression problems
11
Analyses of two new interpolation schemes
12
13
Partition the convex hull of the training points into simplices with the training points as vertices (e.g., by Delaunay triangulation).
Define η̂(x) on each simplex by affine interpolation of the vertices' labels.
Let f̂ be the plug-in classifier based on η̂: f̂(x) = 1{η̂(x) > 1/2}.
14
!" !# !$
̂ . ! = 0
12" '("
+1)1
!
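A sketch of simplicial interpolation in Python, assuming a Delaunay triangulation as the simplicial partition (helper names are illustrative). SciPy's `Delaunay.transform` supplies the barycentric coordinates:

```python
import numpy as np
from scipy.spatial import Delaunay

def simplicial_interpolation(X_train, y_train):
    """Return eta_hat: affine interpolation of the labels over the
    Delaunay triangulation of the training points (a sketch; behavior
    outside the convex hull is left undefined and returns None)."""
    tri = Delaunay(X_train)          # requires dimension d >= 2
    d = X_train.shape[1]

    def eta_hat(x):
        s = int(tri.find_simplex(x.reshape(1, -1))[0])
        if s == -1:                  # x lies outside the convex hull
            return None
        # Barycentric coordinates of x within simplex s.
        T = tri.transform[s]
        b = T[:d] @ (x - T[d])
        w = np.append(b, 1.0 - b.sum())          # weights sum to 1
        return float(w @ y_train[tri.simplices[s]])

    return eta_hat

# Plug-in classifier for labels y in {0, 1}:
# f_hat(x) = 1 if eta_hat(x) > 1/2 else 0
```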
15
Key idea: aggregates information from all vertices to make prediction. (C.f. nearest neighbor rule.)
[Figure: a simplex with vertices x1, x2, x3, one of which is mislabeled. Nearest neighbor rule: the bad label is predicted on the bad vertex's entire nearest-neighbor cell. Simplicial interpolation: "η̂ = 1 here" only near the bad vertex, so the bad label is predicted on a much smaller region.]
Effect even more pronounced in high dimensions!
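A Monte Carlo sketch of that claim, assuming a regular d-simplex whose vertices are labeled 0 except for one mislabeled vertex with label 1: the nearest neighbor rule predicts the bad label on about 1/(d+1) of the simplex, while affine interpolation exceeds 1/2 only where the bad vertex's barycentric weight does, about 2^{−d} of it.

```python
import numpy as np

rng = np.random.default_rng(0)

def affected_fractions(d, n_samples=200_000):
    # Barycentric coordinates of a uniform random point in a d-simplex
    # are Dirichlet(1, ..., 1) over the d+1 vertices; index 0 is "bad".
    w = rng.dirichlet(np.ones(d + 1), size=n_samples)
    # In a REGULAR simplex, the nearest vertex is the one with the
    # largest barycentric coordinate.
    nn_bad = (w.argmax(axis=1) == 0).mean()
    # Simplicial interpolation predicts the bad label where eta_hat > 1/2,
    # i.e., where the bad vertex's weight exceeds 1/2.
    si_bad = (w[:, 0] > 0.5).mean()
    return nn_bad, si_bad

for d in (1, 2, 5, 10):
    nn_bad, si_bad = affected_fractions(d)
    print(f"d={d:2d}  NN: {nn_bad:.3f} (~1/(d+1))  simplicial: {si_bad:.4f} (~2^-d)")
```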
16
Theorem: Assume the distribution of X is uniform on some convex set, and η is Hölder smooth. Then the simplicial interpolation estimate satisfies
limsup_{n→∞} E[(η̂(X) − η(X))²] ≤ 2σ²/(d+2),
and the plug-in classifier satisfies
limsup_{n→∞} Pr(f̂(X) ≠ f_opt(X)) ≤ c/√(d+2),
so the effect of interpolating noisy labels diminishes as the dimension d grows.
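A sketch of where the 2σ²/(d+2) noise term comes from, assuming independent label noise ε_i with variance σ² at the simplex vertices: for a uniform point, the barycentric weights W_i are Dirichlet(1, …, 1), whose second moments give

```latex
\mathbb{E}\Big[\Big(\sum_{i=1}^{d+1} W_i \varepsilon_i\Big)^2\Big]
  = \sigma^2 \, \mathbb{E}\Big[\sum_{i=1}^{d+1} W_i^2\Big]
  = \sigma^2 (d+1) \cdot \frac{2}{(d+1)(d+2)}
  = \frac{2\sigma^2}{d+2},
```

since the cross terms vanish by independence and E[W_i²] = 2/((d+1)(d+2)).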
17
18
Let x_(1), …, x_(k) denote the k nearest neighbors of x among the training data, and let y_(1), …, y_(k) be the corresponding labels.
19
[Figure: a query point x with its nearest neighbors x_(1), x_(2), …, x_(k).]
Define
η̂(x) = ∑_{i=1}^{k} w(x, x_(i)) y_(i) / ∑_{i=1}^{k} w(x, x_(i)),
where w(x, x_(i)) = ‖x − x_(i)‖^{−δ} for some δ > 0.
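A minimal sketch of the weighted & interpolated nearest neighbor estimate (function and argument names are illustrative):

```python
import numpy as np

def winn_estimate(x, X_train, y_train, k, delta):
    """Weighted & interpolated nearest neighbor (wiNN) estimate at x.

    Averages the labels of the k nearest neighbors with weights
    w_i = ||x - x_(i)||^(-delta).  The weight diverges as x approaches
    a training point, so the estimate interpolates the training data.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]            # indices of the k nearest neighbors
    if dists[nn[0]] == 0:                 # query coincides with a sample:
        return float(y_train[nn[0]])      # return its label exactly
    w = dists[nn] ** (-float(delta))
    return float(np.dot(w, y_train[nn]) / w.sum())
```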
Weighted k-NN (this work):
η̂(x) = ∑_{i=1}^{k} w(x, x_(i)) y_(i) / ∑_{i=1}^{k} w(x, x_(i)),   w(x, x_(i)) = ‖x − x_(i)‖^{−δ}.
Our analysis needs 0 < δ < d/2.
Hilbert kernel (Devroye, Györfi, & Krzyżak, 1998):
η̂(x) = ∑_{i=1}^{n} w(x, x_i) y_i / ∑_{i=1}^{n} w(x, x_i),   w(x, x_i) = ‖x − x_i‖^{−δ}.
MUST have δ = d for consistency.
20
Localization makes it possible to prove a non-asymptotic rate.
Theorem: Assume the distribution of X is uniform on some compact set satisfying a regularity condition, and η is α-Hölder smooth. For an appropriate setting of k, the weighted k-NN estimate satisfies
E[(η̂(X) − η(X))²] ≤ O(n^{−2α/(2α+d)}).
If the Tsybakov noise condition with parameter β > 0 also holds, then the plug-in classifier, with an appropriate setting of k, satisfies
Pr(f̂(X) ≠ f_opt(X)) ≤ O(n^{−αβ/(2α+d)}).
21
22
Related observations: random forest variants that interpolate (Cutler & Zhao, 2001); interpolating neural networks resemble k-NN in terms of performance and noise-robustness (Drory, Avidan, & Giryes, 2018; Cohen, Sapiro, & Giryes, 2018).
23
The effect of fitting noisily-labeled training examples is small in high dimensions. But interpolation is also a great source of adversarial examples -- easy to find using local search near noisily-labeled training examples.
24
Minimum-norm interpolation in an RKHS ℋ:
min_{f∈ℋ} ‖f‖_ℋ subject to f(x_i) = y_i for all i = 1, …, n.
When does this work (with noisy labels)? It does, at least in some regimes.
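A sketch of minimum-norm interpolation, assuming a Gaussian kernel for concreteness (the talk does not fix ℋ; names are illustrative). By the representer theorem, the minimizer has the form f(x) = ∑_i α_i K(x, x_i), where the coefficients solve Kα = y:

```python
import numpy as np

def min_norm_kernel_interpolant(X_train, y_train, bandwidth=1.0):
    """Minimum-RKHS-norm interpolant with a Gaussian kernel (a sketch).

    Solving K alpha = y enforces the interpolation constraints
    f(x_i) = y_i; among all such f in the RKHS, the representer form
    has the smallest norm.
    """
    def gram(A, B):
        # Pairwise squared distances, then the Gaussian kernel matrix.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bandwidth ** 2))

    K = gram(X_train, X_train)
    alpha = np.linalg.solve(K, y_train)   # interpolation constraints

    def f(x):
        return (gram(x.reshape(1, -1), X_train) @ alpha).item()

    return f
```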
25
26
arxiv.org/abs/1806.05161