
SLIDE 1

Boosting, Min-Norm Interpolated Classifiers, and Overparametrization: a precise asymptotic theory

Tengyuan Liang, joint work with Pragya Sur (Harvard)

SLIDE 2

OUTLINE

  • Motivation: min-norm interpolants under overparametrized regime
  • Classification: boosting on separable data
  • precise asymptotics of margin
  • fixed point of a non-linear system of equations
  • statistical and algorithmic implications
  • Proof Sketch: Gaussian comparison and convex geometry tools


SLIDE 3

OVERPARAMETRIZED REGIME OF STAT/ML

Model class complex enough to interpolate the training data.

Zhang, Bengio, Hardt, Recht, and Vinyals (2016); Belkin et al. (2018); Liang and Rakhlin (2018); Bartlett et al. (2019); Hastie et al. (2019)

[Figure: Kernel Regression on MNIST. Test error (log scale) against the ridge parameter λ, one curve per digit pair [i, j] for pairs [2,5], [2,6], [2,7], [2,8], [2,9], [3,5], [3,6], [3,7], [3,8], [3,9], [4,5], [4,6], [4,7], [4,8], [4,9]. λ = 0 gives the interpolants on the training data.]

MNIST data from LeCun et al. (2010)


SLIDE 5

OVERPARAMETRIZED REGIME OF STAT/ML

In fact, many models behave the same on the training data. Practical methods and algorithms favor certain functions!

Principle: among the models that interpolate, algorithms favor a certain form of minimalism.


SLIDE 7

OVERPARAMETRIZED REGIME OF STAT/ML

Principle: among the models that interpolate, algorithms favor a certain form of minimalism.

  • overparametrized linear models and matrix factorization
  • kernel regression
  • support vector machines, Perceptron
  • boosting, AdaBoost
  • two-layer ReLU networks, deep neural networks (?)

Minimalism is typically measured in the form of a certain norm, which motivates the study of min-norm interpolants.

SLIDE 8

MIN-NORM INTERPOLANTS

Minimalism is typically measured in the form of a certain norm, which motivates the study of min-norm interpolants.

Regression:
    f̂ = arg min_f ‖f‖_norm,  s.t.  y_i = f(x_i)  ∀ i ∈ [n].

Classification:
    f̂ = arg min_f ‖f‖_norm,  s.t.  y_i · f(x_i) ≥ 1  ∀ i ∈ [n].

SLIDE 9

Precise High-Dimensional Asymptotic Theory for Boosting and Min-L1-Norm Interpolated Classifiers

tyliang.github.io/Tengyuan.Liang/pdf/Liang-Sur-20.pdf

Classification:
    f̂ = arg min_f ‖f‖_norm,  s.t.  y_i · f(x_i) ≥ 1  ∀ i ∈ [n].

SLIDE 10

PROBLEM FORMULATION

Given n i.i.d. data pairs {(x_i, y_i)}_{1 ≤ i ≤ n} with (x, y) ∼ P: y_i ∈ {±1} binary labels, x_i ∈ R^p feature vector (weak learners).

Consider data that are linearly separable:
    P(∃ θ ∈ R^p : y_i x_i^⊤ θ > 0 for 1 ≤ i ≤ n) → 1.

Natural to consider the overparametrized regime p/n → ψ ∈ (0, ∞).

SLIDE 11

BOOSTING/ADABOOST

Initialize θ_0 = 0 ∈ R^p; set data weights η_0 = (1/n, ..., 1/n) ∈ Δ_n. At time t ≥ 0:

  • 1. Learner/Feature Selection: j⋆_t := arg max_{j ∈ [p]} |η_t^⊤ Z e_j|; set γ_t = η_t^⊤ Z e_{j⋆_t};
  • 2. Adaptive Stepsize: α_t = (1/2) log((1 + γ_t)/(1 − γ_t));
  • 3. Coordinate Update: θ_{t+1} = θ_t + α_t · e_{j⋆_t};
  • 4. Weight Update: η_{t+1}[i] ∝ η_t[i] exp(−α_t y_i x_i^⊤ e_{j⋆_t}), normalized so that η_{t+1} ∈ Δ_n.

Terminate after T steps and output the vector θ_T.

Freund and Schapire (1995, 1996)
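The four update rules above are easy to simulate. Below is a minimal NumPy sketch on synthetic ±1-valued weak learners; the data, sizes, and stopping rule are illustrative choices, not from the paper.

```python
# Minimal sketch of the AdaBoost coordinate updates above, on synthetic
# Rademacher weak learners (illustrative only; not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n, p, T = 100, 300, 200
X = rng.choice([-1.0, 1.0], size=(n, p))          # weak-learner outputs in {±1}
y = np.sign(X @ rng.standard_normal(p))           # labels from a planted direction
Z = y[:, None] * X                                # Z := y ∘ X ∈ R^{n×p}

theta = np.zeros(p)                               # θ_0 = 0
eta = np.full(n, 1.0 / n)                         # η_0 = (1/n, ..., 1/n) ∈ Δ_n
for t in range(T):
    corr = eta @ Z                                # η_t^⊤ Z e_j for every j
    j = int(np.argmax(np.abs(corr)))              # 1. learner/feature selection
    gamma = corr[j]
    if abs(gamma) >= 1.0:                         # weighted error already zero
        break
    alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))   # 2. adaptive stepsize
    theta[j] += alpha                             # 3. coordinate update
    eta *= np.exp(-alpha * Z[:, j])               # 4. weight update ...
    eta /= eta.sum()                              #    ... renormalized onto Δ_n

print("training error:", np.mean(Z @ theta <= 0))
```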

SLIDE 12

BOOSTING/ADABOOST

“... mystery of AdaBoost as the most important unsolved problem in Machine Learning” — Wald Lecture, Breiman (2004)

SLIDE 13

KEY: EMPIRICAL MARGIN

Empirical margin is key to Generalization and Optimization.

Generalization: for all f(x) = x^⊤θ/‖θ‖_1 and κ > 0, with probability 1 − δ,

    P(y f(x) < 0) ≤ (1/n) Σ_{i=1}^n 1{y_i f(x_i) < κ}   [empirical margin]
                    + √(log n · log p / (n κ^2))          [generalization error]
                    + √(log(1/δ) / n).

Schapire, Freund, Bartlett, and Lee (1998)

Choose the classifier f that maximizes the minimal margin κ:

    κ = max_{θ ∈ R^p} min_{1 ≤ i ≤ n} y_i x_i^⊤ θ / ‖θ‖_1,
    generalization error < 1/(√n κ) · (log factors, constants).

“An important open problem is to derive more careful and precise bounds which can be used for this purpose. Besides paying closer attention to constant factors, such an analysis might also involve the measurement of more sophisticated statistics.”
— Schapire, Freund, Bartlett, and Lee (1998)


SLIDE 15

KEY: EMPIRICAL MARGIN

Empirical margin is key to Generalization and Optimization.

Optimization: for AdaBoost with p weak learners, write Z := y ∘ X ∈ R^{n×p}. Then

    Σ_{i=1}^n 1{−y_i x_i^⊤ θ_T > 0} ≤ n e · exp( − Σ_{t=1}^T (γ_t^2 / 2)(1 + o(γ_t)) ).

By the minimax theorem,

    |γ_t| = ‖Z^⊤ η_t‖_∞ ≥ min_{η ∈ Δ_n} ‖Z^⊤ η‖_∞ = min_{η ∈ Δ_n} max_{‖θ‖_1 ≤ 1} η^⊤ Z θ = max_{‖θ‖_1 ≤ 1} min_{1 ≤ i ≤ n} e_i^⊤ Z θ ≥ κ.

Freund and Schapire (1995); Zhang and Yu (2005)

Stopping time (zero training error):

    optimization steps < 1/κ^2 · (log factors, constants).

SLIDE 16

L1 GEOMETRY, MARGIN, AND INTERPOLATION

We consider the min-L1-norm interpolated classifier on separable data:

    θ̂_{ℓ1} = arg min_θ ‖θ‖_1,  s.t.  y_i x_i^⊤ θ ≥ 1, ∀ i ∈ [n].

Algorithmic: on separable data, the boosting iterate θ^{T,s}_boost with infinitesimal stepsize s agrees with the min-L1-norm interpolant asymptotically:

    lim_{s→0} lim_{T→∞} θ^{T,s}_boost / ‖θ^{T,s}_boost‖_1 = θ̂_{ℓ1}.

Freund and Schapire (1995); Rosset et al. (2004); Zhang and Yu (2005)

SLIDE 17

L1 GEOMETRY, MARGIN, AND INTERPOLATION

Min-L1-norm interpolation is equivalent to max-L1-margin:

    max_{‖θ‖_1 ≤ 1} min_{1 ≤ i ≤ n} y_i x_i^⊤ θ =: κ_{ℓ1}(X, y).

Prior understanding:

    generalization error < 1/(√n κ) · (log factors, constants),
    optimization steps  < 1/κ^2 · (log factors, constants).
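For intuition, both θ̂_{ℓ1} and κ_{ℓ1}(X, y) can be computed by a small linear program (the same route as the "LP" empirical curves later in the talk). A hedged scipy sketch on synthetic data, using the standard split θ = u − v with u, v ≥ 0:

```python
# Sketch: min ||θ||_1 s.t. y_i x_i^⊤ θ ≥ 1 as a linear program
# (synthetic data; scipy's HiGHS backend assumed available).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, p = 50, 150                                   # p > n: separable w.h.p.
X = rng.standard_normal((n, p))
y = np.sign(X @ rng.standard_normal(p))
Z = y[:, None] * X

# variables (u, v) with θ = u − v, u, v ≥ 0; objective 1^⊤u + 1^⊤v = ||θ||_1
c = np.ones(2 * p)
A_ub = np.hstack([-Z, Z])                        # Z(u − v) ≥ 1  ⇔  −Zu + Zv ≤ −1
b_ub = -np.ones(n)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * p), method="highs")

theta_hat = res.x[:p] - res.x[p:]
kappa_l1 = np.min(Z @ theta_hat) / np.abs(theta_hat).sum()   # max-L1-margin κ_{ℓ1}(X, y)
print("||θ||_1 =", np.abs(theta_hat).sum(), " κ_ℓ1 ≈", kappa_l1)
```

The last line reflects the equivalence on this slide: rescaling the min-norm interpolant to unit L1 norm gives the max-L1-margin direction, so κ_{ℓ1}(X, y) = 1/‖θ̂_{ℓ1}‖_1.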

SLIDE 18

L1 GEOMETRY, MARGIN, AND INTERPOLATION

However, many questions remain:

Statistical
  • how large is the L1-margin κ_{ℓ1}(X, y)?
  • angle between the interpolated classifier θ̂ and the truth θ⋆?
  • precise generalization error of boosting? relation to the Bayes error?

Computational
  • effect of increasing overparametrization ψ = p/n on optimization?
  • proportion of weak learners activated by boosting with zero initialization?


SLIDE 20

DATA GENERATING PROCESS

  • DGP: x_i ∼ N(0, Λ) i.i.d. with diagonal covariance Λ ∈ R^{p×p}, and y_i generated with some f : R → [0, 1],

        P(y_i = +1 | x_i) = 1 − P(y_i = −1 | x_i) = f(x_i^⊤ θ⋆),

    for some θ⋆ ∈ R^p.

Consider the high-dimensional asymptotic regime with overparametrization ratio p/n → ψ ∈ (0, ∞), n, p → ∞.

    signal strength:  ‖Λ^{1/2} θ⋆‖ → ρ ∈ (0, ∞),
    coordinates:      w̄_j = √p · λ_j^{1/2} θ⋆,j / ρ,  1 ≤ j ≤ p.

Assume the empirical distribution (1/p) Σ_{j=1}^p δ_{(λ_j, w̄_j)} converges in Wasserstein-2 to μ, a distribution on R_{>0} × R.
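As a concrete, hypothetical instance of this DGP, the sketch below uses a logistic link for f and a uniform spectrum for Λ; any link f : R → [0, 1] and any spectrum satisfying the W2 assumption would do.

```python
# Sketch of the data-generating process on this slide (logistic link and
# uniform eigenvalues chosen purely for illustration).
import numpy as np

rng = np.random.default_rng(2)
psi, n = 2.0, 400
p = int(psi * n)                              # overparametrization ratio p/n → ψ
lam = rng.uniform(0.5, 1.5, size=p)           # diagonal of Λ
theta_star = rng.standard_normal(p)
rho = 3.0
theta_star *= rho / np.linalg.norm(np.sqrt(lam) * theta_star)   # ||Λ^{1/2} θ*|| = ρ

X = rng.standard_normal((n, p)) * np.sqrt(lam)          # rows x_i ~ N(0, Λ)
f = lambda t: 1.0 / (1.0 + np.exp(-t))                  # assumed link
y = np.where(rng.uniform(size=n) < f(X @ theta_star), 1.0, -1.0)

# empirical joint law of (λ_j, w̄_j); assumed to converge in W2 to μ
w_bar = np.sqrt(p) * np.sqrt(lam) * theta_star / rho
print("||Λ^{1/2}θ*|| =", np.linalg.norm(np.sqrt(lam) * theta_star), " p/n =", p / n)
```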

SLIDE 21

PRECISE HIGH-DIM ASYMPTOTIC THEORY FOR BOOSTING

Theorem (L. & Sur, ’20). For ψ ≥ ψ⋆ (the separability threshold), a sharp asymptotic characterization holds:

    Margin:               lim_{n,p→∞, p/n→ψ} p^{1/2} · κ_{ℓ1}(X, y) = κ⋆(ψ, μ),  a.s.
    Generalization error: lim_{n,p→∞, p/n→ψ} P_{x,y}(y · x^⊤ θ̂_{ℓ1} < 0) = Err⋆(ψ, μ),  a.s.

Precise asymptotics can also be established for

    Angle: ⟨θ̂_{ℓ1}, θ⋆⟩_Λ / (‖θ̂_{ℓ1}‖_Λ ‖θ⋆‖_Λ),    Loss: Σ_{j ∈ [p]} ℓ(θ̂_{ℓ1,j}, θ⋆,j).

Gaussian comparison: Gordon (1988); Thrampoulidis et al. (2014, 2015, 2018)
L2-margin: Gardner (1988); Shcherbina and Tirozzi (2003); Deng et al. (2019); Montanari et al. (2019)



SLIDE 24

THEORY VS. EMPIRICAL

x-axis: varying overparametrization ratio ψ.

[Figure, left panel: Margin, p^{1/2} · κ_{ℓ1}(X, y) → κ⋆(ψ, μ). Right panel: Generalization, P_{x,y}(y · x^⊤ θ̂_{ℓ1} < 0) → Err⋆(ψ, μ). Blue, "LP": empirical, numerical solution via linear programming. Red, "CGMT": theoretical, fixed point of the non-linear equation system.]

Strikingly accurate asymptotics for Breiman’s max min-margin, max_{‖θ‖_1 ≤ 1} min_{1 ≤ i ≤ n} y_i x_i^⊤ θ!


SLIDE 26

NON-LINEAR EQUATION SYSTEM: FIXED POINT

[L. & Sur, ’20]: κ⋆(ψ, μ) enjoys an analytic characterization via the fixed point c_1(ψ, κ), c_2(ψ, κ), s(ψ, κ).

Define F_κ(·, ·) : R × R_{≥0} → R_{≥0},

    F_κ(c_1, c_2) := ( E[ (κ − c_1 Y Z_1 − c_2 Z_2)_+^2 ] )^{1/2},

where Z_2 ⟂ (Y, Z_1), Z_i ∼ N(0, 1) for i = 1, 2, and P(Y = +1 | Z_1) = 1 − P(Y = −1 | Z_1) = f(ρ · Z_1).
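F_κ has no closed form for a general link, but it is a low-dimensional Gaussian expectation that is easy to estimate by Monte Carlo; the sketch below again assumes the logistic link used in the DGP example.

```python
# Monte Carlo sketch of F_κ(c1, c2) = (E[(κ − c1·Y·Z1 − c2·Z2)_+^2])^{1/2},
# with a logistic link assumed for f (illustrative only).
import numpy as np

def F_kappa(kappa, c1, c2, rho=3.0, m=200_000, seed=3):
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(m), rng.standard_normal(m)
    f = lambda t: 1.0 / (1.0 + np.exp(-t))
    y = np.where(rng.uniform(size=m) < f(rho * z1), 1.0, -1.0)   # P(Y=+1|Z1) = f(ρZ1)
    hinge = np.maximum(kappa - c1 * y * z1 - c2 * z2, 0.0)
    return np.sqrt(np.mean(hinge**2))

print(F_kappa(kappa=1.0, c1=0.5, c2=1.0))
```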

SLIDE 27

NON-LINEAR EQUATION SYSTEM: FIXED POINT

Fixed-point equations for (c_1, c_2, s) ∈ R × R_{>0} × R_{>0}, given ψ > 0, where the expectation is over (Λ, W, G) ∼ μ ⊗ N(0, 1) =: Q. Writing the shorthands

    A := Λ^{1/2} G + ψ^{−1/2} [∂_1F_κ(c_1, c_2) − c_1 c_2^{−1} ∂_2F_κ(c_1, c_2)] Λ^{1/2} W,
    D := ψ^{−1/2} c_2^{−1} ∂_2F_κ(c_1, c_2),

the system reads

    c_1 = − E_{(Λ,W,G)∼Q}[ Λ^{−1/2} W · prox_s(A) / D ],
    c_1^2 + c_2^2 = E_{(Λ,W,G)∼Q}[ ( Λ^{−1/2} prox_s(A) / D )^2 ],
    1 = E_{(Λ,W,G)∼Q}[ | Λ^{−1} prox_s(A) / D | ],

with prox_λ(t) = arg min_s { λ|s| + (1/2)(s − t)^2 } = sgn(t)(|t| − λ)_+.

SLIDE 28

NON-LINEAR EQUATION SYSTEM: FIXED POINT

With the solution c_1(ψ, κ), c_2(ψ, κ), s(ψ, κ) of the system above, define

    T(ψ, κ) := ψ^{−1/2} [ F_κ(c_1, c_2) − c_1 ∂_1F_κ(c_1, c_2) − c_2 ∂_2F_κ(c_1, c_2) ] − s,

    κ⋆(ψ, μ) := inf{ κ ≥ 0 : T(ψ, κ) ≥ 0 }.
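Operationally, prox_s is soft-thresholding, and κ⋆(ψ, μ) can be read off by scanning κ until T(ψ, κ) turns non-negative. The sketch below shows only that outer logic; `solve_fixed_point`, `F`, `dF1`, `dF2` are hypothetical placeholders for a solver of the three equations above and for F_κ and its partial derivatives (e.g. the Monte Carlo estimate two slides back), and the bisection assumes T(ψ, ·) crosses zero once.

```python
# Sketch of reading κ*(ψ, μ) off the characterization on this slide.
# `solve_fixed_point(psi, kappa)` is a HYPOTHETICAL placeholder returning (c1, c2, s);
# `F`, `dF1`, `dF2` are placeholders for F_κ and its two partial derivatives.
import numpy as np

def prox(t, lam):
    """prox_λ(t) = argmin_s {λ|s| + (s − t)²/2} = sgn(t)(|t| − λ)_+ (soft-thresholding)."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def T(psi, kappa, solve_fixed_point, F, dF1, dF2):
    # c1(ψ,κ), c2(ψ,κ), s(ψ,κ) solve the non-linear system on the previous slide
    c1, c2, s = solve_fixed_point(psi, kappa)
    return (F(kappa, c1, c2) - c1 * dF1(kappa, c1, c2) - c2 * dF2(kappa, c1, c2)) / np.sqrt(psi) - s

def kappa_star(psi, solve_fixed_point, F, dF1, dF2, lo=1e-3, hi=20.0, tol=1e-4):
    """Bisection for κ*(ψ, μ) = inf{κ ≥ 0 : T(ψ, κ) ≥ 0}, assuming a single zero crossing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if T(psi, mid, solve_fixed_point, F, dF1, dF2) >= 0 else (mid, hi)
    return hi
```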

SLIDE 29

GENERALIZATION ERROR, BAYES ERROR, AND ANGLE

With c_i⋆ := c_i(ψ, κ⋆(ψ, μ)), i = 1, 2:

    Err⋆(ψ, μ) = P( c_1⋆ Y Z_1 + c_2⋆ Z_2 < 0 ),
    BayesErr(ψ, μ) = P( Y Z_1 < 0 ),
    ⟨θ̂_{ℓ1}, θ⋆⟩_Λ / (‖θ̂_{ℓ1}‖_Λ ‖θ⋆‖_Λ) → c_1⋆ / √( (c_1⋆)^2 + (c_2⋆)^2 ).

Mannor et al. (2002); Jiang (2004); Bartlett and Traskin (2007); Bartlett et al. (2004)

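Given the fixed-point values c_1⋆, c_2⋆, all three limits on this slide are plain Gaussian expectations. A Monte Carlo sketch, with the logistic link assumed as before and c_1⋆, c_2⋆ passed in as placeholder values:

```python
# Monte Carlo sketch of the limiting generalization error, Bayes error, and angle,
# given hypothetical fixed-point values c1_star, c2_star (illustrative only).
import numpy as np

def limiting_errors(c1_star, c2_star, rho=3.0, m=500_000, seed=4):
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal(m), rng.standard_normal(m)
    f = lambda t: 1.0 / (1.0 + np.exp(-t))
    y = np.where(rng.uniform(size=m) < f(rho * z1), 1.0, -1.0)
    err_star = np.mean(c1_star * y * z1 + c2_star * z2 < 0)   # Err*(ψ, μ)
    bayes = np.mean(y * z1 < 0)                               # BayesErr(ψ, μ)
    cosine = c1_star / np.hypot(c1_star, c2_star)             # limiting angle cosine
    return err_star, bayes, cosine

print(limiting_errors(c1_star=1.0, c2_star=0.8))
```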

SLIDE 31

Statistical and Algorithmic Implications

SLIDE 32

BACK TO GENERALIZATION

Known generalization bounds:

    generalization error < 1/(√n · κ_{ℓ1}(X, y)) · (log factors, constants) = √ψ / κ⋆(ψ, μ) · (log factors, constants).

Let’s plot the generalization error and κ⋆(ψ, μ)/√ψ.

[Figure: κ⋆(ψ, μ)/√ψ against ψ; generalization error vs. the known bounds. Blue, "LP": empirical; Red, "CGMT": theoretical.]

L2-margin: Montanari et al. (2019)



SLIDE 35

BACK TO BOOSTING ALGORITHMS

Known computational results:

    optimization steps < 1/κ_{ℓ1}^2(X, y) · (log factors, constants),
    lim_{s→0} lim_{T→∞} min_{i ∈ [n]} y_i x_i^⊤ θ^{T,s}_boost / ‖θ^{T,s}_boost‖_1 = κ_{ℓ1}(X, y).

Theorem (L. & Sur, ’20). With a proper (non-vanishing) stepsize s, the sequence {θ^{t,s}_boost}_{t≥0} satisfies: for any 0 < ε < 1, with stopping time t ≥ T_ε(p) where

    T_ε(p) / (n log^2 n) → 12 ε^{−2} / (κ⋆(ψ, μ)/√ψ)^2,

the iterate approximates the min-L1-norm interpolated classifier:

    p^{1/2} · min_{i ∈ [n]} y_i x_i^⊤ θ^{t,s}_boost / ‖θ^{t,s}_boost‖_1 ∈ [(1 − ε) · κ⋆(ψ, μ), κ⋆(ψ, μ)].

SLIDE 36

BACK TO BOOSTING ALGORITHMS

[Figure: κ⋆(ψ, μ)/√ψ against ψ (CGMT vs. LP).]

  • Overparametrization → faster optimization: the normalized margin κ⋆(ψ, μ)/√ψ grows with ψ, so the stopping time T_ε(p) above shrinks.

SLIDE 37

ALGORITHMIC: ACTIVATED FEATURES BY BOOSTING

Boosting chooses weak learners (WL) adaptively. How sparse is (selected WL)/(total WL)?

Theorem (L. & Sur, ’20). Let S_0(p) be the number of weak learners selected when boosting first hits zero training error, (1/n) Σ_{i=1}^n 1{y_i x_i^⊤ θ_t < 0} = 0, with initialization θ_0 = 0:

    S_0(p) := #{ j ∈ [p] : θ^t_j ≠ 0 }.

We show that

    lim sup_{n,p→∞} S_0(p) / (p · log^2 n) ≤ 12/κ⋆^2(ψ, μ) ∧ 1.

In the numerical example, for overparametrization ψ > 5, 12/κ⋆^2(ψ, μ) ≪ 1.



SLIDE 40

Proof Sketch: Gaussian Comparison + Convex Geometry + New Uniform Convergence

SLIDE 41

TECHNICAL REMARKS

Our proof builds upon the Convex Gaussian Minimax Theorem (Gordon (1988); Thrampoulidis et al. (2014, 2015, 2018)) and is inspired by the work on the L2-margin by Montanari et al. (2019).

The L1 case has technical difficulties to overcome:

  • we prove a stronger uniform deviation result that suits the L1 case, by exploiting a self-normalization property;
  • the fixed-point equation system is different;
  • the (normalized) max-L1-margin is much larger than the max-L2-margin.

SLIDE 42

PROOF SKETCH

Step 1: define

    ξ^{(n,p)}_{ψ,κ} := min_{‖θ‖_1 ≤ √p} max_{‖λ‖_2 ≤ 1, λ ≥ 0} (1/√p) λ^⊤ (κ 1 − (y ⊙ X) θ).

It is not hard to see that

    ξ^{(n,p)}_{ψ,κ} = 0  if and only if  κ ≤ p^{1/2} · κ_{ℓ1}({x_i, y_i}_{i=1}^n),
    ξ^{(n,p)}_{ψ,κ} > 0  if and only if  κ > p^{1/2} · κ_{ℓ1}({x_i, y_i}_{i=1}^n).
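Since the inner maximum over {λ ≥ 0, ‖λ‖_2 ≤ 1} equals ‖(κ1 − Zθ)_+‖_2, the quantity ξ^{(n,p)}_{ψ,κ} is itself a small convex program, and the zero/non-zero transition at κ = p^{1/2} · κ_{ℓ1} can be checked numerically. A hedged sketch using cvxpy (the library and a default conic solver are assumed installed; data is synthetic):

```python
# Sketch: ξ^{(n,p)}_{ψ,κ} = min_{||θ||_1 ≤ √p} ||(κ1 − Zθ)_+||_2 / √p, which is zero
# exactly when κ ≤ √p · κ_{ℓ1}(X, y) (illustrative only).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
n, p = 40, 120
X = rng.standard_normal((n, p))
y = np.sign(X @ rng.standard_normal(p))
Z = y[:, None] * X

def xi(kappa):
    theta = cp.Variable(p)
    objective = cp.norm(cp.pos(kappa - Z @ theta), 2) / np.sqrt(p)
    problem = cp.Problem(cp.Minimize(objective), [cp.norm1(theta) <= np.sqrt(p)])
    return problem.solve()

for kappa in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(kappa, xi(kappa))   # stays ~0 until κ exceeds √p · κ_{ℓ1}(X, y)
```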

SLIDE 43

PROOF SKETCH

Step 1:

    ξ^{(n,p)}_{ψ,κ} := min_{‖θ‖_1 ≤ √p} max_{‖λ‖_2 ≤ 1, λ ≥ 0} (1/√p) λ^⊤ (κ 1 − (y ⊙ X) θ)
                     = min_{‖θ‖_1 ≤ √p} max_{‖λ‖_2 ≤ 1, λ ≥ 0} (1/√p) λ^⊤ (κ 1 − (y ⊙ z) ⟨w, Λ^{1/2}θ⟩) − (1/√p) λ^⊤ Z Π_{w⊥}(Λ^{1/2}θ).

Step 2: reduction via Gordon’s comparison (convex Gaussian min-max theorem), Thrampoulidis et al. (2014, 2015); Gordon (1988):

    ξ̂^{(n,p)}_{ψ,κ} := min_{‖θ‖_1 ≤ √p} max_{‖λ‖_2 ≤ 1, λ ≥ 0} (1/√p) λ^⊤ (κ 1 − (y ⊙ z) ⟨w, Λ^{1/2}θ⟩ − z̃ ‖Π_{w⊥}(Λ^{1/2}θ)‖_2) + (1/√p) ‖λ‖_2 ⟨g, Π_{w⊥}(Λ^{1/2}θ)⟩
                      = min_{‖θ‖_1 ≤ √p} [ ψ^{−1/2} F̂_κ(⟨w, Λ^{1/2}θ⟩, ‖Π_{w⊥}(Λ^{1/2}θ)‖_2) + (1/√p) ⟨Π_{w⊥}(g), Λ^{1/2}θ⟩ ].

SLIDE 44

GORDON’S STATEMENT OF SLEPIAN-FERNIQUE-SUDAKOV

Let {X_ij} and {Y_ij}, 1 ≤ i ≤ n, 1 ≤ j ≤ m, be two centered Gaussian processes which satisfy, for all indices:

  (i)   E X_ij^2 = E Y_ij^2,
  (ii)  E(X_ij X_ik) ≥ E(Y_ij Y_ik),
  (iii) E(X_ij X_ℓk) ≤ E(Y_ij Y_ℓk), if i ≠ ℓ.

Then

    E min_i max_j X_ij ≤ E min_i max_j Y_ij.

Gordon (1988)

SLIDE 45

[BACKUP] CONVEX GAUSSIAN MINMAX THEOREM

Let Ω_1 ⊂ R^n, Ω_2 ⊂ R^p be two compact sets and let U : Ω_1 × Ω_2 → R be a continuous function. Let Z = (Z_ij) ∈ R^{n×p}, g ∼ N(0, I_n), and h ∼ N(0, I_p) be independent matrices and vectors with standard Gaussian entries. Define

    V_1(Z)    = min_{w_1 ∈ Ω_1} max_{w_2 ∈ Ω_2} w_1^⊤ Z w_2 + U(w_1, w_2),
    V_2(g, h) = min_{w_1 ∈ Ω_1} max_{w_2 ∈ Ω_2} ‖w_2‖ g^⊤ w_1 + ‖w_1‖ h^⊤ w_2 + U(w_1, w_2).

Then:

  • 1. For all t ∈ R,  P(V_1(Z) ≤ t) ≤ 2 P(V_2(g, h) ≤ t).
  • 2. Suppose Ω_1 and Ω_2 are both convex, and U is convex-concave in (w_1, w_2). Then, for all t ∈ R,  P(V_1(Z) ≥ t) ≤ 2 P(V_2(g, h) ≥ t).

Thrampoulidis et al. (2014, 2015); Gordon (1988)
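As a sanity check of part 1, take Ω_1, Ω_2 to be the unit spheres and U ≡ 0: then V_1(Z) is the smallest singular value of Z and V_2(g, h) = ‖h‖ − ‖g‖ (the classic Gordon comparison for singular values). The toy simulation below checks the low-side inequality empirically; note the spheres are compact but not convex, so only part 1 applies.

```python
# Toy illustration of the CGMT statement with Ω1, Ω2 the unit spheres and U ≡ 0:
# V1(Z) = min_{||w1||=1} max_{||w2||=1} w1^T Z w2 = σ_min(Z), and
# V2(g, h) = min_{||w1||=1} (g^T w1 + ||h||) = ||h|| − ||g||.
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 20, 50, 5000
v1 = np.empty(reps)
v2 = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, p))
    g, h = rng.standard_normal(n), rng.standard_normal(p)
    v1[r] = np.linalg.svd(Z, compute_uv=False).min()
    v2[r] = np.linalg.norm(h) - np.linalg.norm(g)

t = np.median(v2)
print("P(V1 <= t) =", np.mean(v1 <= t), "  2 P(V2 <= t) =", 2 * np.mean(v2 <= t))
```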

SLIDE 46

TECHNICAL CHALLENGES IN L1 CASE

Step 3: large n, p limit.

The empirical problem (finite-dimensional optimization):

    ξ̂^{(n,p)}_{ψ,κ} = min_{‖θ‖_1 ≤ √p} [ ψ^{−1/2} F̂_κ(⟨w, Λ^{1/2}θ⟩, ‖Π_{w⊥}(Λ^{1/2}θ)‖_2) + (1/√p) ⟨Π_{w⊥}(g), Λ^{1/2}θ⟩ ].

Let’s naively take the limit (infinite-dimensional optimization):

    ξ̃^{(∞,∞)}_{ψ,κ} := min_{‖h‖_{L1(Q)} ≤ 1} [ ψ^{−1/2} F_κ(⟨w, Λ^{1/2}h⟩_{L2(Q)}, ‖Π_{w⊥}(Λ^{1/2}h)‖_{L2(Q)}) + ⟨Π_{w⊥}(G), Λ^{1/2}h⟩_{L2(Q)} ].

One needs to show

    lim_{n,p→∞, p/n→ψ} ξ̂^{(n,p)}_{ψ,κ} = ξ̃^{(∞,∞)}_{ψ,κ}  a.s.    (“the a.s. limit”)

L1 vs. L2 geometry: for the constraint set ‖θ‖_1 ≤ √p, define c_1 = ⟨w, Λ^{1/2}θ⟩ and c_2 = ‖Π_{w⊥}(Λ^{1/2}θ)‖_2; c_2 could be of order √p → ∞.



SLIDE 49

KKT TO SYSTEM OF EQUATIONS

To prove “the a.s. limit”, start with the KKT conditions

    Λ^{1/2} Π_{W⊥}(G) + ψ^{−1/2} Λ^{1/2} [∂_1F_κ(c_1, c_2) W + ∂_2F_κ(c_1, c_2) Π_{W⊥}(Z)] + s · ∂‖h‖_{L1(Q_∞)} = 0,
    s (1 − ‖h‖_{L1(Q_∞)}) = 0,   s ≥ 0,   ‖h‖_{L1(Q_∞)} ≤ 1,

which imply

    h⋆ = − Λ^{−1} prox_s( Λ^{1/2} G + ψ^{−1/2} [∂_1F_κ(c_1, c_2) − c_1 c_2^{−1} ∂_2F_κ(c_1, c_2)] Λ^{1/2} W ) / ( ψ^{−1/2} c_2^{−1} ∂_2F_κ(c_1, c_2) ).

Plugging into the system:

    c_1 = ⟨Λ^{1/2} h⋆, W⟩_{L2(Q_∞)},   c_1^2 + c_2^2 = ‖Λ^{1/2} h⋆‖^2_{L2(Q_∞)},   ‖h⋆‖_{L1(Q_∞)} = 1.

SLIDE 50

UNIFORM DEVIATION ON FIXED POINT EQUATIONS

With the shorthand D := ψ^{−1/2} c_2^{−1} ∂_2F_κ(c_1, c_2) and B := ψ^{−1/2} [∂_1F_κ(c_1, c_2) − c_1 c_2^{−1} ∂_2F_κ(c_1, c_2)] Λ^{1/2} W, define the population residuals over (Λ, W, G) ∼ Q_∞:

    V_1^{(∞,∞)}(c_1, c_2, s) := c_1 + E[ Λ^{−1/2} W · prox_s( Λ^{1/2} Π_{W⊥}(G) + B ) / D ],
    V_2^{(∞,∞)}(c_1, c_2, s) := c_1^2 + c_2^2 − E[ ( Λ^{−1/2} prox_s( Λ^{1/2} Π_{W⊥}(G) + B ) / D )^2 ],
    V_3^{(∞,∞)}(c_1, c_2, s) := 1 − E[ | Λ^{−1} prox_s( Λ^{1/2} G + B ) / D | ].

If the uniform convergence result holds in the region c_1 ∈ [0, M], c_2 > 0, s > 0,

    lim_{n→∞, p(n)/n=ψ} sup_{c_1∈[0,M], c_2>0, s>0} (c_2 ∨ 1)^{−1} |V_1^{(n,p)}(c_1, c_2, s) − V_1^{(∞,∞)}(c_1, c_2, s)| = 0,
    lim_{n→∞, p(n)/n=ψ} sup_{c_1∈[0,M], c_2>0, s>0} (c_2 ∨ 1)^{−2} |V_2^{(n,p)}(c_1, c_2, s) − V_2^{(∞,∞)}(c_1, c_2, s)| = 0,
    lim_{n→∞, p(n)/n=ψ} sup_{c_1∈[0,M], c_2>0, s>0} (c_2 ∨ 1)^{−1} |V_3^{(n,p)}(c_1, c_2, s) − V_3^{(∞,∞)}(c_1, c_2, s)| = 0,

then uniform convergence + uniqueness ⇒ “the a.s. limit”.


SLIDE 52

KEY: NEW UNIFORM DEVIATION

We derive a uniform deviation bound over an unbounded domain for the fixed-point equations, using a key self-normalization property of ∂_iF_κ(c_1, c_2).

[L. & Sur ’20] For i = 1, 2, with probability at least 1 − n^{−2},

    sup_{|c_1| ≤ M, c_2 > 0} |∂_i F̂_κ(c_1, c_2) − ∂_i F_κ(c_1, c_2)| ≤ C log n / √n.

With σ(t) := max(t, 0), which satisfies the positive homogeneity σ(|c| t) = |c| σ(t),

    ∂_1F̂_κ(c_1, c_2) = − Ê_n[Y Z_1 σ(κ − c_1 Y Z_1 − c_2 Z_2)] / (Ê_n[σ^2(κ − c_1 Y Z_1 − c_2 Z_2)])^{1/2}
                     = − Ê_n[Y Z_1 σ(κ c_2^{−1} − c_1 c_2^{−1} Y Z_1 − Z_2)] / (Ê_n[σ^2(κ c_2^{−1} − c_1 c_2^{−1} Y Z_1 − Z_2)])^{1/2},

    ∂_2F̂_κ(c_1, c_2) = − Ê_n[Z_2 σ(κ − c_1 Y Z_1 − c_2 Z_2)] / (Ê_n[σ^2(κ − c_1 Y Z_1 − c_2 Z_2)])^{1/2}
                     = − Ê_n[Z_2 σ(κ c_2^{−1} − c_1 c_2^{−1} Y Z_1 − Z_2)] / (Ê_n[σ^2(κ c_2^{−1} − c_1 c_2^{−1} Y Z_1 − Z_2)])^{1/2}.

  • region (i):  (c_1, c_2) ∈ [−M, M] × (0, M]
  • region (ii): (c_1, c_2) ∈ [−M, M] × (M, ∞)  ⇒  (c_2^{−1}, c_1 c_2^{−1}) ∈ [0, 1/M) × (−1, 1)

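The self-normalization is concrete: because σ is positively homogeneous, dividing the argument of σ by c_2 rescales numerator and denominator by the same factor, so ∂_1F̂_κ is unchanged. A quick numerical check of that identity (logistic link assumed as before):

```python
# Sketch of the empirical derivative ∂1 F̂_κ with σ(t) = max(t, 0), plus a check of
# the self-normalization identity used to handle the unbounded region c2 > M.
import numpy as np

rng = np.random.default_rng(6)
n, rho = 10_000, 3.0
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
y = np.where(rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-rho * z1)), 1.0, -1.0)

def d1_F_hat(kappa, c1, c2):
    sigma = np.maximum(kappa - c1 * y * z1 - c2 * z2, 0.0)
    return -np.mean(y * z1 * sigma) / np.sqrt(np.mean(sigma**2))

kappa, c1, c2 = 1.0, 0.5, 25.0                      # large c2: the unbounded region
direct = d1_F_hat(kappa, c1, c2)
normalized = d1_F_hat(kappa / c2, c1 / c2, 1.0)     # σ's positive homogeneity cancels c2
print(direct, normalized)                           # identical up to floating point
```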

SLIDE 54

Large n limit: Ê_n → E, via the key uniform deviation and the self-normalization property.
Large p limit: Q_p → Q_∞, via 2-uniform integrability of Q_p due to the W2 convergence.

SLIDE 55

SOME EXTENSIONS

Our theoretical analysis can be extended to:

  • 1. Other geometry: max-Lq-margin, q ≥ 1, both the statistical theory and the algorithmic analysis,

        κ_{ℓq}(X, y) := max_{‖θ‖_q ≤ 1} min_{1 ≤ i ≤ n} y_i x_i^⊤ θ.

  • 2. Other models:
      • Model misspecification: let x̃_i = (x_i, z_i), P(y_i = +1 | x̃_i) = 1 − P(y_i = −1 | x̃_i) = f(x̃_i^⊤ θ⋆), and only (x_i, y_i) is observed.
      • Gaussian mixture models: P(y_i = +1) = 1 − P(y_i = −1) = υ ∈ (0, 1), x_i | y_i ∼ N(y_i · θ⋆, Λ).
      • Models with planted structure in x.


SLIDE 57

FUTURE WORK

  • 1. quality of interpolated solutions induced by different geometry
  • 2. beyond Gaussian
  • 3. nonlinear random feature models

SLIDE 58

SUMMARY

Research agenda: statistical and computational theory for min-norm interpolants (naive usage of Rademacher complexity or VC dimension struggles to explain them).

  • Regression: [L. & Rakhlin ’18, AOS], [L., Rakhlin & Zhai ’19, COLT]
  • Classification: [L. & Sur ’20]
  • Kernels vs. Neural Networks: [L. & Dou ’19, JASA], [L. & Tran-Bach ’20]


SLIDE 60

References

Thank you!

  • Liang, T. & Sur, P. (2020). A Precise High-Dimensional Asymptotic Theory for Boosting and Min-L1-Norm Interpolated Classifiers. https://tyliang.github.io/Tengyuan.Liang/pdf/Liang-Sur-20.pdf
  • Liang, T. & Tran-Bach, H. (2020). Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks.
  • Liang, T., Rakhlin, A. & Zhai, X. (2019). On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels. Conference on Learning Theory (COLT).
  • Liang, T. & Rakhlin, A. (2018). Just Interpolate: Kernel “Ridgeless” Regression Can Generalize. The Annals of Statistics.
  • Dou, X. & Liang, T. (2019). Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits. Journal of the American Statistical Association.

Bartlett, P. L. and Traskin, M. AdaBoost is consistent. Journal of Machine Learning Research, 8(Oct):2347–2368, 2007.
Bartlett, P. L., Bickel, P. J., Bühlmann, P., Freund, Y., Friedman, J., Hastie, T., Jiang, W., Jordan, M. I., Koltchinskii, V., Lugosi, G., McAuliffe, J. D., Ritov, Y., Rosset, S., Schapire, R. E., Tibshirani, R., Vayatis, N., Yu, B., Zhang, T., and Zhu, J. Discussions of boosting papers, and rejoinders. Annals of Statistics, 32(1):85–134, 2004.
Bartlett, P. L., Long, P. M., Lugosi, G., and Tsigler, A. Benign overfitting in linear regression. arXiv preprint arXiv:1906.11300, 2019.