When do neural networks outperform kernel methods? Song Mei - - PowerPoint PPT Presentation

Jun 04, 2023

SLIDE 1

When do neural networks outperform kernel methods?

Song Mei

Stanford University

June 29, 2020

Joint work with Behrooz Ghorbani, Theodor Misiakiewicz, and Andrea Montanari

Song Mei (Stanford University) Neural Networks and Kernel Methods June 29, 2020 1 / 15

SLIDE 2

Neural tangent model

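◮ Multi-layer NN: $f_N(x; \theta)$, $x \in \mathbb{R}^d$, $\theta \in \mathbb{R}^N$.
◮ Expanding around $\theta_0$:
$$f_N(x; \theta) = f_N(x; \theta_0) + \langle \theta - \theta_0, \nabla_\theta f_N(x; \theta_0) \rangle + o(\|\theta - \theta_0\|_2).$$
◮ Neural tangent model:
$$f_{\mathrm{NT},N}(x; \beta, \theta_0) = \langle \beta, \nabla_\theta f_N(x; \theta_0) \rangle.$$
◮ Coupled gradient flow:
$$\frac{d}{dt}\theta_t = -\nabla_\theta \hat{\mathbb{E}}\big[(y - f_N(x; \theta_t))^2\big], \quad \theta_t\big|_{t=0} = \theta_0,$$
$$\frac{d}{dt}\beta_t = -\nabla_\beta \hat{\mathbb{E}}\big[(y - f_{\mathrm{NT},N}(x; \beta_t, \theta_0))^2\big], \quad \beta_t\big|_{t=0} = 0.$$
◮ Under proper initialization and over-parameterization:
$$\lim_{N \to \infty} \big| f_N(x; \theta_t) - f_{\mathrm{NT},N}(x; \beta_t, \theta_0) \big| = 0.$$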

[Jacot, Gabriel, Hongler, 2018], [Du, Zhai, Poczos, Singh, 2018], [Chizat, Bach, 2018b], ....
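A minimal numerical sketch of the first-order expansion above, under assumptions of our own: a toy two-layer network $f_N(x; \theta) = \sum_i a_i \tanh(\langle w_i, x \rangle)$ with $\theta = (a, W)$ (the slide fixes neither the architecture nor the activation). The helper names `f_N`, `grad_f_N` and `f_lin` are hypothetical. The code perturbs $\theta_0$ by $\varepsilon$ along a random direction and checks that the gap between $f_N$ and its linearization shrinks roughly like $\varepsilon^2$.

```python
import math
import random

def f_N(x, a, W):
    """Toy two-layer network f_N(x; theta) = sum_i a_i * tanh(<w_i, x>), theta = (a, W)."""
    return sum(ai * math.tanh(sum(wij * xj for wij, xj in zip(wi, x)))
               for ai, wi in zip(a, W))

def grad_f_N(x, a, W):
    """Gradient of f_N with respect to theta = (a, W), returned as (da, dW)."""
    da, dW = [], []
    for ai, wi in zip(a, W):
        z = sum(wij * xj for wij, xj in zip(wi, x))
        da.append(math.tanh(z))                  # d f / d a_i
        s = 1.0 - math.tanh(z) ** 2              # tanh'(z)
        dW.append([ai * s * xj for xj in x])     # d f / d w_i
    return da, dW

def f_lin(x, a0, W0, a, W):
    """First-order expansion f_N(x; theta0) + <theta - theta0, grad f_N(x; theta0)>."""
    da, dW = grad_f_N(x, a0, W0)
    inner = sum((ai - a0i) * g for ai, a0i, g in zip(a, a0, da))
    inner += sum((wij - w0ij) * g
                 for wi, w0i, gi in zip(W, W0, dW)
                 for wij, w0ij, g in zip(wi, w0i, gi))
    return f_N(x, a0, W0) + inner

random.seed(0)
d, N = 5, 20
x = [random.gauss(0, 1) / math.sqrt(d) for _ in range(d)]
a0 = [random.gauss(0, 1) / math.sqrt(N) for _ in range(N)]
W0 = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
direction_a = [random.gauss(0, 1) for _ in range(N)]
direction_W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]

def perturb(eps):
    """theta0 + eps * (random direction)."""
    a = [ai + eps * u for ai, u in zip(a0, direction_a)]
    W = [[wij + eps * u for wij, u in zip(wi, ui)] for wi, ui in zip(W0, direction_W)]
    return a, W

errors = []
for eps in (1e-1, 1e-2, 1e-3):
    a, W = perturb(eps)
    errors.append(abs(f_N(x, a, W) - f_lin(x, a0, W0, a, W)))
print(errors)  # should shrink roughly like eps**2
```

The quadratic decay is what the $o(\|\theta - \theta_0\|_2)$ remainder predicts for a smooth activation; on its own it says nothing about whether training stays close to $\theta_0$.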

SLIDE 7

How about generalization?

◮ [Arora, Du, Hu, Li, Salakhutdinov, Wang, 2019]: CIFAR-10 experiments. NT: 23% test error. NN: less than 5% test error.
◮ [Arora, Du, Li, Salakhutdinov, Wang, Yu, 2019]: On small datasets, NT sometimes generalizes better than NN.
◮ [Shankar, Fang, Guo, Fridovich-Keil, Schmidt, Ragan-Kelley, Recht, 2020], [Li, Wang, Yu, Du, Hu, Salakhutdinov, Arora, 2019]: Smaller gap between NT and NN on CIFAR-10 (10% for NT).

Sometimes there is a large gap; sometimes the gap is small.

SLIDE 9

Focus of this talk

When is there a large performance gap between NN and NT?

SLIDE 10

Two-layer neural networks

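Neural networks:
$$\mathcal{F}_{\mathrm{NN},N} = \Big\{ f_N(x; \Theta) = \sum_{i=1}^N a_i\, \sigma(\langle w_i, x \rangle) : a_i \in \mathbb{R},\ w_i \in \mathbb{R}^d \Big\}.$$
Linearization:
$$f_N(x; \Theta) = f_N(x; \Theta_0) + \underbrace{\sum_{i=1}^N \Delta a_i\, \sigma(\langle w_i^0, x \rangle)}_{\text{Top layer linearization}} + \underbrace{\sum_{i=1}^N a_i^0\, \sigma'(\langle w_i^0, x \rangle)\, \langle \Delta w_i, x \rangle}_{\text{Bottom layer linearization}} + o(\Delta).$$
Linearized neural networks ($W = (w_i)_{i \in [N]} \sim_{\mathrm{iid}} \mathrm{Unif}(\mathbb{S}^{d-1})$):
$$\mathcal{F}_{\mathrm{RF},N}(W) = \Big\{ f = \sum_{i=1}^N a_i\, \sigma(\langle w_i, x \rangle) : a_i \in \mathbb{R},\ i \in [N] \Big\},$$
$$\mathcal{F}_{\mathrm{NT},N}(W) = \Big\{ f = \sum_{i=1}^N \sigma'(\langle w_i, x \rangle)\, \langle b_i, x \rangle : b_i \in \mathbb{R}^d,\ i \in [N] \Big\}.$$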
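With the first-layer weights $W$ frozen, both linearized classes are linear models in fixed features: $\mathcal{F}_{\mathrm{RF},N}(W)$ uses the $N$ features $\sigma(\langle w_i, x \rangle)$, and $\mathcal{F}_{\mathrm{NT},N}(W)$ uses the $N d$ features $\sigma'(\langle w_i, x \rangle)\, x$. A small sketch, assuming ReLU for $\sigma$ (so $\sigma'(t) = \mathbf{1}\{t > 0\}$, taking $\sigma'(0) = 0$); the function names are hypothetical.

```python
import math
import random

def relu(t):
    return max(t, 0.0)

def relu_prime(t):
    # Convention sigma'(0) = 0 (an assumption; ReLU is not differentiable at 0).
    return 1.0 if t > 0 else 0.0

def sample_sphere(d, rng):
    """One draw from Unif(S^{d-1}) via a normalized Gaussian vector."""
    g = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def rf_features(x, W):
    """Features of F_RF,N(W): sigma(<w_i, x>) for i = 1..N; the a_i are trained."""
    return [relu(sum(wj * xj for wj, xj in zip(w, x))) for w in W]

def nt_features(x, W):
    """Features of F_NT,N(W): sigma'(<w_i, x>) * x, flattened; the b_i in R^d are trained."""
    feats = []
    for w in W:
        gate = relu_prime(sum(wj * xj for wj, xj in zip(w, x)))
        feats.extend(gate * xj for xj in x)
    return feats

rng = random.Random(0)
d, N = 4, 3
W = [sample_sphere(d, rng) for _ in range(N)]
x = [rng.gauss(0, 1) for _ in range(d)]
print(len(rf_features(x, W)), len(nt_features(x, W)))  # N and N * d
```

Fitting either class then reduces to least squares on these feature vectors, which is why, for the same $N$, the NT class has $d$ times as many trainable parameters as the RF class.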

SLIDE 12

Spiked features model

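◮ Signal features and junk features:
$$x = (x_1, x_2) \in \mathbb{R}^d, \quad x_1 \in \mathbb{R}^{d_s}, \quad x_2 \in \mathbb{R}^{d - d_s}, \quad d_s = d^{\eta}, \quad 0 \le \eta \le 1,$$
$$\mathrm{Cov}(x_1) = \mathrm{snr}_f \cdot I_{d_s}, \quad \mathrm{Cov}(x_2) = I_{d - d_s}, \quad \mathrm{snr}_f = d^{\kappa}, \quad 0 \le \kappa < \infty \ \text{(feature SNR)}.$$
◮ The response depends only on the signal features:
$$y = f_\star(x) + \varepsilon, \quad f_\star(x) = \varphi(x_1).$$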

Figure: Anisotropic features: $\kappa > 0$, $\mathrm{snr}_f > 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.

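◮ Feature SNR: $\mathrm{snr}_f = d^{\kappa} \ge 1$.
◮ Effective dimension: $d_{\mathrm{eff}} = d_s \vee (d / \mathrm{snr}_f)$. We have $d_s \le d_{\mathrm{eff}} \le d$.
◮ Larger $\mathrm{snr}_f$ induces smaller $d_{\mathrm{eff}}$.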

More precisely: $x \sim \mathrm{Unif}(\mathbb{S}^{d_s}(r\sqrt{d_s})) \times \mathrm{Unif}(\mathbb{S}^{d-d_s}(\sqrt{d}))$. Generalizable to multiple spheres.
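The scalings above can be tabulated directly. The sketch below computes $d_s = d^\eta$, $\mathrm{snr}_f = d^\kappa$ and $d_{\mathrm{eff}} = d_s \vee (d/\mathrm{snr}_f)$, and draws $x = (x_1, x_2)$ from a Gaussian stand-in with $\mathrm{Cov}(x_1) = \mathrm{snr}_f \cdot I_{d_s}$ and $\mathrm{Cov}(x_2) = I_{d-d_s}$. Using Gaussians instead of the product of spheres, and rounding $d^\eta$ to an integer, are assumptions made for illustration; `spiked_dims` and `sample_spiked` are hypothetical names.

```python
import math
import random

def spiked_dims(d, eta, kappa):
    """Return (d_s, snr_f, d_eff) with d_s = d**eta, snr_f = d**kappa, d_eff = max(d_s, d / snr_f)."""
    d_s = max(1, min(d, round(d ** eta)))   # rounded to an integer (assumption)
    snr_f = d ** kappa
    d_eff = max(d_s, d / snr_f)
    return d_s, snr_f, d_eff

def sample_spiked(d, eta, kappa, rng):
    """Gaussian stand-in for the spiked features model (not the sphere construction)."""
    d_s, snr_f, _ = spiked_dims(d, eta, kappa)
    x1 = [math.sqrt(snr_f) * rng.gauss(0, 1) for _ in range(d_s)]   # signal features
    x2 = [rng.gauss(0, 1) for _ in range(d - d_s)]                  # junk features
    return x1, x2

for kappa in (0.0, 0.3, 0.6, 1.0):
    d_s, snr_f, d_eff = spiked_dims(1024, 0.4, kappa)
    print(f"kappa={kappa}: d_s={d_s}, snr_f={snr_f:.1f}, d_eff={d_eff:.1f}")
```

With $d = 1024$ and $\eta = 0.4$ (so $d_s = 16$), raising $\kappa$ from 0 to 0.6 shrinks $d_{\mathrm{eff}}$ from 1024 to $d_s = 16$, in line with "larger $\mathrm{snr}_f$ induces smaller $d_{\mathrm{eff}}$".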

SLIDE 13

Spiked features model

Figure: Isotropic features: $\kappa = 0$, $\mathrm{snr}_f = 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.

SLIDE 15

Approximation error with $N$ neurons

Approximation error:
$$R(f_\star, \mathcal{F}) = \inf_{f \in \mathcal{F}} \|f_\star - f\|_{L^2}^2.$$

Theorem (Ghorbani, Mei, Misiakiewicz, Montanari, 2020)

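Assume $d_{\mathrm{eff}}^{\ell+\gamma} \le N \le d_{\mathrm{eff}}^{\ell+1-\gamma}$ and a “generic condition” on $\sigma$. Then
$$R(f_\star, \mathcal{F}_{\mathrm{RF},N}(W)) = \|P_{>\ell} f_\star\|_{L^2}^2 + o_{d,\mathbb{P}}(\cdot),$$
$$R(f_\star, \mathcal{F}_{\mathrm{NT},N}(W)) = \|P_{>\ell+1} f_\star\|_{L^2}^2 + o_{d,\mathbb{P}}(\cdot).$$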

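By contrast, assume $d_s^{\ell+\gamma} \le N \le d_s^{\ell+1-\gamma}$. Then
$$R(f_\star, \mathcal{F}_{\mathrm{NN},N}) \le \|P_{>\ell+1} f_\star\|_{L^2}^2 + o_d(\cdot).$$

Moreover, $R(f_\star, \mathcal{F}_{\mathrm{NN},N})$ is independent of $\mathrm{snr}_f$.

$P_{>\ell}$: projection orthogonal to the space of degree-$\ell$ polynomials.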
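When $f_\star$ is expanded in an orthonormal polynomial basis of $L^2$ grouped by degree, $\|P_{>\ell} f_\star\|_{L^2}^2$ is just the energy in the coefficients of degree above $\ell$ (Parseval). The sketch below assumes $f_\star$ is handed over as such coefficients; the dictionary format, the coefficient values and `tail_energy` are hypothetical. It evaluates only the leading terms of the RF and NT risks in the theorem, ignoring the $o_{d,\mathbb{P}}$ terms.

```python
def tail_energy(coeffs_by_degree, ell):
    """||P_{>ell} f||^2 = sum of squared orthonormal coefficients of degree > ell."""
    return sum(c * c
               for degree, coeffs in coeffs_by_degree.items() if degree > ell
               for c in coeffs)

# Hypothetical target: orthonormal-basis coefficients of f_star, grouped by degree.
f_star = {0: [0.5], 1: [1.0, -0.5], 2: [0.3], 3: [0.2, 0.1]}

ell = 1
risk_rf = tail_energy(f_star, ell)       # leading term of R(f_star, F_RF,N(W))
risk_nt = tail_energy(f_star, ell + 1)   # leading term of R(f_star, F_NT,N(W))
print(risk_rf, risk_nt)
```

In the same parameter regime, NT removes one more degree of $f_\star$ than RF, so its leading risk term can only be smaller or equal.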

SLIDE 16

Approximation error with $N$ neurons

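Dim $d_{\mathrm{eff}} \equiv d_s \vee (d / \mathrm{snr}_f)$ and $d_s \le d_{\mathrm{eff}} \le d$. To approx. a degree-$\ell$ poly. in $x_1$:
◮ NN needs at most $d_s^{\ell}$ parameters*.
◮ RF needs $d_{\mathrm{eff}}^{\ell}$ parameters.
◮ NT needs $d_{\mathrm{eff}}^{\ell-1} \cdot d$ parameters.
Approximation power: NN $\ge$ RF $\ge$ NT.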

* If we don’t count parameters with value 0.
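The three counts on this slide depend on $(d, \eta, \kappa, \ell)$ only through $d_s$ and $d_{\mathrm{eff}}$, so they are easy to compare numerically. A sketch that simply evaluates $d_s^{\ell}$, $d_{\mathrm{eff}}^{\ell}$ and $d_{\mathrm{eff}}^{\ell-1} \cdot d$, ignoring constants; `params_needed` is a hypothetical name.

```python
def params_needed(d, eta, kappa, ell):
    """Orders of magnitude (constants ignored) of the parameter counts on this slide."""
    d_s = d ** eta
    d_eff = max(d_s, d / d ** kappa)   # d_eff = d_s v (d / snr_f) with snr_f = d**kappa
    return {
        "NN": d_s ** ell,              # upper bound, counting only nonzero parameters
        "RF": d_eff ** ell,
        "NT": d_eff ** (ell - 1) * d,
    }

for kappa in (0.0, 0.3, 0.6):
    counts = params_needed(d=1024, eta=0.4, kappa=kappa, ell=2)
    print(kappa, {name: round(value) for name, value in counts.items()})
```

Because $d_s \le d_{\mathrm{eff}} \le d$, the ordering NN $\le$ RF $\le$ NT of the counts holds for every $\kappa$, which is the "NN $\ge$ RF $\ge$ NT" ordering of approximation power.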

SLIDE 17

Extreme case: low feature SNR

Fix $0 < \eta < 1$, low $\mathrm{snr}_f$: $\kappa = 0$. To approx. a degree-$\ell$ poly. in $x_1$:
◮ NN needs at most $d^{\eta\ell}$ parameters*.
◮ RF needs $d^{\ell}$ parameters.
◮ NT needs $d^{\ell}$ parameters.
Approximation power: NN $>$ RF $=$ NT.

Figure: Isotropic features: $\kappa = 0$, $\mathrm{snr}_f = 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.

* If we don’t count parameters with value 0.
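Plugging $\kappa = 0$ into the general counts $d_s^{\ell}$, $d_{\mathrm{eff}}^{\ell}$ and $d_{\mathrm{eff}}^{\ell-1} \cdot d$ recovers this slide; a short derivation that uses only $d_{\mathrm{eff}} = d_s \vee (d/\mathrm{snr}_f)$:

```latex
\[
\mathrm{snr}_f = d^{0} = 1
\;\Rightarrow\;
d_{\mathrm{eff}} = d^{\eta} \vee d = d ,
\qquad
\underbrace{d_s^{\ell} = d^{\eta\ell}}_{\text{NN}}
\;\ll\;
\underbrace{d_{\mathrm{eff}}^{\ell} = d^{\ell}}_{\text{RF}}
\;=\;
\underbrace{d_{\mathrm{eff}}^{\ell-1}\cdot d = d^{\ell}}_{\text{NT}} .
\]
```

The NN advantage is the factor $d^{(1-\eta)\ell}$, which grows with $d$ because $\eta < 1$.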

SLIDE 18

Extreme case: high feature SNR

Fix $0 < \eta < 1$, high $\mathrm{snr}_f$: $\kappa \gg 1$. To approx. a degree-$\ell$ poly. in $x_1$:
◮ NN needs at most $d^{\eta\ell}$ parameters*.
◮ RF needs $d^{\eta\ell}$ parameters.
◮ NT needs $d^{\eta(\ell-1)+1}$ parameters.
Approximation power: NN $\sim$ RF $>$ NT.

Figure: Anisotropic features: $\kappa > 0$, $\mathrm{snr}_f > 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.

* If we don’t count parameters with value 0.
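Why RF and NT separate in this regime: once $d/\mathrm{snr}_f = d^{1-\kappa}$ drops below $d_s = d^{\eta}$ (which holds whenever $\kappa \ge 1 - \eta$, in particular for $\kappa \gg 1$; this threshold follows from the definitions and is not stated on the slide), the effective dimension collapses to $d_s$:

```latex
\[
\kappa \ge 1-\eta
\;\Rightarrow\;
\frac{d}{\mathrm{snr}_f} = d^{1-\kappa} \le d^{\eta}
\;\Rightarrow\;
d_{\mathrm{eff}} = d^{\eta},
\qquad
\frac{\#\text{params}(\text{NT})}{\#\text{params}(\text{RF})}
= \frac{d_{\mathrm{eff}}^{\ell-1}\, d}{d_{\mathrm{eff}}^{\ell}}
= \frac{d^{\eta(\ell-1)+1}}{d^{\eta\ell}}
= d^{1-\eta} \to \infty .
\]
```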

SLIDE 19

Numerical simulations

[Figure: $R/R_0$ versus $\log(\text{Params})/\log(d)$; panels: Linear, Quadratic, Cubic, Quartic.]

Colorbar: $\kappa \in [0, 1]$. Dot-dashed lines: NN; dashed lines: RF; continuous lines: NT. Dimension: $d = 1024$.

Eff. dim: $d_s = 16$.

Conclusion

(a) Power: NN $\ge$ RF $\ge$ NT.
(b) The risk of NN is independent of $\mathrm{snr}_f$.
(c) Larger $\mathrm{snr}_f$ induces larger power of $\{\text{RF}, \text{NT}\}$.

SLIDE 20

Similar results for generalization error with finite samples $n$

SLIDE 21

Extreme case: low feature SNR

Fix $0 < \eta < 1$, low $\mathrm{snr}_f$: $\kappa = 0$. To fit a degree-$\ell$ poly. in $x_1$:
◮ There exists an NN that needs at most $d^{\eta\ell}$ samples.
◮ $\{\text{RF}, \text{NT}\}$ kernel methods need $d^{\ell}$ samples.
Potential generalization power: NN $>$ Kernel methods.

Figure: Isotropic features: $\kappa = 0$, $\mathrm{snr}_f = 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.

SLIDE 22

Extreme case: high feature SNR

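Fix $0 < \eta < 1$, high $\mathrm{snr}_f$: $\kappa \gg 1$. To fit a degree-$\ell$ poly. in $x_1$:
◮ There exists an NN that needs at most $d^{\eta\ell}$ samples.
◮ $\{\text{RF}, \text{NT}\}$ kernel methods need $d^{\eta\ell}$ samples.
Potential generalization power: NN $\sim$ Kernel methods.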

Figure: Anisotropic features: $\kappa > 0$, $\mathrm{snr}_f > 1$. The target $f_\star(x_1, x_2) = \varphi(x_1)$ depends only on $x_1$.
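Side by side, the two sample-size extremes (low and high feature SNR) can be compared with a few lines of arithmetic. A sketch that only evaluates the stated orders of magnitude, ignoring constants; `sample_counts` and the regime labels are hypothetical, and nothing here interpolates between the two extremes.

```python
def sample_counts(d, eta, ell, regime):
    """Orders of magnitude of samples needed to fit a degree-ell polynomial in x1.

    regime: "low" (kappa = 0) or "high" (kappa >> 1). Returns (nn_upper, kernel).
    """
    nn_upper = d ** (eta * ell)          # some NN needs at most this many samples
    if regime == "low":
        kernel = d ** ell                # {RF, NT} kernel methods, kappa = 0
    elif regime == "high":
        kernel = d ** (eta * ell)        # {RF, NT} kernel methods, kappa >> 1
    else:
        raise ValueError("regime must be 'low' or 'high'")
    return nn_upper, kernel

for regime in ("low", "high"):
    nn, kernel = sample_counts(d=1024, eta=0.4, ell=2, regime=regime)
    print(regime, round(nn), round(kernel), round(kernel / nn))
```

For $d = 1024$, $\eta = 0.4$ and $\ell = 2$, the gap is a factor $d^{(1-\eta)\ell} = 4096$ at low feature SNR and disappears at high feature SNR.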

SLIDE 23

Implications

Adding isotropic noise to the features (i.e., decreasing $\mathrm{snr}_f$) makes the performance gap between NN and $\{\text{RF}, \text{NT}\}$ larger.
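One way to see why added noise lowers $\mathrm{snr}_f$, under a simplifying assumption of ours rather than the slide's: the noise is independent and isotropic with variance $\tau^2$ per coordinate. Then the signal-feature variance becomes $\mathrm{snr}_f + \tau^2$ and the junk-feature variance $1 + \tau^2$, so the ratio $(\mathrm{snr}_f + \tau^2)/(1 + \tau^2)$ decreases toward 1 as $\tau$ grows. `noisy_snr_f` and the starting value 16 are hypothetical.

```python
def noisy_snr_f(snr_f, tau):
    """Feature SNR after adding independent noise of variance tau**2 to every coordinate.

    Signal variance snr_f -> snr_f + tau**2; junk variance 1 -> 1 + tau**2.
    """
    return (snr_f + tau ** 2) / (1.0 + tau ** 2)

for tau in (0.0, 1.0, 2.0, 3.0):
    print(tau, noisy_snr_f(16.0, tau))
```

Starting from $\mathrm{snr}_f = 16$, the noise levels $\tau = 0, 1, 2, 3$ give $16, 8.5, 4, 2.5$: the features move toward the isotropic regime, where kernel methods fall furthest behind NN.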

SLIDE 24

Numerical simulations

[Figure: test-set classification accuracy versus noise strength $\tau$; legend: Linear, Quadratic, NN, RF KRR, NT KRR, RF, NT; panels at $\tau = 0.0, 1.0, 2.0, 3.0$.]

Figure: Underlying assumption: labels depend on low-frequency components of images.

SLIDE 25

Message

In the spiked features model, a parameter controlling the performance gap between NN and $\{\text{RF}, \text{NT}\}$ is
$$\mathrm{snr}_f = \text{Feature SNR} = \frac{\text{Signal features variance}}{\text{Junk features variance}}.$$
◮ Small $\mathrm{snr}_f$: there is a large separation.
◮ Large $\mathrm{snr}_f$: $\{\text{RF}, \text{NT}\}$ performs closer to NN.
Somewhat implicitly, NN first finds the signal features (PCA) and then performs kernel methods on these features.

$$\mathrm{snr}_f \neq \mathrm{SNR} = \|f_\star\|_{L^2}^2 / \mathbb{E}[\varepsilon^2].$$
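A toy picture of the "find the signal features" step, under assumptions of our own: Gaussian spiked features with diagonal covariance, where PCA reduces to ranking coordinates by empirical variance. With $\mathrm{snr}_f$ well above 1, the top-$d_s$ coordinates found from samples coincide with the signal block $x_1$. `top_variance_coordinates` and all parameter values are hypothetical; this illustrates the heuristic and is not the mechanism analyzed in the talk.

```python
import math
import random

def top_variance_coordinates(samples, k):
    """Indices of the k coordinates with the largest empirical variance.

    For a diagonal covariance these span the top-k principal components.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    variances = [sum((s[j] - means[j]) ** 2 for s in samples) / (n - 1) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])

rng = random.Random(0)
d, d_s, snr_f, n = 64, 4, 9.0, 500
samples = [
    [math.sqrt(snr_f) * rng.gauss(0, 1) for _ in range(d_s)]   # signal features x1
    + [rng.gauss(0, 1) for _ in range(d - d_s)]                # junk features x2
    for _ in range(n)
]
print(top_variance_coordinates(samples, d_s))  # expected: the signal block
```

As $\mathrm{snr}_f$ approaches 1 the two variance levels merge and this ranking no longer singles out $x_1$, matching the regime where the gap between NN and $\{\text{RF}, \text{NT}\}$ is largest.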

SLIDE 26

Thank you!
