

SLIDE 1

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

Song Mei, Theodor Misiakiewicz, and Andrea Montanari

Stanford University

June 26, 2019

COLT 2019

Song Mei (Stanford University) Mean Field Dynamics for Neural Network June 26, 2019 1 / 12

SLIDE 2

Gradient dynamics of two-layers neural network

◮ Two-layers neural network:

\[
\Theta = (\theta_1, \dots, \theta_N), \qquad \theta_i = (a_i, w_i) \in \mathbb{R}^D, \qquad
\hat{y}(x; \Theta) = \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle).
\]

◮ Risk function:

\[
R_N(\Theta) = \mathbb{E}_{x,y}\Big[ \Big( y - \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle) \Big)^2 \Big].
\]

◮ SGD/gradient flow:

\[
\Theta^{k+1} = \Theta^k - \eta_k \nabla \ell_N(\Theta^k; x_k, y_k), \qquad
\frac{\mathrm{d}}{\mathrm{d}t} \Theta^t = -\nabla R_N(\Theta^t).
\]
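The forward pass and SGD update on this slide can be sketched numerically. This is a minimal illustration, not the authors' code: the activation σ = tanh, the squared loss, and all sizes are assumptions for the sketch (the slides only require σ bounded).

```python
import numpy as np

# Two-layer mean-field network from the slide:
#   y_hat(x; Theta) = (1/N) * sum_i a_i * sigma(<w_i, x>)
# sigma = tanh is an illustrative choice.

def y_hat(a, W, x):
    """Prediction of the N-neuron network; a: (N,), W: (N, d), x: (d,)."""
    return np.mean(a * np.tanh(W @ x))

def sgd_step(a, W, x, y, lr):
    """One SGD step on the squared loss l = (y - y_hat)^2."""
    pre = W @ x                      # (N,) pre-activations <w_i, x>
    act = np.tanh(pre)
    resid = y - np.mean(a * act)     # y - y_hat
    # Gradients of l w.r.t. a_i and w_i (chain rule through the 1/N average)
    grad_a = -2.0 * resid * act / len(a)
    grad_W = -2.0 * resid * (a * (1 - act**2))[:, None] * x[None, :] / len(a)
    return a - lr * grad_a, W - lr * grad_W

rng = np.random.default_rng(0)
N, d = 100, 5
a, W = rng.normal(size=N), rng.normal(size=(N, d))
x, y = rng.normal(size=d), 1.0
a2, W2 = sgd_step(a, W, x, y, lr=0.1)
```

A single small step should reduce the loss on the sampled point, which is the only behavior this sketch is meant to show.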


SLIDE 5

Two-layers neural networks

Figure: Architecture for N = 4, with θ_i = (a_i, w_i): input layer, hidden layer with weights w_1, …, w_4, and output layer with weights a_1, …, a_4.

SLIDE 6

Related literature

◮ Mean-field distributional dynamics:

\[
\partial_t \rho_t(\theta) = \nabla \cdot \big( \nabla \Psi(\theta, \rho_t) \, \rho_t \big).
\]

◮ Non-linear dynamics; converges in some cases.
◮ [Mei, Montanari, Nguyen, 2018], [Rotskoff and Vanden-Eijnden, 2018], [Chizat and Bach, 2018a], [Sirignano and Spiliopoulos, 2018].

◮ Neural tangent kernel (NTK) dynamics:

\[
\partial_t \| u_t \|_2^2 = - \langle u_t, H u_t \rangle.
\]

◮ Linear dynamics; always converges to 0 empirical risk.
◮ [Jacot, Gabriel, and Hongler, 2018], [Li and Liang, 2018], [Du, Zhai, Poczos, Singh, 2018].


SLIDE 12

This work

(a) An improved bound for the SGD-PDE interpolation.
(b) The relationship between the mean field limit and the kernel limit.

SLIDE 13

SGD and distributional dynamics (DD)

◮ SGD for Θ^k, with (x_k, y_k) ~ P_{x,y}, i ∈ [N]:

\[
\theta_i^{k+1} = \theta_i^k - 2 s_k N \, \nabla_{\theta_i} \ell_N(\Theta^k; x_k, y_k).
\tag{SGD}
\]

◮ [MMN18]: for s_k = ε ξ(kε), k = ⌊t/ε⌋, N → ∞, ε → 0:

\[
\hat{\rho}_k^{(N)} \equiv \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i^k} \Rightarrow \rho_t \in \mathcal{P}(\mathbb{R}^D), \qquad t \in [0, \infty).
\]

◮ Distributional dynamics (DD) for ρ_t:

\[
\partial_t \rho_t(\theta) = 2 \xi(t) \, \nabla_\theta \cdot \big( \rho_t(\theta) \, \nabla_\theta \Psi(\theta, \rho_t) \big),
\tag{DD}
\]

where

\[
\Psi(\theta, \rho) = \frac{\delta R(\rho)}{\delta \rho(\theta)} = V(\theta) + \int U(\theta, \theta') \, \rho(\mathrm{d}\theta').
\]


SLIDE 16

An improved bound

Assumption: (i) σ bounded; (ii) ∇_w σ(⟨x, w⟩) sub-Gaussian; (iii) ∇Ψ bounded Lipschitz.

Theorem (Mei, Misiakiewicz, Montanari, 2019)

Let (θ_i^0)_{i ≤ N} ~_iid ρ_0. Then, for every bounded Lipschitz f, with high probability,

\[
\sup_{t \le T} \Big| \frac{1}{N} \sum_{i=1}^{N} f\big(\theta_i^{\lfloor t/\varepsilon \rfloor}\big) - \int f(\theta) \, \rho_t(\mathrm{d}\theta) \Big| \le \mathrm{Func}(T) \cdot \sqrt{\frac{1}{N} \vee D\varepsilon}.
\]

An example: learning a spherically symmetric Lipschitz function using N = O_d(1) neurons and n = O_d(d) samples.

Caveat: even this improved bound is not strong; in other cases the factor Func(T) could be huge.
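The 1/√N term of the bound is visible already at t = 0: for iid θ_i ~ ρ_0 and bounded Lipschitz f, the empirical average deviates from ∫ f dρ_0 at rate N^{-1/2}. A small Monte Carlo check, with the illustrative (not from the slides) choices ρ_0 = N(0, 1) and f = tanh:

```python
import numpy as np

# For iid theta_i ~ rho_0 and bounded Lipschitz f,
# |(1/N) sum_i f(theta_i) - E f| concentrates at rate 1/sqrt(N).
# Illustrative check with rho_0 = N(0, 1) and f = tanh.

rng = np.random.default_rng(1)
f = np.tanh
true_mean = 0.0  # E[tanh(Z)] = 0 for Z ~ N(0, 1), by symmetry

def mean_abs_dev(N, reps=2000):
    """Average |empirical mean - true mean| over many independent draws."""
    samples = rng.normal(size=(reps, N))
    return np.abs(f(samples).mean(axis=1) - true_mean).mean()

devs = {N: mean_abs_dev(N) for N in (100, 400, 1600)}
# Quadrupling N should roughly halve the deviation.
```

The deviation at t > 0 additionally requires controlling the dynamics, which is where the Func(T) factor of the theorem enters; this sketch only shows the N-dependence.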


SLIDE 20

This work

(a) An improved bound for the SGD-PDE interpolation.
(b) The relationship between the mean field limit and the kernel limit.

SLIDE 21

Recovering the kernel limit

The same idea appeared in [Chizat and Bach, 2018b], where the kernel limit was called "lazy training".

Setup:

◮ Prediction function:

\[
\hat{f}_{\alpha,N}(x; \theta) = \frac{\alpha}{N} \sum_{j=1}^{N} \sigma_\star(x; \theta_j).
\]

◮ Risk function:

\[
R_{\alpha,N}(\theta) = \mathbb{E}_x \Big[ \big( f(x) - \hat{f}_{\alpha,N}(x; \theta) \big)^2 \Big].
\]

◮ Gradient flow:

\[
\frac{\mathrm{d}\theta_j^t}{\mathrm{d}t} = - \frac{N}{2\alpha^2} \, \nabla_{\theta_j} R_{\alpha,N}(\theta^t).
\]

SLIDE 22

The coupled dynamics

Denote ρ_t^{α,N} = (1/N) Σ_{j=1}^N δ_{θ_j^t}.

◮ Distributional dynamics:

\[
\partial_t \rho_t^{\alpha,N} = (1/\alpha) \, \nabla_\theta \cdot \big( \rho_t^{\alpha,N} \, \nabla_\theta \Psi_\alpha(\theta; \rho_t^{\alpha,N}) \big).
\]

◮ Denote u_t^{α,N}(z) = f(z) − f̂_{α,N}(z; θ^t). Residual dynamics:

\[
\partial_t \| u_t^{\alpha,N} \|_{L^2}^2 = - \big\langle u_t^{\alpha,N}, H_{\rho_t^{\alpha,N}} u_t^{\alpha,N} \big\rangle.
\]

Here

\[
H_\rho(x, z) \equiv \int \big\langle \nabla_\theta \sigma_\star(x; \theta), \nabla_\theta \sigma_\star(z; \theta) \big\rangle \, \rho(\mathrm{d}\theta), \qquad
\Psi_\alpha(\theta; \rho_t^{\alpha,N}) = - \mathbb{E}_x \big[ u_t^{\alpha,N}(x) \, \sigma_\star(x; \theta) \big].
\]
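The kernel H_ρ can be approximated by Monte Carlo over θ ~ ρ. A sketch under the illustrative assumption σ⋆(x; θ) = a · tanh(⟨w, x⟩) with θ = (a, w) (the slides leave σ⋆ generic); it checks the symmetry and Cauchy-Schwarz properties that any such kernel must satisfy.

```python
import numpy as np

# Monte Carlo estimate of
#   H_rho(x, z) = int <grad_theta sigma*(x; theta), grad_theta sigma*(z; theta)> rho(dtheta),
# assuming sigma*(x; theta) = a * tanh(<w, x>), theta = (a, w), rho = standard normal.

def grad_sigma(x, a, W):
    """Gradient of sigma* w.r.t. theta = (a, w), for all M sampled theta at once."""
    act = np.tanh(W @ x)                              # (M,)
    da = act                                          # d sigma* / d a
    dW = (a * (1 - act**2))[:, None] * x              # d sigma* / d w, shape (M, d)
    return np.concatenate([da[:, None], dW], axis=1)  # (M, d+1)

def H(x, z, a, W):
    """<grad sigma*(x; .), grad sigma*(z; .)> averaged over theta ~ rho."""
    return np.mean(np.sum(grad_sigma(x, a, W) * grad_sigma(z, a, W), axis=1))

rng = np.random.default_rng(2)
M, d = 5000, 3
a, W = rng.normal(size=M), rng.normal(size=(M, d))
x, z = rng.normal(size=d), rng.normal(size=d)
hxx, hzz, hxz, hzx = H(x, x, a, W), H(z, z, a, W), H(x, z, a, W), H(z, x, a, W)
```

H(x, x) ≥ 0, H(x, z) = H(z, x), and H(x, z)² ≤ H(x, x)·H(z, z) hold for the exact kernel and for this empirical average alike.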


SLIDE 25

The mean field limit and kernel limit

\[
\partial_t \rho_t^{\alpha,N} = (1/\alpha) \, \nabla_\theta \cdot \big( \rho_t^{\alpha,N} \, [\nabla_\theta \Psi_\alpha(\theta; \rho_t^{\alpha,N})] \big), \qquad
\partial_t \| u_t^{\alpha,N} \|_{L^2}^2 = - \big\langle u_t^{\alpha,N}, H_{\rho_t^{\alpha,N}} u_t^{\alpha,N} \big\rangle.
\]

◮ The mean field limit: fix α = O(1) and let N → ∞.
◮ The kernel limit: let α → ∞ after N → ∞.
◮ The benefit of the kernel limit: the kernel does not change over time, so the residual dynamics becomes self-contained and the empirical risk converges to 0. For the full derivation, see Appendix H of [Mei, Misiakiewicz, Montanari, 2019].
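Why the residual dynamics is self-contained in the kernel limit: once the kernel is frozen and positive definite, the residual (restricted to sample points) solves a linear ODE of the form du/dt = −Hu, whose solution decays to 0. A finite-dimensional sketch, with a random positive-definite matrix standing in for the kernel Gram matrix (all sizes illustrative):

```python
import numpy as np

# With a fixed positive-definite kernel matrix H on n sample points, the
# linear residual dynamics du/dt = -H u has solution u_t = exp(-tH) u_0,
# so ||u_t|| decreases monotonically to 0 -- the "always converges to 0
# empirical risk" behavior of the kernel limit.

rng = np.random.default_rng(3)
n = 20
A = rng.normal(size=(n, n))
H = A @ A.T + 0.1 * np.eye(n)      # positive definite stand-in for the kernel
evals, evecs = np.linalg.eigh(H)

def u(t, u0):
    """Solution of du/dt = -H u via the eigendecomposition of H."""
    return evecs @ (np.exp(-t * evals) * (evecs.T @ u0))

u0 = rng.normal(size=n)
norms = [np.linalg.norm(u(t, u0)) for t in (0.0, 1.0, 10.0)]
# norms should be strictly decreasing and tend to 0.
```

In the mean field regime, by contrast, H_{ρ_t} itself moves with ρ_t, so no such closed linear picture applies.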


SLIDE 28

Summary

◮ Gave an interpretation of the neural tangent kernel from the mean field point of view (also in [Chizat and Bach, 2018b]).
◮ Whether the NTK can explain the success of neural networks is still an open problem: [Arora, Du, Hu, Li, Salakhutdinov, Wang, 2019], [Ghorbani, Mei, Misiakiewicz, and Montanari, 2019a, 2019b], [Allen-Zhu and Li, 2019].
