SLIDE 1

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Laboratory for Information and Decision Systems, MIT

SLIDES 2-3

Yun, Sra, Jadbabaie. NeurIPS 2019. A Tight Analysis of Memorization Capacity of ReLU Networks.

Given a ReLU fully-connected network, how many hidden nodes are required to memorize arbitrary N data points?

1-hidden-layer, scalar regression: N hidden nodes suffice.

[Figure: a fully-connected network with input dimension dx, one hidden layer of N nodes, and scalar output.]
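To make this baseline concrete, here is a minimal numpy sketch (my own illustration with hypothetical function names, not code from the paper) of the standard construction: project the inputs onto a random direction, place one ReLU breakpoint per data point, and solve the resulting triangular system for the output weights.

import numpy as np

def memorize_1_hidden_layer(X, y, seed=0):
    # f(x) = sum_j w_j * relu(a.x - b_j), one hidden unit per data point.
    N = X.shape[0]
    rng = np.random.default_rng(seed)
    a = rng.normal(size=X.shape[1])    # random direction; separates distinct
    z = X @ a                          # points with probability 1
    order = np.argsort(z)
    zs = z[order]
    b = np.empty(N)                    # breakpoints between consecutive
    b[0] = zs[0] - 1.0                 # projections: unit j activates exactly
    b[1:] = (zs[:-1] + zs[1:]) / 2.0   # on the points ranked j, j+1, ..., N
    # Activation matrix is lower triangular with positive diagonal,
    # hence invertible: solve for the output weights exactly.
    A = np.maximum(zs[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])
    return a, b, w

def predict(X, a, b, w):
    return np.maximum((X @ a)[:, None] - b[None, :], 0.0) @ w

# Usage: 50 random points in dx = 10 dimensions, memorized exactly.
X, y = np.random.randn(50, 10), np.random.randn(50)
a, b, w = memorize_1_hidden_layer(X, y)
assert np.allclose(predict(X, a, b, w), y)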

SLIDES 4-6

We prove that for 2-hidden-layer networks, Θ(√(Ndy)) neurons are sufficient.
If dy = 1, Θ(√N) neurons are also necessary.

[Figure: a network with input dimension dx, two hidden layers of width 2√(Ndy) each, and output dimension dy.]

Depth-width trade-off
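For a concrete sense of the trade-off (an illustrative instance I am supplying, not a number from the slides): for scalar regression (dy = 1) with N = 10^6 points,

$$\underbrace{N = 10^{6} \text{ neurons}}_{\text{1 hidden layer}} \qquad \text{vs.} \qquad \underbrace{2 \cdot 2\sqrt{N} = 4000 \text{ neurons}}_{\text{2 hidden layers}}.$$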

SLIDES 7-10

Regression: [Figure: input dx, two hidden layers of width 2√(Ndy), output dy.]

Classification: [Figure: input dx, three hidden layers of widths 2√N, 2√N, and 4dy, output dy.]

ImageNet (N = 1M, dy = 1k) memorized with 2k-2k-4k hidden nodes.

Depth-width trade-off
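A tiny helper (my own sketch; the function names are hypothetical, the width formulas are the slide's) that instantiates the two constructions and reproduces the ImageNet sizing:

import math

def regression_widths(N, dy):
    # 2-hidden-layer regression: two hidden layers of width 2*sqrt(N*dy).
    w = math.ceil(2 * math.sqrt(N * dy))
    return [w, w]

def classification_widths(N, dy):
    # Classification with dy classes: hidden widths 2*sqrt(N), 2*sqrt(N), 4*dy.
    w = math.ceil(2 * math.sqrt(N))
    return [w, w, 4 * dy]

# ImageNet-scale check from the slide: N = 1M points, dy = 1k classes.
print(classification_widths(10**6, 1000))  # [2000, 2000, 4000], i.e. 2k-2k-4k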

SLIDES 11-13

2 hidden layers: [Figure: two hidden layers of width 2√(Ndy) each.]

L hidden layers: [Figure: input dx, L hidden layers of width ≈ √(8Ndy/L) each, output dy.]

A network with W parameters can memorize N points if W = Ω(N).
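A one-line parameter count (under my reading of the garbled figure label as ≈ √(8Ndy/L) nodes per hidden layer) shows why W = Ω(N) parameters suffice: the hidden-layer weight matrices contribute about

$$W \approx L \cdot \left(\sqrt{8Nd_y/L}\,\right)^{2} = 8Nd_y = \Theta(N) \quad \text{for fixed } d_y.$$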

SLIDES 14-18

Given a network, we define its memorization capacity C as

C = max{N ∣ the network can memorize arbitrary N data points with dy = 1}

Θ(√N) neurons necessary and sufficient for 2-hidden-layer ⟹ C = Θ(W)   [Tight]

W = Ω(N) sufficient for L-hidden-layer ⟹ C = Ω(W), and C ≤ VCdim = O(WL log W)   [Nearly tight]
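Chaining the two displayed bounds makes the "Nearly tight" stamp explicit: for an L-hidden-layer network with W hidden-layer parameters,

$$\Omega(W) \;\le\; C \;\le\; \mathrm{VCdim} \;=\; O(WL\log W),$$

so for any fixed depth L the construction matches the VC-dimension upper bound up to a log W factor.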

SLIDE 19

Other results

  • Tighter sufficient condition for memorization in residual networks
  • SGD trajectory analysis near a memorizing global minimum

Poster #233, Wed Dec 11th, 5PM-7PM @ East Exhibition Hall B + C