Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun, Suvrit Sra, Ali Jadbabaie
Laboratory for Information and Decision Systems, MIT (NeurIPS 2019)
Given a ReLU fully-connected network, how many hidden nodes are required to memorize arbitrary N data points?
1-hidden-layer, scalar regression: [Diagram: input dimension dx, one hidden layer of N ReLU units, scalar output.]
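This classical fact (N hidden ReLU units can fit N arbitrary points) admits a short numerical check. The sketch below is illustrative, not the paper's code: project the inputs onto a random direction, place one ReLU breakpoint just below each sorted projection so the hidden activation matrix is triangular (hence invertible), and solve for the output weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dx = 20, 5
X = rng.normal(size=(N, dx))        # N arbitrary (distinct) inputs
y = rng.normal(size=N)              # arbitrary scalar targets

# Project onto a random direction; generically all projections are distinct.
a = rng.normal(size=dx)
z = X @ a
z_sorted = np.sort(z)

# One ReLU breakpoint just below each sorted projection value.
eps = np.min(np.diff(z_sorted)) / 2
b = z_sorted - eps

# Hidden activations: unit j turns on exactly from the j-th sorted point onward,
# so (with rows in sorted order) H is lower-triangular with positive diagonal.
H = np.maximum(z[:, None] - b[None, :], 0.0)
w = np.linalg.solve(H, y)           # output weights that interpolate exactly

print("max fit error:", np.max(np.abs(H @ w - y)))
```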
We prove that for 2-hidden-layer networks, Θ(√(Ndy)) neurons are sufficient. If dy = 1, Θ(√N) neurons are also necessary.
[Diagram: 2-hidden-layer network; input dimension dx, two hidden layers of width 2√(Ndy) each, output dimension dy.]
Regression: hidden layer widths 2√(Ndy) and 2√(Ndy).
Classification: hidden layer widths 2√N, 2√N, and 4dy.
ImageNet (N = 1M, dy = 1k) can be memorized with hidden layer widths 2k-2k-4k.
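A quick arithmetic check of that claim, assuming the classification widths are exactly 2√N, 2√N, 4dy as shown above:

```python
import math

# Classification widths from the slide: 2*sqrt(N), 2*sqrt(N), 4*dy.
N, dy = 10**6, 10**3                     # ImageNet: ~1M points, 1k classes
widths = (2 * math.isqrt(N), 2 * math.isqrt(N), 4 * dy)
print(widths)                            # (2000, 2000, 4000), i.e. 2k-2k-4k
```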
Depth-width trade-off
2 hidden layers: widths 2√(Ndy) and 2√(Ndy).
L hidden layers: widths ≈ √(8Ndy/L) each.
A network with W params can memorize arbitrary N points if W = Ω(N).
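A toy budget calculation illustrating the trade-off (my reading of the slide, taking the per-layer width to be √(8Ndy/L)): spreading the same budget across more layers shrinks each width, while the total hidden-to-hidden parameter count stays at roughly 8Ndy = Θ(N) for fixed dy.

```python
import math

def hidden_width(N: int, dy: int, L: int) -> int:
    # Per-layer width from the slide's trade-off (assumed form sqrt(8*N*dy/L)).
    return math.ceil(math.sqrt(8 * N * dy / L))

N, dy = 10**4, 10
for L in (2, 8, 32, 128):
    w = hidden_width(N, dy, L)
    params = (L - 1) * w * w            # hidden-to-hidden weight matrices only
    print(f"L={L:>3}  width≈{w:>5}  hidden-to-hidden params≈{params:,}")
```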
Given a network, we define its memorization capacity C as
C = max{N ∣ the network can memorize arbitrary N data points with dy = 1}.
Θ(√N) neurons are necessary and sufficient for 2-hidden-layer networks ⟹ C = Θ(W). [Tight]
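A one-step accounting of why the width bound yields C = Θ(W) (my sketch, assuming equal hidden widths d1 = d2 = w and fixed input dimension dx):

```latex
\[
W = \underbrace{d_x w}_{\text{input}} + \underbrace{w^2}_{\text{hidden--hidden}} + \underbrace{O(w)}_{\text{output + biases}} = \Theta(w^2),
\qquad
w = \Theta(\sqrt{N}) \;\Longrightarrow\; C = \Theta(w^2) = \Theta(W).
\]
```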
W = Ω(N) is sufficient for L-hidden-layer networks ⟹ C = Ω(W); moreover C ≤ VCdim = O(WL log W). [Nearly tight]
Other results
- A tighter sufficient condition for memorization in residual networks
- An SGD trajectory analysis near a memorizing global minimum