Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity (NeurIPS 2019 slide deck)


  1. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Laboratory for Information and Decision Systems, MIT.

  2-3. Given a ReLU fully-connected network, how many hidden nodes are required to memorize N arbitrary data points? For a 1-hidden-layer network and scalar regression, N hidden nodes suffice. [Diagram: d_x inputs → one hidden layer of width N → scalar output.]
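The 1-hidden-layer baseline can be made concrete. Below is a minimal sketch (an illustrative construction for distinct scalar inputs, not taken from the paper): biasing the j-th ReLU to switch on just past the (j-1)-th sorted input makes the hidden activation matrix lower-triangular with a positive diagonal, so exact output weights can be solved for.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.sort(rng.uniform(-1.0, 1.0, size=N))   # N distinct scalar inputs, sorted
y = rng.normal(size=N)                        # arbitrary scalar targets

# Unit 0 is active on every point; unit j (j >= 1) switches on just past
# x_{j-1}, so it is active exactly on x_j, x_{j+1}, ..., x_{N-1}.
b = np.concatenate(([x[0] - 1.0], x[:-1]))
H = np.maximum(x[:, None] - b[None, :], 0.0)  # H[i, j] = ReLU(x_i - b_j)

# H is lower-triangular with a positive diagonal, hence invertible:
a = np.linalg.solve(H, y)                     # output-layer weights
assert np.allclose(H @ a, y)                  # all N points memorized exactly
```

The learned network is f(x) = Σ_j a_j · ReLU(x − b_j): N hidden units, N points fit exactly.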

  4-6. We prove that for 2-hidden-layer networks, Θ(N d_y) neurons are sufficient; if d_y = 1, Θ(N) neurons are also necessary. [Diagram: d_x inputs → two hidden layers of width 2N d_y each → d_y outputs; annotated "Depth-width trade-off".]

  7-10. Regression: [diagram: d_x inputs → hidden widths 2N d_y, 2N d_y → d_y outputs]. Classification: [diagram: d_x inputs → hidden widths 2√N, 2√N, 4 d_y → d_y outputs; again a depth-width trade-off]. Example: ImageNet (N = 1M, d_y = 1k) can be memorized with hidden widths 2k-2k-4k.
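A quick check of the arithmetic behind the ImageNet figure, with the widths 2√N, 2√N, 4·d_y read off the classification diagram above:

```python
import math

N, d_y = 1_000_000, 1_000                 # ImageNet scale: 1M points, 1k classes
w = 2 * math.isqrt(N)                     # 2 * sqrt(1M) = 2000
print(w, w, 4 * d_y)                      # 2000 2000 4000, i.e. 2k-2k-4k
```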

  11-13. 2 hidden layers: [diagram: d_x inputs → hidden widths 2N d_y, 2N d_y → d_y outputs]. L hidden layers: [diagram: d_x inputs → L hidden layers of width ≈ 8N d_y / L each → d_y outputs]. In general, a network with W parameters can memorize the N data points if W = Ω(N).
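One way to see the depth-width trade-off: with L hidden layers of width ≈ 8N·d_y/L, the total neuron count stays fixed at 8N·d_y while the weight count shrinks as depth grows. A rough count (a simplification for illustration: hidden-to-hidden weight matrices only, ignoring input/output layers, biases, and constants):

```python
N, d_y = 1000, 1
for L in (2, 4, 8, 16):
    w = 8 * N * d_y // L            # per-layer width from the diagram
    neurons = L * w                 # constant: 8 * N * d_y
    params = (L - 1) * w * w        # hidden-to-hidden weights only
    print(f"L={L:2d}  width={w:5d}  neurons={neurons}  params={params:,}")
```

The parameter count falls roughly as 64·N²·d_y²/L, so deeper networks memorize the same N points with fewer weights, consistent with the W = Ω(N) sufficiency above.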

  14-18. Given a network, we define its memorization capacity C as
  C = max{ N ∣ the network can memorize arbitrary N data points with d_y = 1 }.
  Θ(N) neurons are necessary and sufficient for 2-hidden-layer networks ⟹ C = Θ(W). (Tight!)
  W = Ω(N) is sufficient for L-hidden-layer networks ⟹ C = Ω(W), while C ≤ VCdim = O(W L log W). (Nearly tight!)
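The gap between the lower bound C = Ω(W) and the VC-dimension upper bound O(W·L·log W) is a factor of roughly L·log W, which is why the bound is called nearly tight. A quick sense of that factor at a few sizes (illustrative numbers, constants ignored):

```python
import math

# Ratio between the upper bound O(W*L*log W) and the lower bound Omega(W).
for W, L in [(10_000, 3), (100_000, 5), (1_000_000, 10)]:
    print(f"W={W:>9,}  L={L:2d}  gap ~ L*log2(W) = {L * math.log2(W):6.1f}")
```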

  19. Other results:
  - A tighter sufficient condition for memorization in residual networks
  - An SGD trajectory analysis near a memorizing global minimum
  Poster #233, Wed Dec 11th, 5PM-7PM @ East Exhibition Hall B + C.
