Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee Yun, Suvrit Sra, Ali Jadbabaie
Laboratory for Information and Decision Systems, MIT (NeurIPS 2019)
Given a ReLU fully-connected network, how many hidden nodes are required to memorize arbitrary N data points?
1-hidden-layer, scalar regression: [Diagram: input dimension dx, one hidden layer of N ReLU units, scalar output.]
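This classical fact (N hidden ReLU units can fit N arbitrary points) admits a short numerical check. The sketch below is illustrative, not the paper's code: project the inputs onto a random direction, place one ReLU breakpoint just below each sorted projection so the hidden activation matrix is triangular (hence invertible), and solve for the output weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dx = 20, 5
X = rng.normal(size=(N, dx))        # N arbitrary (distinct) inputs
y = rng.normal(size=N)              # arbitrary scalar targets

# Project onto a random direction; generically all projections are distinct.
a = rng.normal(size=dx)
z = X @ a
z_sorted = np.sort(z)

# One ReLU breakpoint just below each sorted projection value.
eps = np.min(np.diff(z_sorted)) / 2
b = z_sorted - eps

# Hidden activations: unit j turns on exactly from the j-th sorted point onward,
# so (with rows in sorted order) H is lower-triangular with positive diagonal.
H = np.maximum(z[:, None] - b[None, :], 0.0)
w = np.linalg.solve(H, y)           # output weights that interpolate exactly

print("max fit error:", np.max(np.abs(H @ w - y)))
```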
We prove that for 2-hidden-layer networks, Θ(√(Ndy)) neurons are sufficient. If dy = 1, Θ(√N) neurons are also necessary.
[Diagram: 2-hidden-layer network; input dimension dx, two hidden layers of width 2√(Ndy) each, output dimension dy.]
Regression: hidden layer widths 2√(Ndy) and 2√(Ndy).
Classification: hidden layer widths 2√N, 2√N, and 4dy.
ImageNet (N = 1M, dy = 1k) can be memorized with hidden layer widths 2k-2k-4k.
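A quick arithmetic check of that claim, assuming the classification widths are exactly 2√N, 2√N, 4dy as shown above:

```python
import math

# Classification widths from the slide: 2*sqrt(N), 2*sqrt(N), 4*dy.
N, dy = 10**6, 10**3                     # ImageNet: ~1M points, 1k classes
widths = (2 * math.isqrt(N), 2 * math.isqrt(N), 4 * dy)
print(widths)                            # (2000, 2000, 4000), i.e. 2k-2k-4k
```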
Depth-width trade-off
2 hidden layers: widths 2√(Ndy) and 2√(Ndy).
L hidden layers: widths ≈ √(8Ndy/L) each.
A network with W params can memorize arbitrary N points if W = Ω(N).
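A toy budget calculation illustrating the trade-off (my reading of the slide, taking the per-layer width to be √(8Ndy/L)): spreading the same budget across more layers shrinks each width, while the total hidden-to-hidden parameter count stays at roughly 8Ndy = Θ(N) for fixed dy.

```python
import math

def hidden_width(N: int, dy: int, L: int) -> int:
    # Per-layer width from the slide's trade-off (assumed form sqrt(8*N*dy/L)).
    return math.ceil(math.sqrt(8 * N * dy / L))

N, dy = 10**4, 10
for L in (2, 8, 32, 128):
    w = hidden_width(N, dy, L)
    params = (L - 1) * w * w            # hidden-to-hidden weight matrices only
    print(f"L={L:>3}  width≈{w:>5}  hidden-to-hidden params≈{params:,}")
```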
Given a network, we define its memorization capacity C as
C = max{N ∣ the network can memorize arbitrary N data points with dy = 1}.
Θ(√N) neurons are necessary and sufficient for 2-hidden-layer networks ⟹ C = Θ(W). [Tight]
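A one-step accounting of why the width bound yields C = Θ(W) (my sketch, assuming equal hidden widths d1 = d2 = w and fixed input dimension dx):

```latex
\[
W = \underbrace{d_x w}_{\text{input}} + \underbrace{w^2}_{\text{hidden--hidden}} + \underbrace{O(w)}_{\text{output + biases}} = \Theta(w^2),
\qquad
w = \Theta(\sqrt{N}) \;\Longrightarrow\; C = \Theta(w^2) = \Theta(W).
\]
```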
W = Ω(N) is sufficient for L-hidden-layer networks ⟹ C = Ω(W); moreover C ≤ VCdim = O(WL log W). [Nearly tight]
Other results
- A tighter sufficient condition for memorization in residual networks
- An SGD trajectory analysis near a memorizing global minimum