SLIDE 1

A signal propagation perspective for pruning neural networks at initialization

Namhoon Lee¹, Thalaiyasingam Ajanthan², Stephen Gould², Philip Torr¹

¹University of Oxford, ²Australian National University

ICLR 2020 Spotlight presentation

SLIDES 2–7

Motivation

• A typical pruning approach requires training steps (Han et al., 2015; Liu et al., 2019).
• Pruning can instead be done efficiently at initialization, prior to training, based on connection sensitivity (Lee et al., 2019).
• The initial random weights are drawn from appropriately scaled Gaussians (Glorot & Bengio, 2010), yet it remains unclear exactly why pruning at initialization is effective.
• Our take ⇒ a signal propagation perspective.

(Figure: pruning pipeline from Han et al., 2015.)
SLIDES 8–13

Initialization & connection sensitivity

• (Linear) Uniformly pruned throughout the network → learning capability secured.
• (tanh) More parameters pruned in the later layers → critical for high-sparsity pruning.
• Connection sensitivity (CS) scores decrease towards the later layers → choosing the top salient parameters globally yields a network in which the remaining parameters are distributed non-uniformly, becoming sparser towards the end.
• The CS metric can be decomposed as ∂L/∂c_j = (∂L/∂w_j) · w_j → since the weights are already well-scaled at initialization, reliable CS scores require faithful gradients (see the sketch below)!

(Figure: sparsity pattern and sensitivity scores per layer.)
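To make the decomposition concrete, here is a minimal PyTorch sketch of SNIP-style connection sensitivity (Lee et al., 2019); the function name and the global normalization are illustrative assumptions, not the authors' released code.

```python
import torch

def connection_sensitivity(model, loss_fn, x, y):
    """SNIP-style connection sensitivity at initialization:
    s_j = |dL/dc_j| = |(dL/dw_j) * w_j|, computed on one mini-batch (x, y)."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    # dL/dc_j = (dL/dw_j) * w_j, since the gate c multiplies w and c = 1 at init
    scores = {name: (p.grad * p).abs()
              for name, p in model.named_parameters() if p.grad is not None}
    total = sum(s.sum() for s in scores.values())  # normalize scores globally
    return {name: s / total for name, s in scores.items()}
```

Pruning then keeps the top-k parameters by score across all layers; on tanh networks these scores shrink towards the later layers, which produces the non-uniform sparsity pattern above.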

SLIDES 14–15

Layerwise dynamical isometry for faithful gradients

Proposition 1 (Gradients in terms of Jacobians). For a feed-forward network, the gradients satisfy ∂L/∂W^l = D^l (J^{l,K})ᵀ δ (x^{l−1})ᵀ, where δ denotes the error signal at the output, J^{l,K} is the Jacobian from layer l to the output layer K, D^l refers to the derivative of the nonlinearity at layer l, and x^{l−1} is the input to layer l.

Definition 1 (Layerwise dynamical isometry). Let J^{l−1,l} = ∂x^l/∂x^{l−1} be the Jacobian matrix of layer l. The network is said to satisfy layerwise dynamical isometry if the singular values of J^{l−1,l} are concentrated near 1 for all layers; i.e., for a given ε > 0, every singular value σ_i of J^{l−1,l} satisfies |σ_i − 1| ≤ ε for all l. A sketch for checking this numerically follows below.
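As a quick numerical check of Definition 1, the sketch below computes the singular values of each layer's input–output Jacobian for a toy tanh MLP at initialization; the helper name and the network are illustrative assumptions.

```python
import torch
import torch.nn as nn

def layer_jacobian_singular_values(layer, x):
    """Singular values of a single layer's Jacobian dx_out/dx_in at input x."""
    J = torch.autograd.functional.jacobian(layer, x.detach())
    return torch.linalg.svdvals(J)

# Toy check: a 5-layer tanh MLP at (Gaussian) initialization
torch.manual_seed(0)
layers = [nn.Sequential(nn.Linear(100, 100), nn.Tanh()) for _ in range(5)]
x = torch.randn(100)
for l, layer in enumerate(layers, start=1):
    sv = layer_jacobian_singular_values(layer, x)
    print(f"layer {l}: singular values in "
          f"[{sv.min().item():.3f}, {sv.max().item():.3f}]")
    x = layer(x)  # forward-propagate so the next Jacobian is evaluated in place
```

Pruning a layer's weight matrix zeroes entries of J^{l−1,l} and pushes its singular values away from 1, which is exactly the degradation quantified on the next slides.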

SLIDES 16–19

Signal propagation and trainability

• Jacobian singular values (JSV) decrease as sparsity increases → pruning weakens signal propagation.
• JSV drop rapidly under random pruning compared to connection sensitivity (CS) based pruning → CS pruning preserves signal propagation better.
• Signal propagation correlates with trainability → the better a network propagates signals, the faster it converges during training.
• Enforcing approximate isometry restores signal propagation and improves training (see the sketch below)!

(Figure: signal propagation and trainability at 90% sparsity.)
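To illustrate the last point, here is a minimal sketch of enforcing approximate isometry on a pruned layer by re-optimizing its unmasked weights towards orthogonality; the objective, optimizer, and hyperparameters are illustrative assumptions rather than the authors' exact procedure.

```python
import torch

def enforce_approximate_isometry(weight, mask, steps=2000, lr=1e-2):
    """Re-optimize a pruned layer so the masked weight matrix is approximately
    orthogonal, minimizing ||(C*W)(C*W)^T - I||_F^2 while keeping pruned
    connections fixed at zero."""
    w = (weight * mask).clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    eye = torch.eye(weight.shape[0])
    for _ in range(steps):
        opt.zero_grad()
        wm = w * mask  # the mask zeroes the gradients of pruned entries
        loss = ((wm @ wm.T - eye) ** 2).sum()
        loss.backward()
        opt.step()
    return (w * mask).detach()

# Example: JSV of a ~90%-sparse random layer, before and after
torch.manual_seed(0)
W = torch.randn(64, 64) / 64 ** 0.5        # scaled Gaussian initialization
C = (torch.rand(64, 64) < 0.1).float()     # keep ~10% of the connections
before = torch.linalg.svdvals(W * C)
after = torch.linalg.svdvals(enforce_approximate_isometry(W, C))
print(f"JSV before: [{before.min().item():.3f}, {before.max().item():.3f}]")
print(f"JSV after:  [{after.min().item():.3f}, {after.max().item():.3f}]")
```

After optimization the singular values of the sparse layer should move back towards 1, matching the restoration of signal propagation claimed above.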

SLIDE 20

Validations and extensions

• Modern networks
• Non-linearities
• Architecture sculpting
• Pruning without supervision
• Transfer of sparsity

SLIDE 21

Summary

• The initial random weights have a critical impact on pruning.
• Layerwise dynamical isometry ensures faithful signal propagation.
• Pruning breaks dynamical isometry and degrades the trainability of a neural network; yet, enforcing approximate isometry can recover signal propagation and enhance trainability.
• A range of experiments verifies the effectiveness of the signal propagation perspective.