

SLIDE 1

What is the State of Neural Network Pruning?

Davis Blalock* Jose Javier Gonzalez* Jonathan Frankle John V. Guttag

*equal contribution

SLIDE 2

Blalock & Gonzalez

Overview

Meta-analysis of neural network pruning: we aggregated results across 81 pruning papers and pruned hundreds of networks under controlled conditions

  • Some surprising findings…

ShrinkBench: an open-source library to facilitate development and standardized evaluation of neural network pruning methods

SLIDE 3

Part 0: Background

SLIDE 4

Neural Network Pruning

  • Neural networks are often accurate, but large
  • Pruning: systematically removing parameters from a network
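The simplest widely-used instance of this idea is magnitude pruning: remove the parameters with the smallest absolute values. A minimal NumPy sketch (the function name and example values are illustrative, not from any particular paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of entries with the smallest absolute value."""
    k = int(fraction * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value among all entries
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([0.1, -2.0, 0.05, 3.0, -0.5])
# The two smallest-magnitude entries (0.1 and 0.05) are zeroed
print(magnitude_prune(w, 0.4))
```

Note that ties at the threshold may prune slightly more than the requested fraction; real libraries handle this more carefully.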

SLIDE 5

Typical Pruning Pipeline

Data + Model → Pruning Algorithm → Finetuning → Evaluation

Many design choices:

  • Scoring importance of parameters
  • Schedule of pruning, training / finetuning
  • Structure of induced sparsity
  • Finetuning details: optimizer, duration, hyperparameters

SLIDE 6

Evaluating Neural Network Pruning

  • Goal: increase efficiency of the network as much as possible with minimal drop in quality
  • Metrics:
  • Quality = accuracy
  • Efficiency = FLOPs, compression, latency…
  • Must use comparable tradeoffs

[Figure: accuracy of pruned network vs. efficiency]
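"Comparable tradeoffs" means evaluating methods at the same operating point. One way to do that, sketched below with made-up accuracy curves, is to interpolate each method's curve at a common compression ratio:

```python
import numpy as np

# Hypothetical accuracy-vs-compression curves for two pruning methods.
# The methods were evaluated at different compression ratios, so comparing
# raw table entries would be apples-to-oranges.
compression_a = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
accuracy_a    = np.array([76.1, 76.0, 75.5, 74.0, 70.5])
compression_b = np.array([1.0, 3.0, 6.0, 12.0])
accuracy_b    = np.array([76.1, 75.6, 74.4, 71.5])

def accuracy_at(target, compression, accuracy):
    """Linearly interpolate a method's accuracy at a target compression ratio."""
    return float(np.interp(target, compression, accuracy))

# Comparable tradeoff: both methods evaluated at 4x compression
print(accuracy_at(4.0, compression_a, accuracy_a))  # 75.5
print(accuracy_at(4.0, compression_b, accuracy_b))  # ~75.2
```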

SLIDE 7

Part 1: Meta-Analysis

SLIDE 8

Overview of Meta-Analysis

  • We aggregated results across 81 pruning papers
  • Mostly published in top venues
  • Corpus closed under experimental comparison

Venue          # of Papers
arXiv only     22
NeurIPS        16
ICLR           11
CVPR            9
ICML            4
ECCV            4
BMVC            3
IEEE Access     2
Other          10

SLIDE 9

Robust Findings

  • Pruning works
  • Almost any heuristic improves efficiency with little performance drop
  • Many methods are better than random pruning
  • Don’t prune all layers uniformly
  • Sparse models are better for a fixed # of parameters
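The "don’t prune all layers uniformly" finding shows up even in a toy example: if one magnitude threshold is shared globally across layers, layers whose weights live at smaller scales get pruned far more aggressively than a uniform per-layer scheme would prune them. A sketch with synthetic layers (all names and scales are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic layers whose weights live at very different scales
layers = {"conv1": rng.normal(0.0, 1.0, 1000), "fc": rng.normal(0.0, 0.1, 1000)}

def global_masks(layers, prune_fraction):
    """Global magnitude pruning: one threshold shared across all layers."""
    all_scores = np.concatenate([np.abs(w) for w in layers.values()])
    threshold = np.quantile(all_scores, prune_fraction)
    return {name: np.abs(w) > threshold for name, w in layers.items()}

masks = global_masks(layers, 0.5)  # prune 50% of weights overall
for name, mask in masks.items():
    print(f"{name}: kept {mask.mean():.0%} of weights")
# Uniform pruning would keep 50% in each layer; the global scheme instead
# keeps most of conv1 and prunes the small-scale fc layer heavily.
```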

SLIDE 10


Better Pruning vs Better Architecture

SLIDE 11

Ideal Results Over Time

(Dataset, Architecture, X metric, Y metric, Hyperparameters) → Curve

[Figure: accuracy vs. compression ratio, one curve per year, 2015–2019]

SLIDE 12

Ideal Results Over Time

[Figures: accuracy vs. compression ratio and vs. theoretical speedup, 2015–2019, for VGG-16, AlexNet, and ResNet-50 on ImageNet]

SLIDE 13

Actual Results Over Time

[Figures: accuracy vs. compression ratio and vs. theoretical speedup, 2015–2019, for VGG-16, AlexNet, and ResNet-50 on ImageNet]

SLIDE 14

Quantifying the Problem

  • Among 81 papers:
  • 49 datasets
  • 132 architectures
  • 195 (dataset, architecture) pairs
  • Vicious cycle: extreme burden to compare to existing methods

All (dataset, architecture) pairs used in at least 4 papers:

Dataset     Architecture     # of Papers Using Pair
ImageNet    VGG-16           22
CIFAR-10    ResNet-56        14
ImageNet    ResNet-50        14
ImageNet    CaffeNet         11
ImageNet    AlexNet           9
CIFAR-10    CIFAR-VGG         8
ImageNet    ResNet-34         6
ImageNet    ResNet-18         6
CIFAR-10    ResNet-110        5
CIFAR-10    PreResNet-164     4
CIFAR-10    ResNet-32         4

SLIDE 15

Dearth of Reported Comparisons

  • Presence of comparisons:
  • Most papers compare to at most 1 other method
  • 40% of papers have never been compared to
  • Pre-2010s methods almost completely ignored
  • Reinventing the wheel:
  • Magnitude-based pruning: Janowsky (1989)
  • Gradient times magnitude: Mozer & Smolensky (1989)
  • “Reviving” pruned weights: Tresp et al. (1997)

SLIDE 16

Pop quiz!

  • Alice’s network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  • A. 80%
  • B. 20%
  • C. 5x
  • D. No reported compression ratio

SLIDE 17

Pop quiz!

  • Alice’s network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  • A. 80%
  • B. 20%
  • C. 5x
  • D. No reported compression ratio
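Choices A, B, and C all describe the same experiment; they are just different reporting conventions, which is part of why pruning results are hard to compare across papers. The arithmetic:

```python
total_params, pruned_params = 10_000_000, 8_000_000
remaining = total_params - pruned_params

fraction_pruned = pruned_params / total_params  # "80% of weights pruned"  (choice A)
fraction_kept   = remaining / total_params      # "20% of weights remain"  (choice B)
compression     = total_params / remaining      # "5x compression"         (choice C)

print(fraction_pruned, fraction_kept, compression)  # 0.8 0.2 5.0
```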

SLIDE 18

Pop quiz!

  • According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  • A. 371 million
  • B. 500 million
  • C. 724 million
  • D. 1.5 billion

SLIDE 19

Pop quiz!

  • According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  • A. 371 million
  • B. 500 million
  • C. 724 million
  • D. 1.5 billion

SLIDE 20

Part 2: ShrinkBench

SLIDE 21

Why ShrinkBench?

  • Want to hold everything but the pruning algorithm constant
  • Improved rigor, development time

[Diagram: Data + Model → Pruning Algorithm → Finetuning → Evaluation; every stage except the pruning algorithm is a potential confounding factor]

SLIDE 22

Masking API

Model (+ Data) → Pruning Masks → Accuracy Curve

[Figure: weight tensors and the binary pruning masks applied to them]

  • Lets algorithm return arbitrary masks for weight tensors
  • Standardizes all other aspects of training and evaluation
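In code, such a masking interface might look roughly like this: a pruning method maps weight tensors to binary masks, and the benchmark handles everything else (applying masks, finetuning, evaluation). This is an illustrative sketch, not the actual ShrinkBench API; the example values echo the slide's weight tensors:

```python
import numpy as np

def magnitude_masks(weights, prune_fraction):
    """Example 'pruning method': keep the globally largest-magnitude weights.
    Returns one binary mask per weight tensor."""
    scores = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    threshold = np.quantile(scores, prune_fraction)
    return {name: (np.abs(w) > threshold).astype(w.dtype)
            for name, w in weights.items()}

def apply_masks(weights, masks):
    """The benchmark side: zero out masked weights; training and
    evaluation details stay fixed across all methods."""
    return {name: w * masks[name] for name, w in weights.items()}

weights = {"layer1": np.array([[2.1, 4.6, 0.8], [0.1, 0.2, 1.5]]),
           "layer2": np.array([4.9, 2.3, 2.5])}
masks = magnitude_masks(weights, 0.5)
pruned = apply_masks(weights, masks)
kept = sum(int(m.sum()) for m in masks.values())
print(kept)  # 4 of the 9 weights survive
```

Because the method only returns masks, any scoring heuristic can be swapped in without touching the rest of the pipeline.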

SLIDE 23

Crucial to Vary Amount of Pruning & Architecture

[Figure: accuracy vs. amount of pruning for CIFAR-VGG and ResNet-56]

SLIDE 24

Compression and Speedup are not Interchangeable

[Figure: ResNet-18 on ImageNet]

SLIDE 25

Using Identical Initial Weights Is Essential

[Figure: ResNet-56 on CIFAR-10]

SLIDE 26

Conclusion

  • Pruning works
  • But not as well as improving the architecture
  • But we have no idea which methods work best
  • The field suffers from extreme fragmentation in experimental setups
  • We introduce a library/benchmark to address this
  • Faster progress in the future; interesting findings already

https://github.com/jjgo/shrinkbench

SLIDE 27

Questions?