What is the State of Neural Network Pruning?



  1. What is the State of Neural Network Pruning?
     Davis Blalock*, Jose Javier Gonzalez*, Jonathan Frankle, John V. Guttag (*equal contribution)

  2. Overview
     • Meta-analysis of neural network pruning: we aggregated results across 81 pruning papers and pruned hundreds of networks in controlled conditions, with some surprising findings
     • ShrinkBench: an open-source library to facilitate development and standardized evaluation of neural network pruning methods

  3. Part 0: Background

  4. Neural Network Pruning
     • Neural networks are often accurate but large
     • Pruning: systematically removing parameters from a network
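To make the definition concrete, here is a minimal sketch of the most common variant, magnitude pruning, in PyTorch. This is an illustration of the idea, not code from the talk:

```python
import torch

def magnitude_mask(weights: torch.Tensor, fraction: float) -> torch.Tensor:
    """Binary mask that zeroes out the `fraction` of entries with smallest |value|."""
    k = int(fraction * weights.numel())
    if k == 0:
        return torch.ones_like(weights)
    # Threshold = k-th smallest absolute value in the tensor
    threshold = weights.abs().flatten().kthvalue(k).values
    return (weights.abs() > threshold).to(weights.dtype)

w = torch.randn(4, 4)
pruned_w = w * magnitude_mask(w, fraction=0.75)  # roughly 75% of entries set to zero
```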

  5. Typical Pruning Pipeline: Data + Model → Pruning Algorithm → Finetuning → Evaluation
     Many design choices:
     • Scoring the importance of parameters
     • Structure of the induced sparsity
     • Schedule of pruning / training
     • Finetuning details: optimizer, finetuning duration, hyperparameters
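One way these stages might fit together in code: a one-shot prune followed by finetuning, with masks re-applied after every optimizer step so pruned weights stay at zero. This is a sketch of one common design point under assumed choices, not the talk's implementation; all names here are illustrative:

```python
import torch
import torch.nn as nn

def finetune_pruned(model, masks, X, y, epochs=5, lr=0.1):
    """Finetune a pruned model, holding pruned weights at zero after every step."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])  # re-apply mask so pruned entries stay zero

# Toy end-to-end run on random data (illustration only)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
masks = {name: (torch.rand_like(p) > 0.8).float()  # placeholder masks; any scoring rule works
         for name, p in model.named_parameters() if p.dim() > 1}
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])  # initial prune
finetune_pruned(model, masks, torch.randn(32, 10), torch.randint(0, 3, (32,)))
```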

  6. Evaluating Neural Network Pruning
     • Goal: increase the efficiency of the network as much as possible with minimal drop in quality
     • Metrics: quality = accuracy; efficiency = FLOPs, compression, latency, ...
     • Must compare methods at comparable tradeoff points
     [Chart: accuracy of pruned network vs. efficiency]
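The two efficiency metrics used throughout the talk can be pinned down precisely. A short sketch with the definitions as they are most commonly used in the pruning literature (the helper names are mine):

```python
def compression_ratio(total_params: int, remaining_params: int) -> float:
    """Original parameter count divided by remaining (nonzero) parameter count."""
    return total_params / remaining_params

def theoretical_speedup(total_flops: int, remaining_flops: int) -> float:
    """Original inference FLOPs divided by FLOPs after pruning."""
    return total_flops / remaining_flops

print(compression_ratio(10_000_000, 2_000_000))       # 5.0 -> "5x compression"
print(theoretical_speedup(1_000_000_000, 250_000_000))  # 4.0 -> "4x speedup"
```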

  7. Part 1: Meta-Analysis

  8. Overview of Meta-Analysis
     • We aggregated results across 81 pruning papers
     • Mostly published in top venues
     • Corpus closed under experimental comparison

     Venue        # of Papers
     arXiv only   22
     NeurIPS      16
     ICLR         11
     CVPR          9
     ICML          4
     ECCV          4
     BMVC          3
     IEEE Access   2
     Other        10

  9. Robust Findings
     • Pruning works: almost any heuristic improves efficiency with little performance drop
     • Many methods are better than random pruning
     • Don't prune all layers uniformly
     • Sparse models are better for a fixed number of parameters

  10. Better Pruning vs Better Architecture

  11. Ideal Results Over Time
      (Dataset, Architecture, X metric, Y metric, Hyperparameters) → Curve
      [Chart: accuracy vs. compression ratio, with results colored by year, 2015-2019]

  12. Ideal Results Over Time
      [Charts: VGG-16, AlexNet, and ResNet-50 on ImageNet; x-axes are compression ratio and theoretical speedup, with results colored by year, 2015-2019]

  13. Actual Results Over Time
      [Charts: VGG-16, AlexNet, and ResNet-50 on ImageNet; x-axes are compression ratio and theoretical speedup, with results colored by year, 2015-2019]

  14. Quantifying the Problem
      • Among 81 papers: 49 datasets, 132 architectures, 195 (dataset, architecture) pairs
      • Vicious cycle: extreme burden to compare to existing methods

      All (dataset, architecture) pairs used in at least 4 papers:
      Dataset    Architecture    # of Papers Using Pair
      ImageNet   VGG-16          22
      CIFAR-10   ResNet-56       14
      ImageNet   ResNet-50       14
      ImageNet   CaffeNet        11
      ImageNet   AlexNet          9
      CIFAR-10   CIFAR-VGG        8
      ImageNet   ResNet-34        6
      ImageNet   ResNet-18        6
      CIFAR-10   ResNet-110       5
      CIFAR-10   PreResNet-164    4
      CIFAR-10   ResNet-32        4

  15. Dearth of Reported Comparisons
      • Presence of comparisons:
        • Most papers compare to at most 1 other method
        • 40% of papers have never been compared to
        • Pre-2010s methods are almost completely ignored
      • Reinventing the wheel:
        • Magnitude-based pruning: Janowsky (1989)
        • Gradient times magnitude: Mozer & Smolensky (1989)
        • "Reviving" pruned weights: Tresp et al. (1997)
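For reference, the two reinvented scoring heuristics are essentially one-liners. A hedged sketch of the rules as the slide describes them (my rendering, not the cited authors' code):

```python
import torch

def magnitude_score(w: torch.Tensor) -> torch.Tensor:
    """Janowsky (1989)-style saliency: importance = |w|."""
    return w.abs()

def grad_times_magnitude_score(w: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Mozer & Smolensky (1989)-style saliency: importance = |w * dL/dw|."""
    return (w * grad).abs()

# Under either rule, the lowest-scoring weights are pruned first.
w, g = torch.randn(5), torch.randn(5)
print(magnitude_score(w), grad_times_magnitude_score(w, g))
```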

  16. Pop quiz!
      • Alice's network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
        A. 80%   B. 20%   C. 5x   D. No reported compression ratio

  17. Pop quiz!
      • Alice's network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
        A. 80%   B. 20%   C. 5x   D. No reported compression ratio
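All three numeric answers describe the same pruning run under different reporting conventions, which is presumably the point of the quiz. A quick arithmetic check (my code, not from the slides):

```python
total, pruned = 10_000_000, 8_000_000
remaining = total - pruned     # 2,000,000 weights survive

print(pruned / total)          # 0.8 -> "pruned 80% of weights"
print(remaining / total)       # 0.2 -> "kept 20% of weights"
print(total / remaining)       # 5.0 -> "5x compression ratio"
```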

  18. Pop quiz!
      • According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
        A. 371 million   B. 500 million   C. 724 million   D. 1.5 billion

  19. Pop quiz!
      • According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
        A. 371 million   B. 500 million   C. 724 million   D. 1.5 billion

  20. Part 2: ShrinkBench

  21. Why ShrinkBench?
      • Want to hold everything except the pruning algorithm constant
      • Improves rigor and cuts development time
      [Diagram: Data + Model → Pruning Algorithm → Finetuning → Evaluation; each stage is a potential confounding factor]

  22. Masking API
      • Lets the algorithm return arbitrary masks for weight tensors
      • Standardizes all other aspects of training and evaluation
      [Diagram: Model (+ Data) → Pruning → Masks → Accuracy Curve; weight tensors are multiplied elementwise by binary masks to produce the pruned weights]
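In spirit, the contract looks something like the following PyTorch sketch. This is my illustration of the idea, not ShrinkBench's actual interface:

```python
import torch
import torch.nn as nn

def my_pruning_method(model: nn.Module, fraction: float) -> dict:
    """The algorithm's side of the contract: return a {param_name: binary mask} dict.
    Random masking here is just a placeholder for any scoring rule."""
    return {name: (torch.rand_like(p) > fraction).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model: nn.Module, masks: dict) -> None:
    """The benchmark's side: multiply each weight tensor elementwise by its mask."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
apply_masks(model, my_pruning_method(model, fraction=0.5))
```

Because finetuning and evaluation of the masked model are then handled identically for every method, results become directly comparable.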

  23. Crucial to Vary Amount of Pruning & Architecture
      [Charts: results for CIFAR-VGG and ResNet-56 across amounts of pruning]

  24. Compression and Speedup Are Not Interchangeable
      [Chart: ResNet-18 on ImageNet]
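One driver of this gap (my illustration, with made-up layer shapes loosely in the VGG/AlexNet range): convolutional weights are reused at every spatial position, so conv layers contribute few parameters but many FLOPs, while fully connected layers are the opposite. Pruning parameters therefore compresses a model without necessarily reducing its FLOPs proportionally.

```python
def conv2d_cost(c_in, c_out, k, out_h, out_w):
    """Parameters and multiply-accumulates for a k x k conv layer (bias ignored)."""
    params = c_in * c_out * k * k
    macs = params * out_h * out_w  # each weight fires once per output position
    return params, macs

def linear_cost(n_in, n_out):
    """A fully connected layer uses each weight exactly once per input."""
    params = n_in * n_out
    return params, params

print(conv2d_cost(64, 64, 3, 56, 56))  # (36864, 115605504): tiny params, huge MACs
print(linear_cost(4096, 4096))         # (16777216, 16777216): huge params, modest MACs
```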

  25. Using Identical Initial Weights Is Essential
      [Chart: ResNet-56 on CIFAR-10]

  26. Conclusion
      • Pruning works, but not as well as improving the architecture
      • We have no idea which methods work best: the field suffers from extreme fragmentation in experimental setups
      • We introduce a library/benchmark to address this: faster progress in the future, and interesting findings already
      https://github.com/jjgo/shrinkbench

  27. Questions?
