NAS-Bench-1Shot1: Benchmarking and Dissecting One-Shot Neural Architecture Search

SLIDE 1

Albert-Ludwigs-Universität Freiburg

NAS-Bench-1Shot1:

Benchmarking and Dissecting One-Shot Neural Architecture Search

DeToL 07.11.2019

Julien Siems, Arbër Zela and Frank Hutter

Under review as a conference paper at ICLR 2020

SLIDE 2

11/07/2019 Benchmarking and Dissecting One-Shot Neural Architecture Search 2

Motivation

  • Recent Neural Architecture Search (NAS) methods use a one-shot model to perform the search.

Figure adapted from: Dong, Xuanyi, and Yi Yang. "One-Shot Neural Architecture Search via Self-Evaluated Template Network." arXiv preprint arXiv:1910.05733 (2019).

SLIDE 3

Motivation

  • Reproducibility crisis
  • Need proper benchmarks [Lindauer and Hutter 2019]
  • NAS-Bench-101 [Ying et al. 2019]
SLIDE 4

Motivation


  • Optimize architecture w.r.t. the one-shot validation loss.
  • Goal: Find an architecture which performs well when trained on its own.
  • Question: How correlated are the two objectives?
  • Question: How sensitive are the search methods to their hyperparameters?

  • Problem: Independent training of discrete architectures is very expensive.

  • How could we increase the evaluation speed?


SLIDE 5

Outline

  • Idea
  • One-Shot NAS Optimizers
  • Results
  • Conclusion


SLIDE 6

Idea

DARTS Search Phases: Architecture Search, then Architecture Evaluation

  • Train discrete arch. from scratch
  • Higher fidelity model:
  • More channels
  • More cells
  • Different training hyperparameters

Liu et al. 2018

SLIDE 7

Idea


Retraining from scratch is the price to pay to check intermediate architectures.

SLIDE 8

Idea


NASBench-101

  • Exhaustively evaluated search space on CIFAR-10 [Ying et al. 2019]

  • > 400k unique graphs
  • Evaluated on 4 different budgets
  • Evaluated 3 times

How can we use NASBench for Architecture Evaluation?

SLIDE 9

Idea

DARTS Search Space vs. NASBench Search Space

  • Representation: edges are ops, nodes are combinations of tensors
  • Inputs of each cell are the outputs of the 2 previous cells.
  • Intermediate nodes have 2 incoming edges
  • Output of cell is concatenation of all intermediate node outputs

  • Representation: edges depict tensor flow, nodes are operations
  • Limited number of architectures by restricting each cell:
  • <= 9 edges
  • <= 5 intermediate nodes
  • Max-Pool, Conv-1x1, Conv-3x3
  • Input of each cell is only the previous cell.

Architectures in the DARTS Search Space are usually not part of the NASBench Search Space.
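The cell restrictions above translate directly into a small validity check. The following is an illustrative sketch, not the official `nasbench` API; the function name and the operation labels are assumptions chosen to mirror the bullets above.

```python
# Hedged sketch: check the NASBench cell restrictions listed above.
# ALLOWED_OPS and is_valid_nasbench_cell are illustrative names, not the
# official nasbench API.
ALLOWED_OPS = {"maxpool3x3", "conv1x1-bn-relu", "conv3x3-bn-relu"}

def is_valid_nasbench_cell(adjacency, ops):
    """adjacency: 0/1 upper-triangular matrix over all nodes (including the
    input and output node); ops: operation labels of the intermediate nodes."""
    num_edges = sum(sum(row) for row in adjacency)   # restriction: <= 9 edges
    num_intermediate = len(adjacency) - 2            # restriction: <= 5 intermediate nodes
    if num_edges > 9 or num_intermediate > 5:
        return False
    return all(op in ALLOWED_OPS for op in ops)      # restriction: allowed op set
```

A cell violating any restriction is simply not part of the tabular benchmark and cannot be looked up.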

SLIDE 10

Idea

  • Modified search space by Bender et al. 2018
  • Architectural weights:
  • On edges to output
  • On input edges to choice block
  • On the ‘mixed-op’ for each operation
SLIDE 13

Idea

  • Define search spaces by the number of parents of each node:
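One way to realize this definition in code: fix the number of parents per node and enumerate all parent-set choices. This is a hedged sketch; `enumerate_topologies` and the example parent counts are illustrative and are not the exact three search spaces from the paper.

```python
# Hedged sketch of the idea above: a search space is fixed by prescribing,
# for each node, how many parents it has; enumerating all parent-set
# choices yields the discrete cell topologies.
from itertools import combinations, product

def enumerate_topologies(num_parents):
    """num_parents: dict node index -> number of parents; node 0 is the
    cell input and has no parents. Returns one dict per topology."""
    nodes = sorted(num_parents)
    choices_per_node = [
        # parents must be earlier nodes, which enforces the DAG constraint
        list(combinations(range(node), num_parents[node]))
        for node in nodes
    ]
    return [dict(zip(nodes, choice)) for choice in product(*choices_per_node)]
```

Counting the enumerated topologies (times the per-node operation choices) gives the size of each search space.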
SLIDE 14

Idea

This enables the following analyses:

  • Follow the architecture trajectory of one-shot NAS
  • Comparison of 4 one-shot NAS optimizers
  • Correlation between one-shot validation error and NASBench validation error
  • Hyperparameter optimization of the search methods
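The trajectory-following analysis works because every discrete architecture encountered during search can be looked up in the tabular benchmark instead of retrained. A minimal sketch, with a toy dict standing in for the real NAS-Bench-101 query API; all names and numbers are illustrative.

```python
# Hedged sketch: follow a one-shot run's architecture trajectory by looking
# up each discrete architecture in a tabular benchmark. The dict stands in
# for the real NAS-Bench-101 query API.
def evaluate_trajectory(trajectory, benchmark):
    """trajectory: one architecture encoding per search epoch;
    benchmark: mapping encoding -> pre-computed validation error."""
    return [benchmark[arch] for arch in trajectory]

def anytime_best(errors):
    """Anytime performance: best validation error found so far."""
    best, incumbents = float("inf"), []
    for err in errors:
        best = min(best, err)
        incumbents.append(best)
    return incumbents
```

Because each lookup is essentially free, the trajectory can be evaluated at every epoch, which is what makes anytime-performance analysis of one-shot NAS feasible.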
SLIDE 15

Outline

Idea

  • One-Shot NAS Optimizers
  • Results
  • Conclusion


SLIDE 16

One-Shot NAS Optimizers

DARTS [Liu et al. 18]

Figure from Xu, Yuhui, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong. "PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search." (2019).

PC-DARTS [Xu et al. 19]

GDAS [Dong et al. 19]:

  • Differentiably sample paths through each cell.
  • Only operations on the path need to be evaluated
  • Very fast search
  • Avoids co-adaptation

Random Search with Weight Sharing [Li et al. 19]:

  • Training: sample an architecture from the search space for each batch and train the one-shot model weights.
  • Evaluation: sample many architectures, rank according to one-shot validation error on 10 batches, fully evaluate the top-10 architectures.

Discrete optimizers:

  • BOHB
  • Hyperband
  • Random Search
  • Regularized Evolution
  • SMAC
  • TPE
  • Reinforce

More optimizers to be added …
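For reference, the continuous relaxation shared by the differentiable optimizers above (DARTS, PC-DARTS, GDAS) can be sketched as a softmax-weighted mixed operation. Plain-Python stand-ins replace real torch modules here; `mixed_op` and `discretize` are illustrative names, and in DARTS the alphas are learned by gradient descent on the validation loss.

```python
# Hedged sketch of the DARTS-style continuous relaxation: each edge outputs
# a softmax-weighted mixture of all candidate operations.
import math

def softmax(alphas):
    exps = [math.exp(a - max(alphas)) for a in alphas]  # shifted for stability
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops):
    """Edge output: softmax(alpha)-weighted sum of all candidate op outputs."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), ops))

def discretize(alphas, ops):
    """After search, keep only the op with the largest architecture weight."""
    return ops[max(range(len(alphas)), key=lambda i: alphas[i])]
```

GDAS differs mainly in sampling a single path per step (via a Gumbel softmax), so only the sampled operation is evaluated, which is what makes its search fast.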

SLIDE 17

Outline

Idea One-Shot NAS Optimizers

  • Results
  • NASBench 1-Shot-1 Analysis
  • NASBench 1-Shot-1 HPO
  • Conclusion


SLIDE 18

NAS-Bench-1Shot1 as Analysis Framework

Optimizer Comparison – Search Spaces 1 and 3

  • DARTS and GDAS:
  • stuck in local optimum
  • PC-DARTS:
  • stable search and relatively good performance for the given number of epochs

  • Random Search with WS:
  • explores mainly poor architectures
SLIDE 19

Regularized Search (Cutout) – Search Space 3 (panels: GDAS, PC-DARTS, DARTS)

  • Longer search -> architectural overfitting
  • Cutout largely stabilized the search
  • Little impact of cutout on found architectures.
  • Additional regularization has no positive impact

NAS-Bench-1Shot1 as Analysis Framework

SLIDE 20

Regularized Search (Weight Decay) – Search Space 3 (panels: DARTS, GDAS, PC-DARTS)

DARTS, GDAS and PC-DARTS: higher regularization -> less stable search

NAS-Bench-1Shot1 as Analysis Framework

SLIDE 21

Effect of one-shot learning rate – Search Space 3 (panels: DARTS, GDAS, PC-DARTS)

DARTS: high learning rate -> less stable search; GDAS: high learning rate -> better search; PC-DARTS: high learning rate -> less stable search

NAS-Bench-1Shot1 as Analysis Framework

SLIDE 22

Correlation

  • No correlation between one-shot validation error and NASBench validation error:
  • For all one-shot search methods
  • For all search spaces
  • Consistent with results by Sciuto et al. 19, who estimated this using only 32 architectures

[Figure: correlation plots for DARTS, GDAS, PC-DARTS and Random-WS on search spaces 1, 2 and 3]

NAS-Bench-1Shot1 as Analysis Framework
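The correlation analysis boils down to comparing two rankings of the same set of architectures. A self-contained sketch of Spearman's rank correlation follows (no-ties formula); in practice a library routine such as `scipy.stats.spearmanr` would be used, and the error values here are illustrative.

```python
# Hedged sketch: rank the same architectures by one-shot validation error
# and by NASBench validation error, then compute Spearman's rho between
# the two rankings (assumes no ties).
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A rho near zero, as observed here, means ranking architectures by one-shot validation error says little about their stand-alone performance.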

SLIDE 23

Optimize the hyperparameters of one-shot NAS optimizers using BOHB [Falkner et al. 2018]

  • Outperform the default configuration by a factor of 7-10
  • With the same number of function evaluations, they are able to outperform black-box NAS optimizers

Tunability of NAS optimizers

SLIDE 24

Conclusion and Future Directions


  • We presented NAS-Bench-1Shot1, a framework containing 3 benchmarks that enable evaluating the anytime performance of one-shot NAS algorithms
  • NAS-Bench-1Shot1 as analysis framework
  • One-shot NAS optimizers can outperform black-box optimizers if tuned properly

Future work:

  • Add other methods such as ENAS [Pham et al. 2018], ProxylessNAS [Cai et al. 2019], etc.

  • Automate the generation of plots, analysis results, or benchmark tables.

  • Towards NAS-Bench-201