Progressive Neural Architecture Search


SLIDE 1

Progressive Neural Architecture Search

Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy
09/10/2018 @ECCV

1

SLIDE 2

Outline

  • Introduction and Background
  • Architecture Search Space
  • Progressive Neural Architecture Search Algorithm
  • Experiments and Results

2

SLIDE 3

Introduction and Background

3

SLIDE 4

AutoML

  • Hit Enter, sit back and relax, come back the next day for a high-quality machine learning solution ready to be delivered

4

SLIDE 5

What Is Preventing Us?

[Diagram: a machine learning solution, here a neural network, involves both parameters and hyperparameters]

5

SLIDE 6

What Is Preventing Us?

[Diagram: as before; learning the neural network's parameters is automated :)]

6

SLIDE 7

What Is Preventing Us?

[Diagram: as before; learning the parameters is automated :), but choosing the hyperparameters is not quite automated :( and is the key of AutoML]

7

SLIDE 8

Where Are Hyperparameters?

  • We usually think of those related to learning rate scheduling
  • But for a neural network, many more lie in its architecture:

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." In CVPR. 2015.

8

SLIDE 9

Neural Architecture Search (NAS)

  • Can we design network architectures automatically, instead of relying on expert experience and knowledge?

  • Broadly, the existing NAS literature falls into two main categories:

○ Evolutionary Algorithms (EA)
○ Reinforcement Learning (RL)

9

SLIDE 10

Evolutionary Algorithms for NAS

Best candidates: (0, 1, 0, 1): 0.85, (2, 0, 3, 1): 0.84, (5, 1, 3, 3): 0.91, (0, 2, 0, 6): 0.92, …, (0, 7, 3, 5): 0.82

10

(Each tuple is a string that defines a network architecture; the number after the colon is its accuracy on the validation set.)

SLIDE 11

Evolutionary Algorithms for NAS

Best candidates: (0, 1, 0, 1): 0.85, (2, 0, 3, 1): 0.84, (5, 1, 3, 3): 0.91, (0, 2, 0, 6): 0.92, …, (0, 7, 3, 5): 0.82

mutate into new candidates: (0, 1, 0, 2): ????, (2, 0, 4, 1): ????, (5, 5, 3, 3): ????, (0, 2, 1, 6): ????, …, (0, 6, 3, 5): ????

11

SLIDE 12

Evolutionary Algorithms for NAS

Best candidates: (0, 1, 0, 1): 0.85, (2, 0, 3, 1): 0.84, (5, 1, 3, 3): 0.91, (0, 2, 0, 6): 0.92, …, (0, 7, 3, 5): 0.82
New candidates (now evaluated): (0, 1, 0, 2): 0.86, (2, 0, 4, 1): 0.83, (5, 5, 3, 3): 0.90, (0, 2, 1, 6): 0.91, …, (0, 6, 3, 5): 0.80

12

SLIDE 13

Evolutionary Algorithms for NAS

New candidates: (0, 1, 0, 2): 0.86, (2, 0, 4, 1): 0.83, (5, 5, 3, 3): 0.90, (0, 2, 1, 6): 0.91, …, (0, 6, 3, 5): 0.80

merge with the previous pool to form the new best candidates: (5, 5, 3, 3): 0.90, (0, 2, 1, 6): 0.91, (5, 1, 3, 3): 0.91, (0, 2, 0, 6): 0.92, …, (0, 1, 0, 2): 0.86
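In code, the loop illustrated on these slides might look roughly like the sketch below; evaluate() is a hypothetical stand-in for the expensive train-then-validate step, and the 4-integer strings, population size, and mutation rule are illustrative rather than taken from any specific NAS paper.

    import random

    OP_CHOICES = 8   # assumption: each slot of the string has 8 possible values

    def evaluate(arch):
        # Hypothetical stand-in: in reality this trains the network encoded
        # by `arch` and returns its accuracy on the validation set.
        random.seed(hash(arch))
        return random.uniform(0.80, 0.93)

    def mutate(arch):
        # Change one randomly chosen position of the architecture string.
        pos = random.randrange(len(arch))
        new = list(arch)
        new[pos] = random.randrange(OP_CHOICES)
        return tuple(new)

    # Initialise a population of random 4-integer architecture strings.
    population = {arch: evaluate(arch)
                  for arch in {tuple(random.randrange(OP_CHOICES) for _ in range(4))
                               for _ in range(20)}}

    for generation in range(10):
        # Keep the best candidates, mutate them into new candidates,
        # evaluate the new ones, then merge the two pools.
        best = sorted(population, key=population.get, reverse=True)[:5]
        children = {mutate(a): None for a in best}
        for child in children:
            children[child] = evaluate(child)
        population.update(children)

    print(max(population.items(), key=lambda kv: kv[1]))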

13

SLIDE 14

Reinforcement Learning for NAS

[Diagram: the LSTM agent proposes architecture "0, 1, 0, 2"; GPU/TPU computing...]

14

SLIDE 15

Reinforcement Learning for NAS

[Diagram: GPU/TPU returns accuracy 0.86; the LSTM agent is updating...]

15

SLIDE 16

Reinforcement Learning for NAS

[Diagram: the LSTM agent proposes architecture "5, 5, 3, 3"; GPU/TPU computing...]

16

SLIDE 17

Reinforcement Learning for NAS

[Diagram: GPU/TPU returns accuracy 0.90; the LSTM agent is updating...]
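The corresponding RL loop can be caricatured as follows; this toy uses an independent softmax policy per position instead of an LSTM controller, and accuracy_of() is a hypothetical stand-in for the expensive GPU/TPU evaluation, so it only illustrates the propose / evaluate / update cycle, not the actual method.

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_SLOTS, NUM_CHOICES = 4, 8
    logits = np.zeros((NUM_SLOTS, NUM_CHOICES))   # policy parameters

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def accuracy_of(arch):
        # Hypothetical stand-in for "GPU/TPU computing...": pretend larger
        # indices are slightly better, plus noise.
        return 0.80 + 0.01 * sum(arch) / (NUM_SLOTS * (NUM_CHOICES - 1)) + rng.normal(0, 0.005)

    lr, baseline = 0.5, 0.0
    for step in range(200):
        probs = np.array([softmax(row) for row in logits])
        arch = [rng.choice(NUM_CHOICES, p=p) for p in probs]   # agent proposes a string
        reward = accuracy_of(arch)                              # expensive evaluation
        baseline = 0.9 * baseline + 0.1 * reward                # moving-average baseline
        for slot, choice in enumerate(arch):                    # REINFORCE update
            grad = -probs[slot]
            grad[choice] += 1.0
            logits[slot] += lr * (reward - baseline) * grad

    print([int(np.argmax(row)) for row in logits])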

17

SLIDE 18

Success and Limitation

Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." In ICLR. 2017.
Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.

  • NASNet from Zoph et al. (2018) already surpassed human designs on ImageNet under the same # Mult-Adds or # Params
  • But the search is very computationally intensive:

○ Zoph & Le (2017): 800 K40 GPUs for 28 days
○ Zoph et al. (2018): 500 P100 GPUs for 5 days

18

SLIDE 19

Our Goal

Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." In ICLR. 2017.
Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.

  • NASNet from Zoph et al. (2018) already surpassed human designs on ImageNet under the same # Mult-Adds or # Params
  • But the search is very computationally intensive:

○ Zoph & Le (2017): 800 K40 GPUs for 28 days
○ Zoph et al. (2018): 500 P100 GPUs for 5 days

  • Our goal: Speed up NAS by proposing an alternative algorithm

19

SLIDE 20

Architecture Search Space

20

SLIDE 21

Taxonomy

[Diagram: Blocks construct a Cell; Cells construct the Network]

21

  • Similar to Zoph et al. (2018)

Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.

SLIDE 22

Cell -> Network

  • Once we have a cell structure, we stack it up using a predefined pattern
  • A network is fully specified with:

○ Cell structure
○ N (number of cell repetitions)
○ F (number of filters in the first cell)

  • N and F are selected by hand to control network complexity (see the sketch below)
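As a rough illustration of how N and F control the stacking (an assumption about the layout, not the paper's exact predefined pattern), a network spec could be generated like this, with stride-2 copies of the cell between groups doubling the filter count:

    def build_network_spec(N, F, num_groups=3):
        # Hypothetical sketch: N cells per group; between groups the cell is
        # applied with stride 2 and the filter count doubles.
        spec, filters = [], F
        for group in range(num_groups):
            if group > 0:
                filters *= 2
                spec.append(("cell", filters, 2))   # stride-2 copy halves spatial size
            spec.extend([("cell", filters, 1)] * N)
        return spec

    # Example with the search-time setting mentioned later in the talk (N=2, F=24):
    for layer in build_network_spec(N=2, F=24):
        print(layer)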

22

SLIDE 23

Block -> Cell

  • Each cell consists of B = 5 blocks
  • The cell's output is the concatenation of the 5 blocks' outputs

[Diagram: the outputs H1 ... H5 of the B = 5 blocks are concatenated to form the cell output H]

23

SLIDE 24

Block

Within a Block

[Diagram: Input 1 → Operator 1, Input 2 → Operator 2, combined to produce the block output Hb]

  • Input 1 is transformed by Operator 1
  • Input 2 is transformed by Operator 2
  • Combine to give block’s output

24

SLIDE 25

Block

Within a Block

[Diagram: block structure as on the previous slide]

  • Input 1 and Input 2 may select from:

○ Previous cell’s output
○ Previous-previous cell’s output
○ Previous blocks’ outputs in the current cell

25

SLIDE 26

Block

Within a Block

[Diagram: block structure as on the previous slides]

  • Operator 1 and Operator 2 may select from:

○ 3x3 depth-separable convolution
○ 5x5 depth-separable convolution
○ 7x7 depth-separable convolution
○ 1x7 followed by 7x1 convolution
○ Identity
○ 3x3 average pooling
○ 3x3 max pooling
○ 3x3 dilated convolution

26

SLIDE 27

Block

Within a Block

[Diagram: block structure as on the previous slides]

  • Combination is element-wise addition
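A minimal PyTorch sketch of one block, assuming a plain depthwise-plus-pointwise stand-in for the depth-separable ops and including only a few of the eight operators; this is illustrative, not the released implementation.

    import torch
    import torch.nn as nn

    def separable_conv(channels, kernel_size):
        # Depthwise followed by pointwise convolution (simplified stand-in
        # for the depth-separable ops in the search space).
        return nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2,
                      groups=channels, bias=False),
            nn.Conv2d(channels, channels, 1, bias=False),
        )

    def make_operator(name, channels):
        # A few of the eight allowed operators, for illustration.
        ops = {
            "sep3x3": separable_conv(channels, 3),
            "sep5x5": separable_conv(channels, 5),
            "max3x3": nn.MaxPool2d(3, stride=1, padding=1),
            "identity": nn.Identity(),
        }
        return ops[name]

    class Block(nn.Module):
        def __init__(self, channels, op1="sep5x5", op2="max3x3"):
            super().__init__()
            self.op1 = make_operator(op1, channels)
            self.op2 = make_operator(op2, channels)

        def forward(self, input1, input2):
            # Combination is element-wise addition.
            return self.op1(input1) + self.op2(input2)

    # A cell output would be the concatenation of its B=5 blocks' outputs:
    x = torch.randn(1, 16, 32, 32)
    block = Block(channels=16)
    h = block(x, x)                       # one block's output
    cell_out = torch.cat([h] * 5, dim=1)  # placeholder for 5 distinct blocks
    print(cell_out.shape)                 # torch.Size([1, 80, 32, 32])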

27

SLIDE 28

Architecture Search Space Summary

  • One cell may look like the example below
  • In total: 2² · 8² · 1 × 3² · 8² · 1 × 4² · 8² · 1 × 5² · 8² · 1 × 6² · 8² · 1 ≈ 10¹⁴ possible combinations!

[Example cell: inputs Hc-1, Hc-2; five blocks combining (sep 7x7, max 3x3), (sep 5x5, sep 3x3), (sep 3x3, max 3x3), (identity, sep 3x3), (sep 5x5, max 3x3) by addition; block outputs concatenated into Hc]
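The ~10¹⁴ figure above can be checked with a few lines: block b can draw each of its two inputs from b + 1 sources (the two previous cells plus the b - 1 earlier blocks of the current cell) and each of its two operators from 8 choices.

    # Rough count of distinct 5-block cells (ignoring symmetries),
    # matching the ~10^14 figure on this slide.
    total = 1
    for b in range(1, 6):              # blocks 1..5
        num_inputs = b + 1             # prev cell, prev-prev cell, earlier blocks
        total *= (num_inputs ** 2) * (8 ** 2) * 1   # 2 inputs, 2 ops, 1 combiner
    print(f"{total:.1e}")              # ~5.6e+14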

28

SLIDE 29

Progressive Neural Architecture Search Algorithm

29

SLIDE 30

Main Idea: Simple-to-Complex Curriculum

  • Previous approaches directly work with the full 10¹⁴ search space
  • Instead, what if we progressively work our way in:

○ Begin by training all 1-block cells. There are only 256 of them!
○ Their scores are going to be low, because they have fewer blocks...
○ But maybe their relative performance is enough to show which cells are promising and which are not.
○ Let the K most promising cells expand into 2-block cells, and iterate! (See the expansion sketch below.)
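A sketch of the expansion step referenced above; the tuple encoding of a block as (input1, op1, input2, op2) is illustrative, not the paper's exact representation.

    from itertools import product

    NUM_OPS = 8

    def expand(cell):
        # `cell` is a tuple of blocks; each block is (input1, op1, input2, op2),
        # where inputs index the two previous cells (0, 1) or earlier blocks (2, 3, ...).
        b = len(cell) + 1            # index of the block being added
        num_inputs = b + 1           # prev cell, prev-prev cell, blocks 1..b-1
        for i1, o1, i2, o2 in product(range(num_inputs), range(NUM_OPS),
                                      range(num_inputs), range(NUM_OPS)):
            yield cell + ((i1, o1, i2, o2),)

    one_block = ((0, 3, 1, 5),)              # some 1-block cell
    children = list(expand(one_block))
    print(len(children))                     # 3*8*3*8 = 576 two-block candidates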

30

SLIDE 31

Progressive Neural Architecture Search: First Try …

[Diagram: enumerate all 1-block cells (B1, 256), train them, select the top K; expand into promising 2-block candidates (K · B2, ~10⁵); train these 2-block cells]

  • Problem: for a reasonable K, there are too many 2-block candidates to train (worked out below)

○ It is “expensive” to obtain the performance of a cell/string
○ Each one takes hours of training and evaluation
○ Maybe we can afford ~10², but we definitely cannot afford ~10⁵
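For concreteness, here is the arithmetic behind the ~10⁵ figure with K = 256 (the paper's exact count may differ slightly, e.g. because symmetric cells can be deduplicated).

    K = 256                     # surviving 1-block cells
    num_inputs = 3              # choices for each input of block 2
    num_ops = 8                 # choices for each operator
    per_parent = num_inputs**2 * num_ops**2   # 576 ways to add block 2
    print(K * per_parent)       # 147456, i.e. on the order of 10^5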

31

SLIDE 32

Performance Prediction with Surrogate Model

  • Solution: train a “cheap” surrogate model that predicts the final performance simply by reading the string

○ The data points collected in the “expensive” way are exactly the training data for this “cheap” surrogate model

  • The two assessments are in fact used in an alternating fashion:

○ Use the “cheap” assessment when the candidate pool is large (~10⁵)
○ Use the “expensive” assessment when it is small (~10²)

[Diagram: the predictor reads the string (0, 2, 0, 6) and outputs 0.92]

32

SLIDE 33

Performance Prediction with Surrogate Model

  • Desired properties of this surrogate model/predictor:

○ Handle variable-size input strings
○ Correlate with true performance
○ Sample efficient

  • We try both an MLP ensemble and an RNN ensemble as the predictor (a sketch follows below)

○ The MLP ensemble handles variable-size input by mean pooling
○ The RNN ensemble handles variable-size input by unrolling a different number of times
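A sketch of the MLP-ensemble idea in PyTorch; the embedding size, hidden width, and ensemble size below are illustrative, not the paper's settings. Each token of the architecture string is embedded, the embeddings are mean-pooled so strings of any length map to a fixed-size vector, and several independently initialized predictors are averaged.

    import torch
    import torch.nn as nn

    class MLPPredictor(nn.Module):
        def __init__(self, vocab_size=20, embed_dim=32, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.mlp = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1), nn.Sigmoid())

        def forward(self, tokens):
            # tokens: (batch, length) integer-encoded architecture strings.
            pooled = self.embed(tokens).mean(dim=1)   # mean pooling handles variable length
            return self.mlp(pooled).squeeze(-1)       # predicted accuracy in [0, 1]

    ensemble = [MLPPredictor() for _ in range(5)]

    def predict(tokens):
        # Average the ensemble members' predictions.
        with torch.no_grad():
            return torch.stack([m(tokens) for m in ensemble]).mean(dim=0)

    # Strings of different lengths are handled by separate forward passes:
    print(predict(torch.tensor([[0, 3, 1, 5]])))              # a 1-block cell
    print(predict(torch.tensor([[0, 3, 1, 5, 2, 0, 0, 7]])))  # a 2-block cell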

33

SLIDE 34

Progressive Neural Architecture Search …

[Diagram: enumerate and train all 1-block cells (B1, 256)]

34

SLIDE 35

Progressive Neural Architecture Search …

[Diagram: enumerate and train all 1-block cells (B1, 256); use their measured accuracies to train the predictor]

35

SLIDE 36

Progressive Neural Architecture Search …

[Diagram: enumerate and train all 1-block cells (B1, 256); train the predictor on their scores; expand the promising cells into 2-block candidates (K · B2, ~10⁵)]

36

SLIDE 37

Progressive Neural Architecture Search …

[Diagram: as above, then apply the predictor to select the top K (~10²) 2-block cells]

37

SLIDE 38

Progressive Neural Architecture Search …

[Diagram: as above, then train the selected K (~10²) 2-block cells]

38

SLIDE 39

Progressive Neural Architecture Search …

[Diagram: as above, then finetune the predictor on the newly trained 2-block cells]

39

SLIDE 40

Progressive Neural Architecture Search …

[Diagram: as above, then expand the promising 2-block cells into 3-block candidates (K · B3, ~10⁵)]

40

SLIDE 41

Progressive Neural Architecture Search …

[Diagram: as above, then apply the predictor to select the top K 3-block cells; the cycle of expand, predict-and-select, train, and finetune repeats until the cells reach B = 5 blocks. A self-contained sketch of the full loop follows below.]
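Putting the whole procedure together, a toy version might read as follows; train_and_score() and MeanPredictor are deliberately trivial stand-ins for the expensive training step and the MLP/RNN surrogate, and the block encoding matches the earlier expansion sketch.

    import random
    from itertools import product

    NUM_OPS, B, K = 8, 5, 256

    def expand(cell):
        # Grow a cell by one block (same idea as the earlier expansion sketch).
        num_inputs = len(cell) + 2        # prev cell, prev-prev cell, earlier blocks
        for spec in product(range(num_inputs), range(NUM_OPS),
                            range(num_inputs), range(NUM_OPS)):
            yield cell + (spec,)

    def train_and_score(cell):
        # Hypothetical stand-in for the "expensive" assessment
        # (hours of real training per cell in practice).
        random.seed(hash(cell))
        return random.uniform(0.80, 0.95)

    class MeanPredictor:
        # Deliberately trivial surrogate: average observed accuracy per token value.
        def __init__(self):
            self.total, self.count = {}, {}
        def fit(self, scores):
            for cell, acc in scores.items():
                for token in (t for block in cell for t in block):
                    self.total[token] = self.total.get(token, 0.0) + acc
                    self.count[token] = self.count.get(token, 0) + 1
        def predict(self, cell):
            tokens = [t for block in cell for t in block]
            return sum(self.total.get(t, 0.85) / self.count.get(t, 1)
                       for t in tokens) / len(tokens)

    # Level 1: enumerate and train all 2*8*2*8 = 256 one-block cells.
    candidates = list(expand(()))
    scores = {c: train_and_score(c) for c in candidates}
    predictor = MeanPredictor()
    predictor.fit(scores)

    for b in range(2, B + 1):
        children = [child for c in candidates for child in expand(c)]   # ~10^5 cells
        candidates = sorted(children, key=predictor.predict, reverse=True)[:K]
        scores = {c: train_and_score(c) for c in candidates}            # ~10^2 "expensive"
        predictor.fit(scores)                                           # finetune predictor

    print(max(scores.items(), key=lambda kv: kv[1]))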

41

SLIDE 42

Experiments and Results

42

SLIDE 43

The Search Process

  • We performed Progressive Neural Architecture Search (K = 256) on CIFAR-10
  • Each model (N = 2, F = 24) was trained for 20 epochs with a cosine learning rate schedule (sketched below)
  • First big question: Is our search more efficient?
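For reference, the standard cosine annealing rule used for such schedules looks like this; the peak learning rate below is illustrative, not the exact value used in these experiments.

    import math

    def cosine_lr(step, total_steps, lr_max=0.025, lr_min=0.0):
        # Standard cosine annealing from lr_max down to lr_min.
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

    # Example: learning rate at the start, middle, and end of 20 epochs.
    for epoch in (0, 10, 20):
        print(epoch, round(cosine_lr(epoch, 20), 4))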

43

SLIDE 44

The Search Process: 5x Speedup

44

SLIDE 45

The Search Process: PNASNet-1, 2, 3

[Cell diagrams:
PNASNet-1: one block (sep 7x7 + max 3x3), block outputs concatenated into Hc
PNASNet-2: two blocks (sep 7x7 + sep 3x3), (max 3x3 + sep 5x5)
PNASNet-3: three blocks (sep 5x5 + max 3x3), (identity + sep 3x3), (1x7 then 7x1 conv + max 3x3)]

45

SLIDE 46

The Search Process: PNASNet-4

[Cell diagram: PNASNet-4 has four blocks (sep 5x5 + max 3x3), (sep 5x5 + sep 3x3), (identity + sep 3x3), (sep 5x5 + max 3x3), whose outputs are concatenated into Hc]

46

SLIDE 47

The Search Process: PNASNet-5

[Cell diagram: PNASNet-5 has five blocks (sep 7x7 + max 3x3), (sep 5x5 + sep 3x3), (sep 3x3 + max 3x3), (identity + sep 3x3), (sep 5x5 + max 3x3), whose outputs are concatenated into Hc]

47

SLIDE 48

After The Search

  • Select the best 5-block cell structure; increase N and F
  • Train and evaluate on both CIFAR-10 and ImageNet
  • Second big question: How competitive is the found cell structure on benchmark datasets?

48

SLIDE 49

After The Search: CIFAR-10

Model            # Params   Error Rate (%)   Method   Search Cost

NASNet-A [1]      3.3M      3.41             RL       21.4 - 29.3B
NASNet-B [1]      2.6M      3.73             RL       21.4 - 29.3B
NASNet-C [1]      3.1M      3.59             RL       21.4 - 29.3B
Hier-EA [2]      15.7M      3.75 ± 0.12      EA       35.8B
AmoebaNet-B [3]   2.8M      3.37 ± 0.04      EA       63.5B
AmoebaNet-A [3]   3.2M      3.34 ± 0.06      EA       25.2B
PNASNet-5         3.2M      3.41 ± 0.09      SMBO     1.0B

49

[1] Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.
[2] Liu, Hanxiao, et al. "Hierarchical representations for efficient architecture search." In ICLR. 2018.
[3] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." arXiv preprint arXiv:1802.01548 (2018).

SLIDE 50

After The Search: ImageNet (Mobile)

Model            # Params   # Mult-Adds   Top-1   Top-5

MobileNet [1]     4.2M       569M         70.6    89.5
ShuffleNet [2]    5M         524M         70.9    89.8
NASNet-A [3]      5.3M       564M         74.0    91.6
AmoebaNet-B [4]   5.3M       555M         74.0    91.5
AmoebaNet-A [4]   5.1M       555M         74.5    92.0
AmoebaNet-C [4]   6.4M       570M         75.7    92.4
PNASNet-5         5.1M       588M         74.2    91.9

50

[1] Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
[2] Zhang, Xiangyu, et al. "Shufflenet: An extremely efficient convolutional neural network for mobile devices." arXiv preprint arXiv:1707.01083 (2017).
[3] Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.
[4] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." arXiv preprint arXiv:1802.01548 (2018).

SLIDE 51

After The Search: ImageNet (Large)

Model                # Params   # Mult-Adds   Top-1   Top-5

ResNeXt-101 [1]       83.6M      31.5B        80.9    95.6
Squeeze-Excite [2]   145.8M      42.3B        82.7    96.2
NASNet-A [3]          88.9M      23.8B        82.7    96.2
AmoebaNet-B [4]       84.0M      22.3B        82.3    96.1
AmoebaNet-A [4]       86.7M      23.1B        82.8    96.1
AmoebaNet-C [4]      155.3M      41.1B        83.1    96.3
PNASNet-5             86.1M      25.0B        82.9    96.2

51

[1] Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." In CVPR. 2017.
[2] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR. 2018.
[3] Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. "Learning transferable architectures for scalable image recognition." In CVPR. 2018.
[4] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." arXiv preprint arXiv:1802.01548 (2018).

SLIDE 52

Conclusion

  • We propose to search neural network architectures in order of increasing complexity, while simultaneously learning a surrogate function to guide the search.
  • PNASNet-5 achieves state-of-the-art level accuracies on CIFAR-10 and ImageNet, while being 5 to 8 times more efficient than leading RL and EA approaches during the search process.

52

SLIDE 53

Code and Model Release

  • We have released PNASNet-5 trained on ImageNet

○ Both Mobile and Large
○ Both TensorFlow and PyTorch
○ SOTA on ImageNet amongst all publicly available models

https://github.com/tensorflow/models/tree/master/research/slim
https://github.com/chenxi116/PNASNet.TF
https://github.com/chenxi116/PNASNet.pytorch

53

SLIDE 54

Extensions

  • Our PNAS algorithm has been applied to related tasks:

○ PPP-Net [1] and DPP-Net [2]: Pareto-optimal architectures
○ Auto-Meta [3]: meta-learning

  • PNAS did not address parameter sharing among child models:

○ ENAS [4] and DARTS [5] showed its importance for speedup
○ EPNAS [6] combined ENAS and PNAS for a further speedup

[1] Dong, Jin-Dong, et al. "PPP-Net: Platform-aware Progressive Search for Pareto-optimal Neural Architectures." ICLR 2018 Workshop.
[2] Dong, Jin-Dong, et al. "DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures." ECCV 2018.
[3] Kim, Jaehong, et al. "Auto-Meta: Automated Gradient Based Meta Learner Search." arXiv preprint arXiv:1806.06927 (2018).
[4] Pham, Hieu, et al. "Efficient Neural Architecture Search via Parameter Sharing." ICML 2018.
[5] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." arXiv preprint arXiv:1806.09055 (2018).
[6] Perez-Rua, Juan-Manuel, Moez Baccouche, and Stephane Pateux. "Efficient Progressive Neural Architecture Search." BMVC 2018.

54

SLIDE 55

Thank You

Poster session 3B (Wednesday, September 12, 2:30pm - 4:00pm)
@chenxi116
https://cs.jhu.edu/~cxliu/

55