SLIDE 1

Neural Architecture Optimization

Zhao Jian (赵鉴)

SLIDE 2

CONTENTS

1. AutoML
2. NAS
3. NAO
4. Experiments
5. Conclusion

SLIDE 3

AutoML

Automated Machine Learning

01

SLIDE 4

Typical Machine Learning


  • Fixed data order
  • Fixed model space
  • Fixed loss function
SLIDE 5

Automated Machine Learning


  • Auto data selection/processing
  • Auto model selection and training
  • Auto hyperparameter tuning
SLIDE 6

NAS

Neural Architecture Search

02

SLIDE 7

Architecture of a Neural Network is Crucial to its Performance


ImageNet-winning neural architectures: AlexNet (2012), ZFNet (2013), Inception (2014), ResNet (2015)

SLIDE 8

NAS


Target Task: e.g., image classification, language modeling, …

Given Dataset: e.g., CIFAR-10, CIFAR-100, PTB, WikiText-2, …

Automatic: requires little human effort

Output: a network architecture that fits the given dataset well on the target task

Goal: alleviate the pain of human effort

SLIDE 9


General Framework

The controller generates candidate architectures; each child network is then trained, and its validation performance is returned to the controller.
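As a toy illustration of this loop, the sketch below plugs a stand-in "controller" and evaluator into the generate/train/feedback cycle. Every name here (nas_loop, sample_arch, train_and_eval, update_controller) is hypothetical, and the random sampler stands in for a real controller and a real training run:

```python
import random

def nas_loop(sample_arch, train_and_eval, update_controller,
             num_rounds=10, archs_per_round=8):
    """Controller/child-network loop: the controller proposes architectures,
    each child network is trained, and its validation performance is fed
    back to the controller."""
    history = []
    for _ in range(num_rounds):
        for _ in range(archs_per_round):
            arch = sample_arch()            # controller: generate architecture
            score = train_and_eval(arch)    # child: train, get valid performance
            history.append((arch, score))
        update_controller(history)          # feedback to the controller
    return max(history, key=lambda p: p[1])

# Toy usage: a random "controller" over a tiny op vocabulary and a dummy
# evaluator; a real run would train and validate a child network here.
OPS = ["conv 1x1", "conv 3x3", "max pool", "avg pool"]
best_arch, best_score = nas_loop(
    sample_arch=lambda: tuple(random.choice(OPS) for _ in range(4)),
    train_and_eval=lambda arch: random.random(),
    update_controller=lambda history: None,   # random search: no learning
)
print(best_arch, best_score)
```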

SLIDE 10

Typical Search Methods/Algorithms


  • Reinforcement Learning (see the policy-gradient sketch below)
    • Take each architecture choice (i.e., sub-architecture) as an action
    • Take the validation performance as the reward
    • Use policy gradient to search for the best actions
    • NAS-RL (Google, 2017), NASNet (Google, 2017), ENAS (CMU & Google, 2018)
  • Evolutionary Computing
    • Treat changes to the architecture as mutation and selection
    • Take the validation performance as fitness
    • Evolve the architectures
    • AmoebaNet
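For the RL view, here is a minimal REINFORCE sketch in PyTorch. It is not the actual NAS-RL or ENAS controller; the op vocabulary, sequence length, network sizes, and the fixed reward are all assumptions for illustration:

```python
import torch
import torch.nn as nn

NUM_OPS, NUM_NODES, HIDDEN = 4, 6, 64   # toy search space (assumed sizes)

class Controller(nn.Module):
    """RNN policy that emits one operation choice (action) per step."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTMCell(HIDDEN, HIDDEN)
        self.emb = nn.Embedding(NUM_OPS, HIDDEN)
        self.head = nn.Linear(HIDDEN, NUM_OPS)

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        x = torch.zeros(1, HIDDEN)
        ops, log_probs = [], []
        for _ in range(NUM_NODES):
            h, c = self.rnn(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            op = dist.sample()                 # one architecture decision
            ops.append(op.item())
            log_probs.append(dist.log_prob(op))
            x = self.emb(op)                   # feed the choice back in
        return ops, torch.cat(log_probs).sum()

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=3e-4)

arch, log_prob = controller.sample()
reward = 0.7                  # stands in for the child's valid accuracy
loss = -reward * log_prob     # REINFORCE: ascend the expected reward
opt.zero_grad(); loss.backward(); opt.step()
print(arch)
```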
SLIDE 11

Results of Previous NAS Works


  • In terms of pushing SOTA results: e.g., on ImageNet
  • In terms of building products with AutoML: Microsoft, Google, …, and startups focused on AutoML
SLIDE 12

Neural Architecture Optimization

Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu. NIPS 2018

03

SLIDE 13

Are Previous NAS Works Perfect Enough?


Why search in a discrete space?

  • Exponentially large and thus hard to search

How about optimizing in a continuous space?

  • Compact and easy to optimize
  • Brings gradient-based optimization back!
SLIDE 14

Basic Methods


  • Use a string to represent an architecture
  • Search based on data pairs (y, z), where y is the architecture string and z is its validation performance (see the tokenization sketch below)

Example string: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
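To make the string representation concrete, here is a minimal tokenization sketch; the vocabulary and the placeholder performance value are assumptions for illustration, not the paper's exact token set:

```python
# Map each architecture token to an integer id (assumed vocabulary).
VOCAB = {tok: i for i, tok in enumerate(
    ["node 1", "node 2", "conv 1x1", "conv 3x3", "max pooling"])}

def encode(arch_string):
    """Turn "node 2, conv 1x1, ..." into a list of integer token ids."""
    return [VOCAB[tok.strip()] for tok in arch_string.split(",")]

y = encode("node 2, conv 1x1, node 1, max pooling, node 1, max pooling, "
           "node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1")
z = 0.95   # paired validation performance (placeholder value)
print(y, z)  # ([1, 2, 0, 4, 0, 4, 0, 3, 1, 3, 1, 2], 0.95)
```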

SLIDE 15

Neural Architecture Optimization (NAO)


01 Encoder (LSTM)

  • Encodes the discrete string tokens y into an embedding vector f_y in continuous space

02 Performance Predictor (FCN)

  • Maps f_y to its validation performance
  • Moves f_y along the direction of the predictor's gradients

03 Decoder (LSTM)

  • Decodes the embedding vector f_y' back to the discrete tokens y'
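A minimal PyTorch sketch of how the three components might be wired together. The layer sizes, mean-pooling of encoder states into f_y, and teacher-forced decoding are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class NAO(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)   # y -> f_y
        self.predictor = nn.Sequential(                          # f_y -> z
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)   # f_y -> y'
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)                   # (B, T, emb)
        enc_out, (h, c) = self.encoder(x)
        f_y = enc_out.mean(dim=1)                # continuous embedding f_y
        z_hat = self.predictor(f_y).squeeze(-1)  # predicted performance
        # Teacher-forced reconstruction of the token sequence.
        dec_out, _ = self.decoder(x, (h, c))
        logits = self.out(dec_out)               # (B, T, vocab)
        return f_y, z_hat, logits

model = NAO(vocab_size=5)
tokens = torch.randint(0, 5, (2, 12))            # two dummy architectures
f_y, z_hat, logits = model(tokens)
```

Mean-pooling the encoder states into one vector is just one simple choice here; the point is only that y becomes a point in a continuous space that a predictor can differentiate through.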

SLIDE 16

Gradient-Based Search in Continuous Space

SLIDE 17

Training & Inferencing


  • Train the Encoder-Predictor-Decoder
    • Architecture pool of hundreds of (y, z) pairs
    • Data augmentation: symmetric architectures, obtained by swapping two branches
      • e.g., "node1 conv 1x1 node2 conv 3x3" -> "node2 conv 3x3 node1 conv 1x1"
    • Encoder maps architecture y into f_y
    • Performance-predictor loss: squared error
      • $L_{pp} = \sum_{y \in Y} (z_y - g(f_y))^2$
    • Decoder loss: reconstruction (negative log-likelihood)
      • $L_{rec} = \sum_{y \in Y} -\log P_D(y \mid f_y)$
    • Jointly train the three components together
      • $L = \lambda L_{pp} + (1 - \lambda) L_{rec}$
  • Generate new architectures (a sketch of this step follows below)
    • Generate a new architecture embedding with step size $\eta$: $f_{y'} = f_y + \eta \nabla_{f_y} g(f_y)$
    • Decoder maps $f_{y'}$ back into $y'$
  • Iterate: train and evaluate the newly generated architectures, then repeat the steps above
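Continuing the NAO sketch from the previous slide, one joint-training step plus one embedding-space search step could look like this; λ, η, and the performance values are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

lam, eta = 0.5, 10.0          # trade-off λ and step size η (assumed values)

# Joint training: L = λ·L_pp + (1 - λ)·L_rec over a batch of (y, z) pairs.
f_y, z_hat, logits = model(tokens)       # model/tokens from the sketch above
z = torch.tensor([0.95, 0.90])           # measured valid performances
l_pp = ((z - z_hat) ** 2).sum()          # predictor squared error L_pp
l_rec = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                        tokens.reshape(-1))   # decoder NLL L_rec
loss = lam * l_pp + (1 - lam) * l_rec
loss.backward()                          # gradients reach all three components

# Inference: f_y' = f_y + η·∇_{f_y} g(f_y), then decode f_y' back to y'.
f = f_y.detach().requires_grad_(True)
model.predictor(f).sum().backward()      # gradient of predicted performance
f_new = f + eta * f.grad                 # step towards a better embedding
```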
SLIDE 18

[Figure: two example cells built from inputs h[i-1] and h[i] with candidate operations conv 1x1, conv 3x3, max pool, and avg pool, combined by add/concat; operations common to both cells share weights.]

Architecture 1: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
Architecture 2: "node 1, conv 3x3, node 2, max pooling, node 2, conv 1x1, node 2, conv 1x1, node 1, conv 3x3, node 1, max pooling"

Weight Share
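A minimal sketch of the weight-sharing idea (a simplified illustration, not NAO's exact mechanism): both child networks look up their operations in one shared pool, so a repeated op literally reuses the same weights:

```python
import torch.nn as nn

C = 16                                   # channel count (assumed)
shared_ops = nn.ModuleDict({
    "conv_1x1": nn.Conv2d(C, C, kernel_size=1),
    "conv_3x3": nn.Conv2d(C, C, kernel_size=3, padding=1),
    "max_pooling": nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
})

def build_child(arch):
    """Assemble a child network by looking up each op in the shared pool."""
    return nn.Sequential(*[shared_ops[op] for op in arch])

net1 = build_child(["conv_1x1", "max_pooling", "conv_3x3"])
net2 = build_child(["conv_3x3", "conv_1x1", "max_pooling"])
assert net1[0] is net2[1]    # the same conv_1x1 module: shared weights
```

Because training either child updates the shared modules, new candidate architectures start from already-trained weights instead of from scratch.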

SLIDE 19

Experiments and Results

04

SLIDE 20

Task


Image Classification: classify the images

  • CIFAR-10: 10 classes; 50,000 training images, 10,000 test images
  • CIFAR-100: 100 classes; 50,000 training images, 10,000 test images

Language Modeling: modeling the probability distribution over sequences of words in natural language

  • PTB: Penn Treebank
  • WT2: WikiText-2

SLIDE 21

CIFAR-10

SLIDE 22

Transfer to CIFAR-100

SLIDE 23

PTB

SLIDE 24

Transfer to WikiText-2

SLIDE 25

Conclusion

05

SLIDE 26

Conclusion


New automatic architecture design algorithm:

  • Encodes the discrete architecture description into a continuous embedding
  • Performs the optimization within the continuous space
  • Uses a gradient-based method rather than searching over discrete decisions

Project links:

  • Paper: https://arxiv.org/abs/1808.07233
  • Code: https://github.com/renqianluo/NAO

SLIDE 27

Q&A

Thanks.