SLIDE 1

Neural Architecture Optimization

Zhao Jian (赵鉴)

SLIDE 2

CONTENTS

1. AutoML
2. NAS
3. NAO
4. Experiments
5. Conclusion

SLIDE 3

AutoML

Automated Machine Learning

01

SLIDE 4

Typical Machine Learning


  • Fixed data order
  • Fixed model space
  • Fixed loss function
SLIDE 5

Automated Machine Learning


  • Auto data selection/processing
  • Auto model selection and training
  • Auto hyperparameter tuning
SLIDE 6

NAS

Neural Architecture Search

02

SLIDE 7

Architecture of a Neural Network is Crucial to its Performance


ImageNet-winning neural architectures: AlexNet (2012), ZFNet (2013), Inception (2014), ResNet (2015)

SLIDE 8

NAS


Target Task: e.g., image classification, language modeling, …

Given Dataset: e.g., CIFAR-10, CIFAR-100, PTB, WikiText-2, …

Automatic: requires little human effort

Output: a network architecture that fits the given dataset well on the target task

Goal: alleviate the pain of human effort

SLIDE 9


General Framework

The controller generates candidate architectures; each child network is then trained, and its validation performance is returned to the controller.
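As a toy illustration of this loop, the sketch below plugs a stand-in "controller" and evaluator into the generate/train/feedback cycle. Every name here (nas_loop, sample_arch, train_and_eval, update_controller) is hypothetical, and the random sampler stands in for a real controller and a real training run:

```python
import random

def nas_loop(sample_arch, train_and_eval, update_controller,
             num_rounds=10, archs_per_round=8):
    """Controller/child-network loop: the controller proposes architectures,
    each child network is trained, and its validation performance is fed
    back to the controller."""
    history = []
    for _ in range(num_rounds):
        for _ in range(archs_per_round):
            arch = sample_arch()            # controller: generate architecture
            score = train_and_eval(arch)    # child: train, get valid performance
            history.append((arch, score))
        update_controller(history)          # feedback to the controller
    return max(history, key=lambda p: p[1])

# Toy usage: a random "controller" over a tiny op vocabulary and a dummy
# evaluator; a real run would train and validate a child network here.
OPS = ["conv 1x1", "conv 3x3", "max pool", "avg pool"]
best_arch, best_score = nas_loop(
    sample_arch=lambda: tuple(random.choice(OPS) for _ in range(4)),
    train_and_eval=lambda arch: random.random(),
    update_controller=lambda history: None,   # random search: no learning
)
print(best_arch, best_score)
```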

SLIDE 10

Typical Search Methods/Algorithms


  • Reinforcement Learning (see the policy-gradient sketch below)
    • Take each architecture choice (i.e., sub-architecture) as an action
    • Take the validation performance as the reward
    • Use policy gradient to search for the best actions
    • NAS-RL (Google, 2017), NASNet (Google, 2017), ENAS (CMU & Google, 2018)
  • Evolutionary Computing
    • Treat changes to the architecture as mutation and selection
    • Take the validation performance as fitness
    • Evolve the architectures
    • AmoebaNet
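For the RL view, here is a minimal REINFORCE sketch in PyTorch. It is not the actual NAS-RL or ENAS controller; the op vocabulary, sequence length, network sizes, and the fixed reward are all assumptions for illustration:

```python
import torch
import torch.nn as nn

NUM_OPS, NUM_NODES, HIDDEN = 4, 6, 64   # toy search space (assumed sizes)

class Controller(nn.Module):
    """RNN policy that emits one operation choice (action) per step."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTMCell(HIDDEN, HIDDEN)
        self.emb = nn.Embedding(NUM_OPS, HIDDEN)
        self.head = nn.Linear(HIDDEN, NUM_OPS)

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        x = torch.zeros(1, HIDDEN)
        ops, log_probs = [], []
        for _ in range(NUM_NODES):
            h, c = self.rnn(x, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            op = dist.sample()                 # one architecture decision
            ops.append(op.item())
            log_probs.append(dist.log_prob(op))
            x = self.emb(op)                   # feed the choice back in
        return ops, torch.cat(log_probs).sum()

controller = Controller()
opt = torch.optim.Adam(controller.parameters(), lr=3e-4)

arch, log_prob = controller.sample()
reward = 0.7                  # stands in for the child's valid accuracy
loss = -reward * log_prob     # REINFORCE: ascend the expected reward
opt.zero_grad(); loss.backward(); opt.step()
print(arch)
```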
SLIDE 11

Results of Previous NAS Works


  • In terms of pushing SOTA results: e.g., on ImageNet
  • In terms of building products with AutoML: Microsoft, Google, …, and startups focused on AutoML
SLIDE 12

Neural Architecture Optimization

Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu. NIPS 2018

03

SLIDE 13

Are Previous NAS Works Perfect Enough?


Why search in a discrete space?

  • Exponentially large and thus hard to search

How about optimizing in a continuous space?

  • Compact and easy to optimize
  • Brings gradient-based optimization back!
SLIDE 14

Basic Methods


  • Use a string to represent an architecture
  • Search based on data pairs (y, z), where y is the architecture string and z is its validation performance (see the tokenization sketch below)

Example string: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
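To make the string representation concrete, here is a minimal tokenization sketch; the vocabulary and the placeholder performance value are assumptions for illustration, not the paper's exact token set:

```python
# Map each architecture token to an integer id (assumed vocabulary).
VOCAB = {tok: i for i, tok in enumerate(
    ["node 1", "node 2", "conv 1x1", "conv 3x3", "max pooling"])}

def encode(arch_string):
    """Turn "node 2, conv 1x1, ..." into a list of integer token ids."""
    return [VOCAB[tok.strip()] for tok in arch_string.split(",")]

y = encode("node 2, conv 1x1, node 1, max pooling, node 1, max pooling, "
           "node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1")
z = 0.95   # paired validation performance (placeholder value)
print(y, z)  # ([1, 2, 0, 4, 0, 4, 0, 3, 1, 3, 1, 2], 0.95)
```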

SLIDE 15

Neural Architecture Optimization (NAO)


01 Encoder (LSTM)

  • Encodes the discrete string tokens y into an embedding vector f_y in continuous space

02 Performance Predictor (FCN)

  • Maps f_y to its validation performance
  • Moves f_y along the direction of the predictor's gradients

03 Decoder (LSTM)

  • Decodes the embedding vector f_y' back to the discrete tokens y'
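A minimal PyTorch sketch of how the three components might be wired together. The layer sizes, mean-pooling of encoder states into f_y, and teacher-forced decoding are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class NAO(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)   # y -> f_y
        self.predictor = nn.Sequential(                          # f_y -> z
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)   # f_y -> y'
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)                   # (B, T, emb)
        enc_out, (h, c) = self.encoder(x)
        f_y = enc_out.mean(dim=1)                # continuous embedding f_y
        z_hat = self.predictor(f_y).squeeze(-1)  # predicted performance
        # Teacher-forced reconstruction of the token sequence.
        dec_out, _ = self.decoder(x, (h, c))
        logits = self.out(dec_out)               # (B, T, vocab)
        return f_y, z_hat, logits

model = NAO(vocab_size=5)
tokens = torch.randint(0, 5, (2, 12))            # two dummy architectures
f_y, z_hat, logits = model(tokens)
```

Mean-pooling the encoder states into one vector is just one simple choice here; the point is only that y becomes a point in a continuous space that a predictor can differentiate through.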

SLIDE 16

Gradient-Based Search in Continuous Space

SLIDE 17

Training & Inferencing


  • Train the Encoder-Predictor-Decoder
    • Architecture pool of hundreds of (y, z) pairs
    • Data augmentation: symmetric architectures, obtained by swapping two branches
      • e.g., "node1 conv 1x1 node2 conv 3x3" -> "node2 conv 3x3 node1 conv 1x1"
    • Encoder maps architecture y into f_y
    • Performance-predictor loss: squared error
      • $L_{pp} = \sum_{y \in Y} (z_y - g(f_y))^2$
    • Decoder loss: reconstruction (negative log-likelihood)
      • $L_{rec} = \sum_{y \in Y} -\log P_D(y \mid f_y)$
    • Jointly train the three components together
      • $L = \lambda L_{pp} + (1 - \lambda) L_{rec}$
  • Generate new architectures (a sketch of this step follows below)
    • Generate a new architecture embedding with step size $\eta$: $f_{y'} = f_y + \eta \nabla_{f_y} g(f_y)$
    • Decoder maps $f_{y'}$ back into $y'$
  • Iterate: train and evaluate the newly generated architectures, then repeat the steps above
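Continuing the NAO sketch from the previous slide, one joint-training step plus one embedding-space search step could look like this; λ, η, and the performance values are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

lam, eta = 0.5, 10.0          # trade-off λ and step size η (assumed values)

# Joint training: L = λ·L_pp + (1 - λ)·L_rec over a batch of (y, z) pairs.
f_y, z_hat, logits = model(tokens)       # model/tokens from the sketch above
z = torch.tensor([0.95, 0.90])           # measured valid performances
l_pp = ((z - z_hat) ** 2).sum()          # predictor squared error L_pp
l_rec = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                        tokens.reshape(-1))   # decoder NLL L_rec
loss = lam * l_pp + (1 - lam) * l_rec
loss.backward()                          # gradients reach all three components

# Inference: f_y' = f_y + η·∇_{f_y} g(f_y), then decode f_y' back to y'.
f = f_y.detach().requires_grad_(True)
model.predictor(f).sum().backward()      # gradient of predicted performance
f_new = f + eta * f.grad                 # step towards a better embedding
```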
SLIDE 18

[Figure: two example cells built from inputs h[i-1] and h[i] with candidate operations conv 1x1, conv 3x3, max pool, and avg pool, combined by add/concat; operations common to both cells share weights.]

Architecture 1: "node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1"
Architecture 2: "node 1, conv 3x3, node 2, max pooling, node 2, conv 1x1, node 2, conv 1x1, node 1, conv 3x3, node 1, max pooling"

Weight Share
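A minimal sketch of the weight-sharing idea (a simplified illustration, not NAO's exact mechanism): both child networks look up their operations in one shared pool, so a repeated op literally reuses the same weights:

```python
import torch.nn as nn

C = 16                                   # channel count (assumed)
shared_ops = nn.ModuleDict({
    "conv_1x1": nn.Conv2d(C, C, kernel_size=1),
    "conv_3x3": nn.Conv2d(C, C, kernel_size=3, padding=1),
    "max_pooling": nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
})

def build_child(arch):
    """Assemble a child network by looking up each op in the shared pool."""
    return nn.Sequential(*[shared_ops[op] for op in arch])

net1 = build_child(["conv_1x1", "max_pooling", "conv_3x3"])
net2 = build_child(["conv_3x3", "conv_1x1", "max_pooling"])
assert net1[0] is net2[1]    # the same conv_1x1 module: shared weights
```

Because training either child updates the shared modules, new candidate architectures start from already-trained weights instead of from scratch.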

SLIDE 19

Experiments and Results

04

SLIDE 20

Task


Image Classification: classify the images

  • CIFAR-10: 10 classes; 50,000 training images, 10,000 test images
  • CIFAR-100: 100 classes; 50,000 training images, 10,000 test images

Language Modeling: modeling the probability distribution over sequences of words in natural language

  • PTB: Penn Treebank
  • WT2: WikiText-2

SLIDE 21

CIFAR-10

SLIDE 22

Transfer to CIFAR-100

SLIDE 23

PTB

SLIDE 24

Transfer to WikiText-2

SLIDE 25

Conclusion

05

SLIDE 26

Conclusion


New automatic architecture design algorithm:

  • Encodes the discrete architecture description into a continuous embedding
  • Performs the optimization within the continuous space
  • Uses a gradient-based method rather than searching over discrete decisions

Project links:

  • Paper: https://arxiv.org/abs/1808.07233
  • Code: https://github.com/renqianluo/NAO

SLIDE 27

Q&A

Thanks.