Neural Architecture Optimization


  1. Neural Architecture Optimization. Presenter: Zhao Jian (赵鉴)

  2. CONTENTS 1. AutoML 2. NAS 3. NAO 4. Experiments 5. Conclusion

  3. 01 AutoML: Auto Machine Learning

  4. Typical Machine Learning • Fixed data order • Fixed model space • Fixed loss function

  5. Auto Machine Learning • Auto data selection/processing • Auto model selection and training • Auto hyperparameter tuning

  6. 02 NAS: Neural Architecture Search

  7. The Architecture of a Neural Network is Crucial to its Performance. ImageNet-winning neural architectures: AlexNet (2012), ZFNet (2013), Inception (2014), ResNet (2015)

  8. NAS: Neural Architecture Search
  • Given: a dataset (e.g., CIFAR-10, CIFAR-100, PTB, WikiText-2) and a target task (e.g., image classification, language modeling)
  • Output: a network architecture that fits the given dataset well on the target task
  • Goal: automatic search with not many human efforts; alleviate the pain of human effort

  9. General Framework: a controller generates candidate architectures; each candidate child network is trained, and its validation performance is fed back to the controller. A runnable toy version of this loop is sketched below.
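
A minimal, runnable Python sketch of this generate-train-feedback loop. The random controller and the stub evaluate() are illustrative stand-ins, not part of any NAS library; a real system would actually train the child network and use an RNN policy or an evolutionary population as the controller.

    import random

    OPS = ["conv 1x1", "conv 3x3", "max pooling", "avg pooling"]

    def sample_architecture(num_nodes=4):
        # Controller stand-in: each node picks a predecessor node and an operation.
        return [(random.randrange(i + 1), random.choice(OPS)) for i in range(num_nodes)]

    def evaluate(arch):
        # Stand-in for "build the child network, train it, return validation accuracy".
        return random.random()

    best_arch, best_score = None, -1.0
    for _ in range(20):
        arch = sample_architecture()       # generate architectures
        score = evaluate(arch)             # train and get validation performance
        if score > best_score:             # feedback to the controller
            best_arch, best_score = arch, score
    print(best_arch, best_score)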

  10. Typical Search Methods/Algorithms
  • Reinforcement Learning: take each architecture choice (i.e., sub-architecture) as an action; take the validation performance as the reward; use policy gradient to search for the best actions. Examples: NAS-RL (Google, 2017), NASNet (Google, 2017), ENAS (CMU & Google, 2018), ...
  • Evolutionary Computing: treat architecture changes as mutation and selection; take validation performance as fitness; evolve the architectures. Examples: AmoebaNet, ... (a toy evolutionary loop is sketched below)
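
As a flavor of the evolutionary branch, here is a toy tournament-selection loop in the spirit of AmoebaNet's regularized evolution. The mutate() operator and the stub fitness() are illustrative assumptions; real fitness comes from training the child network.

    import collections
    import random

    OPS = ["conv 1x1", "conv 3x3", "max pooling", "avg pooling"]

    def mutate(arch):
        # Mutation: change one operation choice at random.
        child = list(arch)
        child[random.randrange(len(child))] = random.choice(OPS)
        return child

    def fitness(arch):
        # Stand-in for the validation performance of the trained child network.
        return random.random()

    # Population as a bounded queue: appending past maxlen ages out the
    # oldest individual, as in regularized evolution.
    population = collections.deque(
        ((arch, fitness(arch)) for arch in
         ([random.choice(OPS) for _ in range(6)] for _ in range(10))),
        maxlen=10)

    for _ in range(50):
        sample = random.sample(list(population), 3)   # tournament selection
        parent, _ = max(sample, key=lambda p: p[1])
        child = mutate(parent)
        population.append((child, fitness(child)))

    best_arch, best_fit = max(population, key=lambda p: p[1])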

  11. Results of Previous NAS Works
  • In terms of pushing SOTA results: e.g., on ImageNet
  • In terms of building products with AutoML: Microsoft, Google, ...; startups focusing on AutoML

  12. 03 Neural Architecture Optimization. Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu. NIPS 2018

  13. Are Previous NAS Works Perfect Enough?
  • Why search in discrete space? It is exponentially large and thus hard to search.
  • How about optimizing in continuous space? It is compact and easy to optimize, and it brings gradient(-based optimization) back!

  14. Basic Methods
  • Use a string to represent an architecture, e.g. “node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1”
  • Search based on data pairs (x, s), where x is the architecture string and s is its validation performance (a tokenization sketch follows)
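
A small sketch of how such a string can be tokenized for a sequence model. The vocabulary below is an illustrative assumption, not NAO's exact token set.

    # Map each token of the architecture string to an integer id.
    VOCAB = {tok: i for i, tok in enumerate(
        ["node 1", "node 2", "conv 1x1", "conv 3x3", "max pooling"])}

    arch = ("node 2, conv 1x1, node 1, max pooling, node 1, max pooling, "
            "node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1")
    tokens = [VOCAB[t.strip()] for t in arch.split(",")]
    print(tokens)  # [1, 2, 0, 4, 0, 4, 0, 3, 1, 3, 1, 2]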

  15. Neural Architecture Optimization (NAO)
  01 Encoder (LSTM): encodes the discrete string tokens x into an embedding vector e_x in continuous space
  02 Performance Predictor (FCN): maps e_x to its validation performance, so that e_x can be moved along the direction of the predictor's gradient
  03 Decoder (LSTM): decodes the embedding vector e_x' back into discrete tokens x'
  (a PyTorch sketch of the three components follows)
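
A minimal PyTorch sketch of the three components wired into one module. The layer sizes and exact wiring are illustrative assumptions; the authors' implementation (linked in the conclusion) differs in detail.

    import torch
    import torch.nn as nn

    class NAO(nn.Module):
        def __init__(self, vocab_size=16, emb_size=64, hidden_size=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_size)
            # 01 Encoder: LSTM over token embeddings -> continuous vector e_x.
            self.encoder = nn.LSTM(emb_size, hidden_size, batch_first=True)
            # 02 Performance predictor: fully connected net on e_x.
            self.predictor = nn.Sequential(
                nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                nn.Linear(hidden_size, 1))
            # 03 Decoder: LSTM initialized from e_x, emits token logits.
            self.decoder = nn.LSTM(emb_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def encode(self, tokens):
            _, (h, _) = self.encoder(self.embed(tokens))
            return h[-1]                     # e_x, shape (batch, hidden_size)

        def forward(self, tokens):
            e_x = self.encode(tokens)
            perf = self.predictor(e_x).squeeze(-1)
            # Teacher-forced reconstruction conditioned on e_x.
            state = (e_x.unsqueeze(0), torch.zeros_like(e_x).unsqueeze(0))
            dec_out, _ = self.decoder(self.embed(tokens), state)
            return e_x, perf, self.out(dec_out)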

  16. Gradient-Based Search in Continuous Space

  17. Training & Inference
  • Train the Encoder-Predictor-Decoder on an architecture pool of hundreds of (x, s) pairs
  • Data augmentation: symmetric architectures obtained by swapping two branches, e.g. “node1 conv 1x1 node2 conv 3x3” -> “node2 conv 3x3 node1 conv 1x1”
  • The encoder maps architecture x into e_x
  • Performance-predictor loss (squared error): L_pp = Σ_{x∈X} (s_x − f(e_x))²
  • Decoder loss (reconstruction, negative log-likelihood): L_rec = Σ_{x∈X} −log P_D(x | e_x)
  • Jointly train the three components: L = λ·L_pp + (1 − λ)·L_rec
  • Generate new architectures: take a gradient step on the embedding with step size η, e_x' = e_x + η·∂f/∂e_x, then let the decoder map e_x' back into x'
  • Iterate: train and evaluate the newly generated architectures, and repeat the steps above
  (a sketch of the loss and the update step follows)
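
Continuing with the NAO module sketched above, the joint loss and the gradient-ascent step on the embedding might look as follows. The toy batch and the values of lam (λ) and eta (η) are illustrative assumptions, not the paper's tuned settings.

    import torch
    import torch.nn.functional as F

    model = NAO()
    tokens = torch.randint(0, 16, (8, 12))   # toy batch of tokenized architectures
    perf = torch.rand(8)                      # their validation performances s_x

    # Joint training loss: L = lam * L_pp + (1 - lam) * L_rec.
    lam = 0.5
    e_x, pred, logits = model(tokens)
    loss_pp = F.mse_loss(pred, perf)                            # squared error
    loss_rec = F.cross_entropy(logits.transpose(1, 2), tokens)  # NLL reconstruction
    loss = lam * loss_pp + (1 - lam) * loss_rec
    loss.backward()

    # Inference: move e_x along the predictor's gradient, then decode.
    eta = 10.0
    e_x = model.encode(tokens).detach().requires_grad_(True)
    model.predictor(e_x).sum().backward()
    e_new = e_x + eta * e_x.grad              # e_x' = e_x + eta * df/de_x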

  18. Weight Sharing
  (figure: hidden states h[i-1] and h[i], each followed by the candidate operations conv 1x1, conv 3x3, max pool, avg pool, combined by add and concat; different architectures index into the same shared operations)
  Architecture 1: “node 2, conv 1x1, node 1, max pooling, node 1, max pooling, node 1, conv 3x3, node 2, conv 3x3, node 2, conv 1x1”
  Architecture 2: “node 1, conv 3x3, node 2, max pooling, node 2, conv 1x1, node 2, conv 1x1, node 1, conv 3x3, node 1, max pooling”
  (a toy weight-sharing snippet follows)
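
A toy illustration of the sharing in the figure: one pool of operations keyed by (node, operation), so any candidate architecture, including the two strings above, reuses the same weights rather than allocating fresh ones. The channel count and spatial size are illustrative assumptions.

    import torch
    import torch.nn as nn

    OPS = ["conv 1x1", "conv 3x3", "max pooling", "avg pooling"]

    def make_op(op):
        # 16 channels is an illustrative assumption; all ops preserve shape.
        if op == "conv 1x1":
            return nn.Conv2d(16, 16, 1)
        if op == "conv 3x3":
            return nn.Conv2d(16, 16, 3, padding=1)
        if op == "max pooling":
            return nn.MaxPool2d(3, stride=1, padding=1)
        return nn.AvgPool2d(3, stride=1, padding=1)

    def key(node, op):
        return f"node{node}_{op.replace(' ', '_')}"

    # One shared pool of operation weights, indexed by (node, operation).
    shared = nn.ModuleDict({key(n, op): make_op(op) for n in (1, 2) for op in OPS})

    def run(arch, x):
        # arch: list of (node, op) pairs parsed from an architecture string.
        states = {1: x, 2: x}
        for node, op in arch:
            states[node] = shared[key(node, op)](states[node])
        return states[1] + states[2]          # the "add" combination from the figure

    x = torch.randn(1, 16, 8, 8)
    arch1 = [(2, "conv 1x1"), (1, "max pooling"), (1, "max pooling"),
             (1, "conv 3x3"), (2, "conv 3x3"), (2, "conv 1x1")]
    arch2 = [(1, "conv 3x3"), (2, "max pooling"), (2, "conv 1x1"),
             (2, "conv 1x1"), (1, "conv 3x3"), (1, "max pooling")]
    y1, y2 = run(arch1, x), run(arch2, x)     # both index the same shared weights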

  19. 04 Experiments and Results

  20. Tasks
  • Language Modeling: modeling the probability distribution over sequences of words in natural language. Datasets: PTB (Penn Treebank) and WT2 (WikiText-2)
  • Image Classification: classify the images. Datasets: CIFAR-10 (10 classes) and CIFAR-100 (100 classes), each with 50,000 training images and 10,000 test images

  21. CIFAR-10

  22. Transfer to CIFAR-100

  23. PTB

  24. Transfer to WikiText-2

  25. 05 Conclusion

  26. Conclusion: a new automatic architecture design algorithm
  • Encodes the discrete architecture description into a continuous embedding
  • Performs the optimization within the continuous space
  • Uses a gradient-based method rather than searching over discrete decisions
  Project Links
  • Paper: https://arxiv.org/abs/1808.07233
  • Code: https://github.com/renqianluo/NAO

  27. Thanks. Q&A
