

  1. Learning Transferable Architectures for Scalable Image Recognition Zoph et al.

  2. Introduction • Architecture engineering • Finding a good architecture for a machine learning model • Neural Architecture Search (NAS) Framework • Automates architecture engineering using a reinforcement-learning search method • Problem: computationally expensive on large datasets like ImageNet • Approach of this paper • Search for an architecture on a smaller dataset like CIFAR-10 • Apply the learned architecture to a bigger dataset

  3. Datasets • CIFAR-10 • 60,000 32x32 RGB images across 10 classes • 50,000 train and 10,000 test images • 5,000 randomly selected images from the training set are used as a validation set • Images are whitened; 32x32 patches are randomly cropped from upsampled 40x40 images and random horizontal flips are applied • ImageNet • 14 million images • Resized to 299x299 or 331x331 resolution in this work
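
A minimal sketch of the CIFAR-10 augmentation described above, in NumPy. Zero-padding to 40x40 and per-channel standardization are assumptions standing in for the paper's exact upsampling and whitening.

```python
import numpy as np

def augment_cifar_image(img, rng, pad=4, crop=32):
    """Enlarge a 32x32x3 image to 40x40 (zero-padding assumed here), then
    take a random 32x32 crop and apply a random horizontal flip."""
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, padded.shape[0] - crop + 1)
    left = rng.integers(0, padded.shape[1] - crop + 1)
    patch = padded[top:top + crop, left:left + crop, :]
    if rng.random() < 0.5:                    # random horizontal flip
        patch = patch[:, ::-1, :]
    return patch

def whiten(batch):
    """Per-channel standardization as a stand-in for whitening."""
    mean = batch.mean(axis=(0, 1, 2), keepdims=True)
    std = batch.std(axis=(0, 1, 2), keepdims=True) + 1e-8
    return (batch - mean) / std

rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype(np.float32)   # dummy batch
batch = whiten(np.stack([augment_cifar_image(im, rng) for im in images]))
```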

  4. NASNet Search Space • Allows the architecture to be transferred across datasets • The complexity of the architecture is decoupled from the depth of the network and the size of the input images • Hand-engineered CNN architectures contain repeated motifs • Convolutional cells with identical structure but different weights • Express these repeated motifs as cells and compose them to form the convolutional net

  5. Neural Architecture Search (NAS) Framework

  6. Convolutional Cells • Normal Cell • Returns a feature map of the same dimensions • Reduction Cell • Returns a feature map whose height and width are reduced by a factor of two

  7. Searching the Structures of the Cells • The controller repeats the following 5 steps B times, corresponding to the B blocks in a convolutional cell (see the sketch after slide 9) 1. Select a hidden state from h_i, h_{i−1}, or the set of hidden states created in previous blocks. 2. Select a second hidden state from the same options as in Step 1. 3. Select an operation to apply to the hidden state selected in Step 1. 4. Select an operation to apply to the hidden state selected in Step 2. 5. Select a method to combine the outputs of Steps 3 and 4 to create a new hidden state. • To predict both the Normal Cell and the Reduction Cell, the controller makes 2 × 5B predictions in total

  8. Searching the Structures of the Cells

  9. Searching the Structures of the Cells • Possible operations in Steps 3 and 4 • Identity • 1x3 then 3x1 convolution • 1x7 then 7x1 convolution • 3x3 dilated convolution • 3x3 average pooling • 3x3 max pooling • 5x5 max pooling • 7x7 max pooling • 1x1 convolution • 3x3 convolution • 3x3 depthwise-separable convolution • 5x5 depthwise-separable convolution • 7x7 depthwise-separable convolution
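
A minimal sketch of the 5-step block-sampling procedure from slide 7, using the operation list above. The real controller is an RNN trained with reinforcement learning; uniform random sampling stands in for its predictions here, so only the structure of the search space is illustrated. The combiner list and function names are illustrative assumptions.

```python
import random

OPS = [
    "identity", "1x3 then 3x1 conv", "1x7 then 7x1 conv",
    "3x3 dilated conv", "3x3 avg pool", "3x3 max pool",
    "5x5 max pool", "7x7 max pool", "1x1 conv", "3x3 conv",
    "3x3 depthwise-separable conv", "5x5 depthwise-separable conv",
    "7x7 depthwise-separable conv",
]
COMBINERS = ["add", "concat"]            # methods for Step 5 (assumed set)

def sample_cell(num_blocks=5, seed=None):
    """Sample one convolutional cell as a list of B blocks.
    Hidden states 0 and 1 stand for h_{i-1} and h_i; each new block
    appends one more hidden state that later blocks may select."""
    rng = random.Random(seed)
    hidden_states = [0, 1]               # h_{i-1} and h_i
    cell = []
    for _ in range(num_blocks):
        in1 = rng.choice(hidden_states)          # Step 1
        in2 = rng.choice(hidden_states)          # Step 2
        op1 = rng.choice(OPS)                    # Step 3
        op2 = rng.choice(OPS)                    # Step 4
        comb = rng.choice(COMBINERS)             # Step 5
        cell.append((in1, op1, in2, op2, comb))
        hidden_states.append(len(hidden_states)) # new hidden state
    return cell

# One architecture sample consists of a Normal and a Reduction cell:
normal_cell = sample_cell(seed=0)
reduction_cell = sample_cell(seed=1)
```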

  10. Top Performing Cells

  11. Building a network for a given task • Convolutional cells (Normal and Reduction) • Number of cell repeats N • Number of filters in the initial convolutional cell (see the sketch below)
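
A minimal sketch of how a network is assembled from the searched cells: N Normal Cells between Reduction Cells, with the filter count doubled after each reduction. The number of reduction stages and the example values are illustrative assumptions.

```python
def build_cell_layout(n_repeats, init_filters, n_reductions=2):
    """Return the sequence of (cell_type, num_filters) used to build the
    network: N Normal Cells, then a Reduction Cell that doubles the
    filter count, repeated n_reductions times."""
    layout, filters = [], init_filters
    for stage in range(n_reductions + 1):
        layout += [("normal", filters)] * n_repeats
        if stage < n_reductions:
            filters *= 2                          # double filters at each reduction
            layout.append(("reduction", filters))
    return layout

# e.g. N=6 repeats with 32 initial filters (illustrative values):
for cell_type, f in build_cell_layout(n_repeats=6, init_filters=32):
    print(cell_type, f)
```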

  12. ScheduledDropPath • DropPath • Each path in the cell is stochastically dropped with some fixed probability during training • Does not work well for NASNets • ScheduledDropPath • Each path in the cell is dropped with a probability that increases linearly over the course of training (see the sketch below) • Significantly improves the final performance
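
A minimal sketch of the ScheduledDropPath schedule, assuming a final drop probability and a simple per-path mask; the keep-at-least-one-path rule is an assumption for illustration.

```python
import numpy as np

def scheduled_drop_prob(step, total_steps, final_drop_prob=0.3):
    """Linearly increase the drop probability from 0 to final_drop_prob."""
    return final_drop_prob * min(step / total_steps, 1.0)

def drop_paths(path_outputs, step, total_steps, rng):
    """Zero out each path independently with the scheduled probability,
    keeping at least one path so the cell always produces an output."""
    p = scheduled_drop_prob(step, total_steps)
    keep = rng.random(len(path_outputs)) >= p
    if not keep.any():
        keep[rng.integers(len(path_outputs))] = True
    return [out * k for out, k in zip(path_outputs, keep)]

rng = np.random.default_rng(0)
paths = [np.ones((2, 2)) for _ in range(4)]        # dummy path outputs
survivors = drop_paths(paths, step=5_000, total_steps=100_000, rng=rng)
```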

  13. Result - CIFAR-10 Image Classification

  14. Result - ImageNet Image Classification

  15. Result

  16. Result - Object Detection

  17. Deep Speech: Scaling up end-to-end speech recognition Hannun et al.

  18. Introduction • Traditional speech systems • Human-engineered processing pipelines • Require a great amount of engineering effort • Perform poorly in noisy environments • Deep Speech • Applies deep learning end-to-end using an RNN • No need for human-designed components • Performs better on noisy speech than traditional speech systems

  19. RNN Training • Input • Training set X = {(x^(1), y^(1)), (x^(2), y^(2)), …}, where each x^(i) is a single utterance and y^(i) is its label • x^(i) is a time series of length T^(i) where every time slice is a vector of audio features • x^(i)_{t,p} denotes the power of the p'th frequency bin in the audio frame at time t • Output • A sequence of character probabilities for the transcription y, with ŷ_t = P(c_t | x), where c_t ∈ {a, b, c, …, z, space, apostrophe, blank}

  20. RNN Training • 5 layers of hidden units; the hidden units at layer l are denoted h^(l) • First three layers • Non-recurrent • h^(l)_t = g(W^(l) h^(l−1)_t + b^(l)) • g(z) = min{max{0, z}, 20} is the clipped rectified-linear (ReLU) activation function • W^(l) is the weight matrix and b^(l) is the bias vector • The first layer depends on the spectrogram frame x_t along with a context of C frames on each side • The other layers operate independently on the data for each time step
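
A minimal NumPy sketch of the clipped ReLU and the non-recurrent layers, including one way the first layer could consume a context of C frames on each side (zero-padding at the edges is an assumption). All sizes and initializations are illustrative.

```python
import numpy as np

def clipped_relu(z):
    """g(z) = min(max(0, z), 20), the clipped ReLU from the slide."""
    return np.minimum(np.maximum(0.0, z), 20.0)

def first_layer(x, W1, b1, C):
    """First layer: each output frame sees spectrogram frame x_t plus a
    context of C frames on each side (edges zero-padded here)."""
    T, p = x.shape
    padded = np.pad(x, ((C, C), (0, 0)))
    windows = np.stack([padded[t:t + 2 * C + 1].ravel() for t in range(T)])
    return clipped_relu(windows @ W1 + b1)          # shape (T, hidden)

def dense_layer(h_prev, W, b):
    """Layers 2 and 3: h^(l)_t = g(W^(l) h^(l-1)_t + b^(l)), per time step."""
    return clipped_relu(h_prev @ W + b)

rng = np.random.default_rng(0)
T, p, C, hidden = 50, 80, 5, 128                    # illustrative sizes
x = rng.standard_normal((T, p))
W1 = rng.standard_normal((p * (2 * C + 1), hidden)) * 0.01
h1 = first_layer(x, W1, np.zeros(hidden), C)
h2 = dense_layer(h1, rng.standard_normal((hidden, hidden)) * 0.01, np.zeros(hidden))
```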

  21. RNN Training • The fourth layer • Bi-directional recurrent layer • Two sets of hidden units • Forward recurrence • h^(f)_t = g(W^(4) h^(3)_t + W^(f)_r h^(f)_{t−1} + b^(4)) • Computed sequentially from t = 1 to t = T^(i) • Backward recurrence • h^(b)_t = g(W^(4) h^(3)_t + W^(b)_r h^(b)_{t+1} + b^(4)) • Computed sequentially in reverse from t = T^(i) to t = 1 • The fifth layer • h^(5)_t = g(W^(5) h^(4)_t + b^(5)), where h^(4)_t = h^(f)_t + h^(b)_t
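
A minimal NumPy sketch of the bi-directional fourth layer and the fifth layer, reusing the clipped ReLU g from slide 20; weight shapes and initialization are illustrative assumptions.

```python
import numpy as np

def g(z):                                   # clipped ReLU from slide 20
    return np.minimum(np.maximum(0.0, z), 20.0)

def bidirectional_layer(h3, W4, Wf_r, Wb_r, b4):
    """h^(f)_t runs forward over t, h^(b)_t runs backward over t; the
    fifth layer consumes their sum h^(4)_t = h^(f)_t + h^(b)_t."""
    T, d = h3.shape
    hf = np.zeros((T, d))
    hb = np.zeros((T, d))
    for t in range(T):                      # forward recurrence
        prev = hf[t - 1] if t > 0 else np.zeros(d)
        hf[t] = g(h3[t] @ W4 + prev @ Wf_r + b4)
    for t in reversed(range(T)):            # backward recurrence
        nxt = hb[t + 1] if t < T - 1 else np.zeros(d)
        hb[t] = g(h3[t] @ W4 + nxt @ Wb_r + b4)
    return hf + hb                          # h^(4)

rng = np.random.default_rng(0)
T, d = 30, 64
h3 = rng.standard_normal((T, d))
W4, Wf_r, Wb_r = (rng.standard_normal((d, d)) * 0.01 for _ in range(3))
h4 = bidirectional_layer(h3, W4, Wf_r, Wb_r, np.zeros(d))
h5 = g(h4 @ (rng.standard_normal((d, d)) * 0.01) + np.zeros(d))   # fifth layer
```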

  22. RNN Training • Output layer • A standard softmax function that yields the predicted character probabilities for each time slice t and character k in the alphabet • Only a single recurrent layer, the part that is hardest to parallelize • Does not use Long Short-Term Memory (LSTM) circuits • Approaches to avoid overfitting • Dropout on the feedforward layers at a rate between 5% and 10% • Jittering the input • Applying a language model to reduce error • Maximize Q(c) = log(P(c|x)) + α log(P_lm(c)) + β word_count(c)
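
A minimal sketch of the softmax output and the decoding objective Q(c). The acoustic and language-model log-probabilities are stubs supplied by the caller, and the α, β values are placeholders, not the paper's tuned settings.

```python
import numpy as np

def softmax(logits):
    """Per-time-step character probabilities ŷ_t = P(c_t | x)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def score_transcription(log_p_acoustic, transcription, log_p_lm,
                        alpha=2.0, beta=1.5):
    """Q(c) = log P(c|x) + alpha * log P_lm(c) + beta * word_count(c).
    log_p_acoustic: network log-probability of the transcription;
    log_p_lm: language-model log-probability (both stubbed by the caller)."""
    word_count = len(transcription.split())
    return log_p_acoustic + alpha * log_p_lm + beta * word_count

# Pick the higher-scoring candidate from a beam of two hypotheses:
candidates = [("their going home", -12.0, -9.5),
              ("they're going home", -12.3, -7.1)]
best = max(candidates, key=lambda c: score_transcription(c[1], c[0], c[2]))
```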

  23. Optimizations • Data parallelism • Each GPU processes many examples in parallel • For large minibatches that a single GPU cannot support, each GPU processes a separate minibatch of examples and combines its computed gradient with its peers during each iteration • Model parallelism • Perform the computations of h^(f) and h^(b) in parallel • Problem: data transfers are time-consuming when computing the fifth layer • Use one GPU for each half of the time series: one computes h^(f) first, the other computes h^(b), and they exchange results at the midpoint • Striding • Shorten the recurrent layers by taking strides of size 2 in the original input
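
A minimal sketch of the striding optimization: keeping every second frame of the original input halves the sequence length the recurrent layer must unroll over. The midpoint split for model parallelism is only indicated in a comment; actual GPU placement is beyond this sketch.

```python
import numpy as np

def stride_input(x, stride=2):
    """Keep every `stride`-th spectrogram frame, halving (for stride=2)
    the number of time steps the recurrent layer processes."""
    return x[::stride]

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 80))       # (time steps, frequency bins)
x_strided = stride_input(x)               # shape (500, 80)

# Model parallelism (sketch only): split the sequence at the midpoint, so
# one GPU runs the forward recurrence on the first half while the other
# runs the backward recurrence on the second half, then the halves swap.
mid = len(x_strided) // 2
first_half, second_half = x_strided[:mid], x_strided[mid:]
```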

  24. Training Data • 5,000 hours of read speech from 9,600 speakers • Synthesize noisy training data • Using many short clips of noise sounds • Rejecting noise clips where the average power in each frequency band differs significantly from the average power of real noisy recordings • Lombard Effect • Speakers actively change the pitch or inflections of their voice to overcome noise around them • Captured by playing loud background noise through headphones worn by a person as they record an utterance
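
A minimal sketch of the noise-clip filter: compare a candidate clip's average power per frequency band against that of real noisy recordings and reject clips that differ too much. The FFT framing, band count, and dB threshold are assumptions.

```python
import numpy as np

def band_powers(signal, n_bands=40, frame=512):
    """Average power in each frequency band, via a magnitude FFT over frames."""
    n_frames = len(signal) // frame
    frames = signal[:n_frames * frame].reshape(n_frames, frame)
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(spectrum, n_bands, axis=1)
    return np.array([b.mean() for b in bands])

def accept_noise_clip(clip, reference_powers, max_db_diff=6.0):
    """Reject clips whose per-band average power deviates from the
    reference (real noisy recordings) by more than max_db_diff dB."""
    diff_db = 10 * np.abs(np.log10(band_powers(clip) / reference_powers))
    return bool(np.all(diff_db <= max_db_diff))

rng = np.random.default_rng(0)
reference = band_powers(rng.standard_normal(16000 * 5))   # stand-in reference
candidate = rng.standard_normal(16000 * 2)                # 2 s candidate clip
keep = accept_noise_clip(candidate, reference)
```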

  25. Performance

  26. Testing Noisy Speech Performance • Few standards exist • Evaluation set of 100 noisy and 100 noise-free utterances from 10 speakers • Noise environments • Background radio or TV; washing dishes in a sink; a crowded cafeteria; a restaurant; and inside a car driving in the rain • Utterance text • Primarily from web search queries and text messages • Signal-to-noise ratio between 2 and 6 dB

  27. Performance

  28. Thank you!
