Practical Methodology for Deploying Machine Learning, Ian Goodfellow (PowerPoint PPT Presentation)



SLIDE 1

Practical Methodology for Deploying Machine Learning

Ian Goodfellow

(An homage to “Advice for Applying Machine Learning” by Andrew Ng)

SLIDE 2

What drives success in ML?

  • Arcane knowledge of dozens of obscure algorithms?
  • Mountains of data?
  • Knowing how to apply 3-4 standard techniques?

[Figure: network diagram with visible units v1-v3, a first hidden layer h1(1)-h3(1), and a second hidden layer h1(2)-h4(2)]

SLIDE 3

Street View Transcription

(Goodfellow et al, 2014)

SLIDE 4

3 Step Process

  • Use needs to define metric-based goals
  • Build an end-to-end system
  • Data-driven refinement
SLIDE 5

Identify needs

  • High accuracy or low accuracy?
  • Surgery robot: high accuracy
  • Celebrity look-a-like app: low accuracy
SLIDE 6

Choose Metrics

  • Accuracy? (% of examples correct)
  • Coverage? (% of examples processed)
  • Precision? (% of detections that are right)
  • Recall? (% of objects detected)
  • Amount of error? (For regression problems)
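The metrics above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the talk: `evaluate` handles a classifier that may abstain (so coverage and accuracy trade off), and `precision_recall` handles a detector.

```python
# Sketch of the slide's metrics for a classifier that may abstain
# (return None) on hard examples. All names here are illustrative.

def evaluate(predictions, labels):
    """predictions: predicted labels, or None where the model abstained.
    labels: true labels. Returns the coverage/accuracy metrics."""
    processed = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(processed) / len(labels)          # % of examples processed
    correct = sum(1 for p, y in processed if p == y)
    accuracy = correct / len(processed) if processed else 0.0  # % correct
    return {"coverage": coverage, "accuracy": accuracy}

def precision_recall(detections, ground_truth):
    """detections, ground_truth: sets of detected / true object ids."""
    true_pos = len(detections & ground_truth)
    precision = true_pos / len(detections) if detections else 0.0  # % of detections right
    recall = true_pos / len(ground_truth) if ground_truth else 0.0 # % of objects detected
    return precision, recall
```

Refusing to process the hardest examples raises accuracy at the cost of coverage, which is why the two are reported together.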
SLIDE 7

End-to-end system

  • Get up and running ASAP
  • Build the simplest viable system first
  • What baseline to start with though?
  • Copy state-of-the-art from related publication
SLIDE 8

Deep or not?

  • Lots of noise, little structure -> not deep
  • Little noise, complex structure -> deep
  • Good shallow baseline:
  • Use what you know
  • Logistic regression, SVM, and boosted trees are all good

SLIDE 9

What kind of deep?

  • No structure -> fully connected
  • Spatial structure -> convolutional
  • Sequential structure -> recurrent
SLIDE 10

Fully connected baseline

  • 2-3 hidden layer feedforward network
  • AKA “multilayer perceptron”
  • Rectified linear units
  • Dropout
  • SGD + momentum

[Figure: two-layer network diagram with weight matrices V and W]
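The baseline above can be sketched in NumPy. This is a minimal illustration, not the talk's code: the layer sizes, dropout rate, and learning rate are all made-up hyperparameters, and only the forward pass and the SGD + momentum update rule are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the slide's baseline: a 2-hidden-layer feedforward net with
# rectified linear units, dropout, and an SGD + momentum update.

def init_mlp(sizes):
    return [{"W": rng.normal(0, 0.1, (m, n)), "b": np.zeros(n)}
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x, drop_p=0.5, train=True):
    for i, layer in enumerate(layers):
        x = x @ layer["W"] + layer["b"]
        if i < len(layers) - 1:            # hidden layers only
            x = np.maximum(x, 0)           # rectified linear unit
            if train:                      # inverted dropout
                mask = rng.random(x.shape) > drop_p
                x = x * mask / (1 - drop_p)
    return x

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

layers = init_mlp([4, 32, 32, 3])          # input, two hidden, output
out = forward(layers, rng.normal(size=(8, 4)))
```

Inverted dropout scales activations at train time so that no rescaling is needed at test time.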

SLIDE 11

Convolutional baseline

  • Inception
  • Batch normalization
  • Fallback option:
  • Rectified linear convolutional net
  • Dropout
  • SGD + momentum
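The fallback option's core operation can be sketched directly. This is an illustrative single layer only (a real baseline would stack several such layers with pooling); the kernel is a made-up example.

```python
import numpy as np

# Sketch of one layer of the fallback rectified linear convolutional net:
# a plain "valid" 2D convolution followed by a ReLU.

def conv2d_relu(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0)              # rectified linear unit

edge_kernel = np.array([[1.0, -1.0]])      # responds to horizontal edges
feature_map = conv2d_relu(np.eye(4), edge_kernel)
```

The same small kernel is applied at every spatial position, which is the parameter sharing that makes convolution suit spatially structured data.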
SLIDE 12

Recurrent baseline

  • LSTM
  • SGD
  • Gradient clipping
  • High forget gate bias

[Figure: LSTM cell diagram showing the input, input gate, forget gate, output gate, output, and the state self-loop]
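Two of the slide's tricks can be sketched in NumPy. This is an illustrative single LSTM step, not the talk's code: the dimensions, the `forget_bias` value, and the clipping threshold are assumptions.

```python
import numpy as np

# Sketch of the slide's tricks: a high forget-gate bias keeps the state
# self-loop near identity early in training, and clipping the gradient
# norm keeps SGD steps bounded when gradients explode.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, b, forget_bias=1.0):
    """One LSTM step. W maps [x, h] to the four gate pre-activations."""
    z = np.concatenate([x, h]) @ W + b
    i, f, g, o = np.split(z, 4)
    f = sigmoid(f + forget_bias)           # high forget-gate bias
    c = f * c + sigmoid(i) * np.tanh(g)    # state self-loop
    h = sigmoid(o) * np.tanh(c)
    return h, c

def clip_gradient(grad, max_norm=1.0):
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad
```

With the forget gate saturated near 1, the cell state carries information across many time steps, which is what makes the LSTM a strong recurrent baseline.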

SLIDE 13

Data driven adaptation

  • Choose what to do based on data
  • Don’t believe hype
  • Measure train and test error
  • “Overfitting” versus “underfitting”
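Measuring train and test error side by side is easy to illustrate with polynomial regression on synthetic data (an illustration, not from the slides; the degrees and noise level are made up).

```python
import numpy as np

# A degree-1 fit to noisy quadratic data underfits (train and test error
# both high); a degree-9 fit overfits (train error tiny, test error
# larger). Comparing the two errors tells you which problem you have.

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, x ** 2 + 0.05 * rng.normal(size=n)   # noisy quadratic

def mse(x, y, coeffs):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_tr, y_tr = make_data(20)
x_te, y_te = make_data(200)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    print(degree, mse(x_tr, y_tr, coeffs), mse(x_te, y_te, coeffs))
```

A large gap between train and test error signals overfitting; high error on both signals underfitting.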
SLIDE 14

High train error

  • Inspect data for defects
  • Inspect software for bugs
  • Don’t roll your own unless you know what you’re doing

  • Tune learning rate (and other optimization settings)
  • Make model bigger
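The learning-rate item is easy to see on a toy problem (an illustration, not from the slides; the step sizes are made up):

```python
# Plain gradient descent on f(w) = w**2. A reasonable step size
# converges toward the minimum; one that is too large diverges,
# which shows up as training error that refuses to go down.

def gradient_descent(lr, steps=50, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w        # gradient of w**2 is 2w
    return w

good = gradient_descent(lr=0.1)    # |w| shrinks toward 0
bad = gradient_descent(lr=1.1)     # |w| blows up
```

A diverging or stalled training loss is often a learning-rate problem before it is a model problem.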
SLIDE 15

Checking data for defects

  • Can a human process it?

[Image: example Street View crop, house number 26624]

SLIDE 16

Increasing depth

[Figure: Effect of Depth. Test accuracy (%), from 92.0 to 96.5, versus number of hidden layers (3 to 11); accuracy increases with depth]

SLIDE 17

High test error

  • Add dataset augmentation
  • Add dropout
  • Collect more data
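The first item, dataset augmentation, can be sketched for images (an illustration, not from the slides; the flip-and-shift transformations and array sizes are made-up examples):

```python
import numpy as np

# Each training image also appears horizontally flipped and shifted by
# up to one pixel, multiplying the effective training set size without
# collecting new data.

def augment(image, rng):
    variants = [image, np.fliplr(image)]
    shift = rng.integers(-1, 2)                    # -1, 0, or 1 pixel
    variants.append(np.roll(image, shift, axis=1))
    return variants

rng = np.random.default_rng(0)
batch = [rng.random((8, 8)) for _ in range(4)]
augmented = [v for img in batch for v in augment(img, rng)]
```

Augmentations should be transformations the label is invariant to; a horizontal flip is safe for most natural images but not, say, for digit transcription.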
SLIDE 18

Increasing training set size

[Figure: left, error (MSE, 1 to 6) versus number of training examples (10^0 to 10^5), showing the Bayes error plus train and test curves for a quadratic model and for a model of optimal capacity; right, optimal capacity (polynomial degree, 5 to 20) growing with training set size]

SLIDE 19

Deep Learning textbook

Yoshua Bengio, Ian Goodfellow, and Aaron Courville, goodfeli.github.io/dlbook