Practical Methodology: Lecture slides for Chapter 11 of Deep Learning (PowerPoint PPT presentation)



SLIDE 1

Practical Methodology

Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26

SLIDE 2

(Goodfellow 2016)

What drives success in ML?

  • Arcane knowledge of dozens of obscure algorithms?
  • Mountains of data?
  • Knowing how to apply 3-4 standard techniques?

(Figure: layered graphical model with visible units v1, v2, v3, a first hidden layer h1(1) through h4(1), and a second hidden layer h1(2) through h3(2).)

SLIDE 3

Example: Street View Address Number Transcription

(Goodfellow et al, 2014)

SLIDE 4

Three Step Process

  • Use needs to define metric-based goals
  • Build an end-to-end system
  • Data-driven refinement
SLIDE 5

Identify Needs

  • High accuracy or low accuracy?
  • Surgery robot: high accuracy
  • Celebrity look-a-like app: low accuracy
SLIDE 6

Choose Metrics

  • Accuracy? (% of examples correct)
  • Coverage? (% of examples processed)
  • Precision? (% of detections that are right)
  • Recall? (% of objects detected)
  • Amount of error? (For regression problems)
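These four classification metrics can be computed directly from predictions. A minimal sketch, assuming a binary detector; the function name, the `abstain` set (indices the model declined to process, used for coverage), and the example labels are all illustrative, not from the slides:

```python
def classification_metrics(y_true, y_pred, abstain=None):
    """Accuracy, coverage, precision, and recall for a binary detector.

    abstain: optional set of example indices the model refused to
    process; these count against coverage but not accuracy.
    """
    abstain = abstain or set()
    processed = [i for i in range(len(y_true)) if i not in abstain]
    coverage = len(processed) / len(y_true)           # % of examples processed
    correct = sum(1 for i in processed if y_true[i] == y_pred[i])
    accuracy = correct / len(processed)               # % of processed examples correct
    tp = sum(1 for i in processed if y_pred[i] == 1 and y_true[i] == 1)
    fp = sum(1 for i in processed if y_pred[i] == 1 and y_true[i] == 0)
    fn = sum(1 for i in processed if y_pred[i] == 0 and y_true[i] == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0    # % of detections that are right
    recall = tp / (tp + fn) if tp + fn else 0.0       # % of objects detected
    return accuracy, coverage, precision, recall
```

The coverage/accuracy split matters for systems like Street View transcription, where the model may refuse hard examples to keep accuracy high on the ones it does answer.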
SLIDE 7

End-to-end System

  • Get up and running ASAP
  • Build the simplest viable system first
  • What baseline to start with though?
  • Copy state-of-the-art from related publication
SLIDE 8

Deep or Not?

  • Lots of noise, little structure -> not deep
  • Little noise, complex structure -> deep
  • Good shallow baseline:
  • Use what you know
  • Logistic regression, SVM, and boosted trees are all good

SLIDE 9

Choosing Architecture Family

  • No structure -> fully connected
  • Spatial structure -> convolutional
  • Sequential structure -> recurrent
SLIDE 10

Fully Connected Baseline


  • 2-3 hidden layer feed-forward neural network
  • AKA “multilayer perceptron”
  • Rectified linear units
  • Batch normalization
  • Adam
  • Maybe dropout
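The shape of such a baseline can be sketched as a plain forward pass. A minimal numpy sketch, assuming a 784-10 classification task; the layer sizes are illustrative, and training details (Adam, batch normalization, dropout) are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectified linear unit, as recommended on the slide
    return np.maximum(0.0, x)

def init_params(sizes):
    """He-style initialization; one (W, b) pair per layer."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """Forward pass of a feed-forward network (multilayer perceptron)."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)       # hidden layers use ReLU
    W, b = params[-1]
    return h @ W + b              # linear output layer (logits)

# 2 hidden layers of 256 units: input -> 256 -> 256 -> 10
params = init_params([784, 256, 256, 10])
logits = mlp_forward(rng.normal(size=(32, 784)), params)
```

In practice one would use a framework that supplies Adam, batch normalization, and dropout rather than writing them by hand; the point of the sketch is only the 2-3 hidden layer ReLU architecture.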
SLIDE 11

Convolutional Network Baseline

  • Download a pretrained network
  • Or copy-paste an architecture from a related task
  • Or:
  • Deep residual network
  • Batch normalization
  • Adam
SLIDE 12

Recurrent Network Baseline

  • LSTM
  • SGD
  • Gradient clipping
  • High forget gate bias

(Figure: LSTM cell with input, input gate, forget gate, output gate, output, and a self-loop on the state.)
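Two of these tricks are easy to show concretely. A minimal sketch; the clipping threshold, hidden size, and bias value below are typical illustrative choices, not values from the slides:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Gradient clipping: rescale a list of gradient arrays so their
    combined L2 norm is at most max_norm (threshold is illustrative)."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

# High forget-gate bias: initializing b_f to a positive value (around 1)
# keeps the forget gate mostly open early in training, so the state
# self-loop carries gradients across many time steps.
hidden = 128
b_f = np.ones(hidden)
```

Clipping the global norm (rather than each element) preserves the direction of the gradient while bounding the size of the SGD step, which tames the exploding gradients common in recurrent nets.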

SLIDE 13

Data-driven Adaptation

  • Choose what to do based on data
  • Don’t believe hype
  • Measure train and test error
  • “Overfitting” versus “underfitting”
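The train/test-error decision rule can be written down explicitly. A minimal sketch; the function name and thresholds are illustrative:

```python
def diagnose(train_err, test_err, target_err, gap_tol=0.02):
    """Crude data-driven diagnosis from measured train and test error.

    target_err: the acceptable error from the metric-based goal;
    gap_tol: how large a train/test gap counts as overfitting.
    Both thresholds are illustrative, not from the slides.
    """
    if train_err > target_err:
        # Model cannot even fit the training data
        return "underfitting: reduce train error first"
    if test_err - train_err > gap_tol:
        # Fits training data but fails to generalize
        return "overfitting: regularize or collect more data"
    return "goal met: stop or tighten the target"
```

This is the branch point for the next two slides: high train error and high test error call for different fixes.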
SLIDE 14

High Train Error

  • Inspect data for defects
  • Inspect software for bugs
  • Don’t roll your own unless you know what you’re doing

  • Tune learning rate (and other optimization settings)
  • Make model bigger
SLIDE 15

Checking Data for Defects

  • Can a human process it?

(Figure: example input image, labeled 26624, to check whether a human can transcribe it.)

SLIDE 16

Increasing Depth

(Figure, “Effect of Depth”: test accuracy (%) versus number of hidden layers, 3 to 11, rising from about 92% to about 96.5%.)

SLIDE 17

High Test Error

  • Add dataset augmentation
  • Add dropout
  • Collect more data
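Dataset augmentation is cheap to illustrate. A minimal numpy sketch for grayscale images; the flip probability and padding amount are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, pad=2):
    """Random horizontal flip plus random crop after zero-padding.

    Produces a slightly different training example each call while
    keeping the output the same size as the input.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]                 # horizontal flip
    h, w = image.shape
    padded = np.pad(image, pad, mode="constant")
    top = int(rng.integers(0, 2 * pad + 1))    # random crop offset
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + h, left:left + w]
```

Augmentation effectively enlarges the training set with label-preserving transformations, attacking high test error without collecting new data.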
SLIDE 18

Increasing Training Set Size

(Figure: two panels versus # train examples, 10^0 to 10^5. Left: error (MSE), 1 to 6, for train and test with a quadratic model and with optimal capacity, converging toward the Bayes error as the training set grows. Right: optimal capacity (polynomial degree), 5 to 20, increasing with training set size.)

SLIDE 19

Tuning the Learning Rate

(Figure 11.1: training error versus learning rate on a logarithmic scale, 10^-2 to 10^0; error is roughly U-shaped, rising sharply when the rate is too large.)
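Sweeping the learning rate on a logarithmic scale is the standard way to produce a plot like Figure 11.1. A minimal sketch; `train_and_eval` is a hypothetical stand-in for a real training run, here contrived so error is minimized near lr = 0.1:

```python
import numpy as np

def train_and_eval(lr):
    """Hypothetical stand-in for training at learning rate lr and
    returning its error; contrived to be minimized near lr = 0.1."""
    return (np.log10(lr) - np.log10(0.1)) ** 2 + 1.0

# Log-spaced candidates, 1e-4 ... 1e0, as on the figure's x-axis
lrs = np.logspace(-4, 0, num=9)
errors = [train_and_eval(lr) for lr in lrs]
best_lr = lrs[int(np.argmin(errors))]
```

Because the useful range of learning rates spans orders of magnitude, log spacing finds the U-shaped minimum with far fewer runs than a linear sweep would.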

SLIDE 20

Reasoning about Hyperparameters

Table 11.1

Hyperparameter: number of hidden units
Increases capacity when: increased
Reason: increasing the number of hidden units increases the representational capacity of the model.
Caveats: increasing the number of hidden units increases both the time and memory cost of essentially every operation on the model.

SLIDE 21

Hyperparameter Search

Grid search versus random search (Figure 11.2)
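The contrast in Figure 11.2 can be sketched directly. A minimal example with two hypothetical hyperparameters (learning rate and dropout rate); the ranges and trial count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid search: 9 trials, but only 3 distinct values per hyperparameter
grid = [(lr, dr)
        for lr in (1e-3, 1e-2, 1e-1)
        for dr in (0.1, 0.3, 0.5)]

# Random search: the same 9 trials give 9 distinct values of EACH
# hyperparameter, exploring whichever dimension matters more finely
random_trials = [(10 ** rng.uniform(-3, -1),   # log-uniform learning rate
                  rng.uniform(0.1, 0.5))       # uniform dropout rate
                 for _ in range(9)]
```

This is the usual argument for random search: when one hyperparameter dominates performance, a grid wastes most of its budget repeating the same few values along that axis.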