Best Practices, Pitfalls & Tricks: An Inconvenient Truth (PowerPoint PPT Presentation)


SLIDE 1

Part 3: Best Practices, Pitfalls & Tricks

[xkcd comic]

SLIDE 2

An Inconvenient Truth

  • Deep neural networks comprise millions of parameters: we don’t know what these parameters mean
  • Most of the time, we don’t know what a NN learns
  • NNs are not suitable for gaining “understanding”

Neural networks are black boxes: treat them as such!

SLIDE 3

An Inconvenient Truth: Example

[Car photos as classified by the network: Audi (82%), BMW (91%), Ferrari (79%)]

SLIDE 4

An Inconvenient Truth: Example

[Two further photos, both classified as Ferrari: Ferrari (79%), Ferrari (95%)]

SLIDE 5

An Inconvenient Truth: Example

  • Training data selection is critical
  • The NN “learns” your interpretation based on the training data, including observational/operator bias (NNs are not unbiased!)
  • If all Ferraris in the training data are red, and all other cars are not red, then all red objects must be Ferraris!

SLIDE 6

An Inconvenient Truth

  • Machine Learning is mostly based on trial-and-error
  • There is no recipe for good performance, only guidelines
  • But: more theory is (slowly) being developed
SLIDE 7

Pitfalls

  • 1. Bias and class imbalance in training set
  • 2. Overfitting
  • 3. Extrapolation beyond training data range
  • 4. Improper weight initialisation
  • 5. Excessive learning rates
SLIDE 8

Pitfalls: Class Imbalance

Valentine & Trampert (2012)

Earthquake detection, two classes: 1. Noise, 2. Earthquake

SLIDE 9

Pitfalls: Class Imbalance

Valentine & Trampert (2012)

[Figure: every example trace is predicted as noise, yet the classifier reports 99.9% accuracy!]
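The trap is easy to reproduce. Below is a minimal sketch with made-up label counts (roughly one earthquake per thousand noise windows); the class_weight argument of Keras' model.fit() is shown as one possible remedy, not necessarily the approach of Valentine & Trampert (2012).

```python
import numpy as np

# Hypothetical, heavily imbalanced label set: ~99.9% noise (0), ~0.1% earthquakes (1).
labels = np.zeros(10_000, dtype=int)
labels[:10] = 1

# A "classifier" that always predicts noise still scores 99.9% accuracy.
always_noise = np.zeros_like(labels)
print(f"always-noise accuracy: {(always_noise == labels).mean():.1%}")

# One common remedy: weight each class inversely to its frequency,
# e.g. via the class_weight argument of Keras' model.fit().
counts = np.bincount(labels)
class_weight = {cls: len(labels) / (len(counts) * n) for cls, n in enumerate(counts)}
print(class_weight)   # noise gets a tiny weight, earthquakes a large one
# model.fit(x, labels, class_weight=class_weight, ...)
```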

SLIDE 10

Pitfalls: Overfitting

[Two pressure-versus-depth fits: one showing good generalisation, one showing overfitting]

SLIDE 11

Pitfalls: Extrapolation

  • Most NN architectures have a monotonic response
  • Beyond the training data range the network confidence increases, whereas it should decrease! (See the sketch below.)
  • Example: predicting large earthquakes based on small ones
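A minimal sketch of the effect with a hand-built sigmoid "classifier"; the magnitude range, weight and bias are invented purely for illustration, not taken from any real model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy single-output "network", conceptually trained on magnitudes 2-5;
# the weight and bias are made up so the decision boundary sits at magnitude 4.
w, b = 3.0, -12.0
magnitudes = np.array([3.0, 4.0, 5.0, 7.0, 9.0])   # the last two are extrapolation
for m, p in zip(magnitudes, sigmoid(w * magnitudes + b)):
    print(f"magnitude {m}: confidence {p:.4f}")

# The sigmoid output grows monotonically towards 1.0 far outside the training
# range: the network becomes *more* confident exactly where it has seen no data.
```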

SLIDE 12

Pitfalls: Extrapolation (Adversarials)

https://openai.com/blog/adversarial-example-research/

SLIDE 13

Pitfalls: Extrapolation (Adversarials)

SLIDE 14

Pitfalls: Initialisation

  • Weights are initialised by sampling from a random distribution
  • If variance of every layer output < 1: vanishing gradients
  • If variance of every layer output > 1: exploding gradients
  • Solution: sample from a random distribution with variance inversely proportional to the number of layer inputs. This depends on the activation function (see the sketch after this list)!

  • ReLU: “He Normal initialisation” (He et al., 2015)
  • Sigmoid/tanh: “Xavier/Glorot initialisation” (Glorot & Bengio, 2010)
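A minimal sketch of how these initialisers can be requested in tf.keras; the layer sizes are arbitrary (note that glorot_uniform happens to be the Keras default).

```python
import tensorflow as tf

# Match the weight initialiser to the activation function (layer sizes arbitrary).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    # ReLU layers: He normal initialisation (He et al., 2015)
    tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    # tanh/sigmoid layers: Xavier/Glorot initialisation (Glorot & Bengio, 2010)
    tf.keras.layers.Dense(32, activation="tanh", kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(1, activation="sigmoid", kernel_initializer="glorot_uniform"),
])
```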
SLIDE 15

Pitfalls: Learning Rates

[Two loss-versus-parameter-value curves: one with a low learning rate, one with a high learning rate]
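A minimal sketch of the same behaviour with plain gradient descent on the toy loss L(x) = x² (gradient 2x); the learning rates are arbitrary.

```python
def descend(lr, steps=20, x=5.0):
    """Gradient descent on the toy loss L(x) = x**2, whose gradient is 2*x."""
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(descend(lr=0.001))  # too low: after 20 steps x has barely moved from 5.0
print(descend(lr=0.1))    # reasonable: x approaches the minimum at 0
print(descend(lr=1.1))    # too high: every step overshoots and x diverges
```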

SLIDE 16

Guidelines

  • 1. Data representation and network architecture are most important
  • 2. Bigger networks require more data = manual labour
  • 3. Training data should be balanced, test data should be representative of the real-world application
  • 4. Training a NN is like turning a key in a lock: it only works if all components fall into place

SLIDE 17

Best Practices (1/2)

  • 1. Start with a small network architecture
  • 2. Before anything else, verify that training/test data is correct!
  • 3. Try overfitting your data. If that doesn’t work, something is fundamentally wrong (e.g. initialisation)
  • 4. Scale/shift the input data to have zero mean and a variance of around 1 (see basic MNIST tutorial)
  • 5. Monitor the train/test loss: if the training loss decreases but the test loss increases, the network is overfitting (see the sketch below)
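A minimal sketch of points 4 and 5 with made-up feature arrays: the standardisation uses the statistics of the training set only, and (if you use Keras) the history object returned by model.fit() exposes both losses.

```python
import numpy as np

# Made-up raw features (e.g. amplitudes in arbitrary physical units).
rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=12.0, size=(1000, 8))
x_test = rng.normal(loc=50.0, scale=12.0, size=(200, 8))

# Point 4: standardise using statistics of the *training* set only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0) + 1e-8        # guard against zero variance
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std          # reuse the training statistics
print(x_train.mean(), x_train.std())    # approximately 0 and 1

# Point 5: during training, watch both losses, e.g. with Keras:
#   history = model.fit(x_train, y_train,
#                       validation_data=(x_test, y_test), epochs=50)
# Training loss falling while history.history["val_loss"] rises => overfitting.
```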

SLIDE 18

Best Practices (2/2)

  • 6. Monitor the training process using TensorBoard. Make quantitative comparisons between different “experiments” (architectures, hyperparameters, etc.)
  • 7. Use the Adam optimiser and ReLU activation (arguable)
  • 8. Experiment with regularisation: batch normalisation, layer normalisation, dropout, noise layers (not covered today); see the sketch after this list
  • 9. Be patient: if the network/dataset is large, training can take days on a decent GPU
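A minimal sketch pulling points 6–8 together in tf.keras; the layer sizes, dropout rate and log directory are placeholders, not recommendations.

```python
import tensorflow as tf

# Points 7/8: ReLU activations, Adam optimiser, batch normalisation and dropout.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# Point 6: give each "experiment" its own log directory so runs can be compared.
tensorboard = tf.keras.callbacks.TensorBoard(log_dir="logs/experiment_01")
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[tensorboard])
# Inspect afterwards with:  tensorboard --logdir logs
```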

SLIDE 19

Resources

  • YouTube
  • Lectures by Ian Goodfellow, Andrew Ng
  • Conference talks: e.g. NeurIPS (previously NIPS)
  • Udacity course (free): “Intro to TensorFlow for Deep Learning”
  • Competitions: Kaggle.com, DrivenData.org
SLIDE 20

Time to get really dirty…