An Introduction to Optimization Methods in Deep Learning
Yuan YAO HKUST
Acknowledgement
- Fei-Fei Li, Stanford cs231n
- Ruder, Sebastian (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747.
- http://ruder.io/deep-learning-optimization-2017/
Example Dataset: CIFAR-10
Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.
10 classes, 50,000 training images, 10,000 testing images
Example Dataset: Fashion MNIST
28×28 grayscale images; 60,000 training and 10,000 test examples; 10 classes
(Credit: Jason WU, Peng XU, and Nayeon LEE)
What the computer sees: an image is just a big grid of numbers in [0, 255], e.g. 800 × 600 × 3 (3 RGB channels).
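To make this concrete, here is a minimal sketch (assuming NumPy and Pillow are available; "cat.jpg" is a placeholder filename) showing that, to the program, an image is nothing but an integer array:

import numpy as np
from PIL import Image  # Pillow, assumed to be installed

img = np.array(Image.open("cat.jpg"))  # placeholder image file
print(img.shape)  # e.g. (600, 800, 3): height x width x RGB channels
print(img.dtype)  # uint8: every value lies in [0, 255]
print(img[0, 0])  # the top-left pixel, e.g. [155 132 98]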
Challenges: Deformation
Challenges: Viewpoint variation
All pixels change when the camera moves!
Figure annotations: Euclidean transform; large-scale deformation.
Challenges: Illumination
Challenges: Background Clutter
Challenges: Occlusion
Challenges: Intraclass variation
Suppose we have 3 training examples and 3 classes, and with some W we have computed the scores for each example. A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}, i = 1..N, where x_i is an image and y_i is its (integer) label, the loss over the dataset is the average of the per-example losses: L = (1/N) Σ_i L_i(f(x_i, W), y_i)
Multiclass SVM loss: given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:
L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)
This is the “hinge loss”: an incorrect class contributes nothing only if its score is below the correct class’s score by at least the margin of 1.
The loss over the full dataset is the average of the per-example losses: L = (1/N) Σ_i L_i
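As a concrete illustration, here is a minimal NumPy sketch of the multiclass SVM (hinge) loss for a single example; the function name and the score values are illustrative, not taken from the slides:

import numpy as np

def svm_loss_single(scores, y, margin=1.0):
    # hinge loss: sum over incorrect classes of max(0, s_j - s_y + margin)
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the correct class does not contribute
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])   # illustrative class scores f(x, W)
print(svm_loss_single(scores, y=0))   # 2.9 when class 0 is the true label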
Softmax classifier (cross-entropy loss): the scores are interpreted as unnormalized log probabilities. Exponentiating gives unnormalized probabilities, and normalizing gives probabilities:
P(Y = k | X = x_i) = exp(s_k) / Σ_j exp(s_j)
The per-example loss is the negative log probability of the correct class, e.g. L_i = -log(0.13) = 0.89 (using log base 10; with the natural log, L_i ≈ 2.04).
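A minimal NumPy sketch of the same pipeline (scores, then exponentiate, normalize, and take the negative log likelihood), using the natural logarithm and the usual max-shift for numerical stability; the function name is illustrative:

import numpy as np

def softmax_loss_single(scores, y):
    shifted = scores - scores.max()                   # stability shift, result unchanged
    probs = np.exp(shifted) / np.exp(shifted).sum()   # normalized probabilities
    return -np.log(probs[y])                          # cross-entropy for the true class y

scores = np.array([3.2, 5.1, -1.7])      # illustrative unnormalized log probabilities
print(softmax_loss_single(scores, y=0))  # about 2.04 with the natural log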
Data loss: model predictions should match the training data. Regularization: the model should be “simple”, so that it works on test data. Occam’s Razor: “Among competing hypotheses, the simplest is the best” (William of Ockham, 1285–1347).
- Explicit regularization
  - L2 regularization
  - L1 regularization (Lasso)
  - Elastic net (L1 + L2)
  - Max-norm regularization
- Implicit regularization
  - Dropout
  - Batch normalization
  - Early stopping
The full loss is the data loss plus a regularization penalty: L = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W), where λ is the regularization strength (a hyperparameter).
Figure: decision boundaries of a 1-NN classifier vs. a 5-NN classifier.
Choosing hyperparameters: when data is plentiful, hold out a validation set; when data is scarce, use cross-validation.
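A minimal sketch of k-fold cross-validation for choosing a hyperparameter; train_and_score is a placeholder callable (train on the training folds, return accuracy on the validation fold), and X, y are NumPy arrays:

import numpy as np

def cross_val_score(train_and_score, X, y, k=5):
    # split the shuffled indices into k folds; each fold serves once as the validation set
    folds = np.array_split(np.random.permutation(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return np.mean(scores)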
E.g. the data loss L_i can be the Softmax (cross-entropy) loss or the SVM (hinge) loss, and the full loss adds the regularization penalty λR(W) as above. In regression, the squared loss is often used instead.
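Putting the pieces together, a minimal NumPy sketch of the full loss for a linear classifier f(x, W) = xW with the softmax data loss and L2 regularization; the names (full_loss, lam) are illustrative, not from the slides:

import numpy as np

def full_loss(W, X, y, lam=1e-3):
    scores = X @ W                                            # shape (N, num_classes)
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    data_loss = -np.log(probs[np.arange(len(y)), y]).mean()   # average cross-entropy
    reg_loss = lam * np.sum(W * W)                            # L2 penalty: lambda * ||W||^2
    return data_loss + reg_loss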
Gradient descent is a way to minimize an objective function J(θ).
θ ∈ R^d: model parameters; η: learning rate; ∇_θ J(θ): gradient of the objective function with respect to the parameters.
It updates the parameters in the opposite direction of the gradient. Update equation: θ = θ − η · ∇_θ J(θ)
Figure: Optimization with gradient descent
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
- Difference: how much data we use in computing the gradients
- Computes the gradient with the entire dataset.
- Update rule: θ = θ − η · ∇_θ J(θ)
for i in range(nb_epochs):
    # gradient of the loss over the full training set
    params_grad = evaluate_gradient(loss_function, data, params)
    # step opposite to the gradient, scaled by the learning rate
    params = params - learning_rate * params_grad
Listing 1: Code for batch gradient descent update
- Pros:
  - Guaranteed to converge to the global minimum for convex objective functions and to a stationary/critical point for non-convex ones.
  - Exponentially fast (linear) convergence rates on strongly convex landscapes.
  - Sublinear convergence rates on convex landscapes.
- Cons:
  - Slow on big data.
  - Intractable for big datasets that do not fit in memory.
  - No online learning (new examples cannot be incorporated on the fly).
- Computes an update for each example (x^(i), y^(i)), usually uniformly sampled from the training dataset.
- Update equation: θ = θ − η · ∇_θ J(θ; x^(i), y^(i))
- The expectation of the stochastic gradient is the batch gradient.
for i in range(nb_epochs):
    np.random.shuffle(data)  # reshuffle the training set every epoch
    for example in data:
        # gradient of the loss on a single example
        params_grad = evaluate_gradient(loss_function, example, params)
        params = params - learning_rate * params_grad
Listing 2: Code for stochastic gradient descent update
- Pros:
  - Guaranteed to converge to the global minimum for convex losses and to a local minimum for non-convex ones.
  - O(1/k) convergence rates for convex losses, possibly dimension-free.
  - Much faster than batch gradient descent on big data.
  - Supports online learning.
- Cons:
  - High variance in gradients and outcomes.
Figure: SGD fluctuation (Source: Wikipedia)
- SGD shows the same convergence behaviour as batch gradient descent if the learning rate is slowly decreased (annealed) over time.
Figure: Batch gradient descent vs. SGD fluctuation (Source: wikidocs.net)
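Two common annealing schedules, as a minimal sketch (the constants are illustrative defaults, not values prescribed by the slides):

def step_decay(lr0, epoch, drop=0.5, every=10):
    # halve the learning rate every `every` epochs
    return lr0 * (drop ** (epoch // every))

def inverse_time_decay(lr0, t, k=1e-3):
    # decay proportional to 1/t, the classic schedule in SGD convergence analyses
    return lr0 / (1.0 + k * t)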
- Performs an update for every mini-batch of n randomly drawn examples.
- Update equation: θ = θ − η · ∇_θ J(θ; x^(i:i+n), y^(i:i+n))
- The expectation of the mini-batch gradient is the batch gradient.
for i in range(nb_epochs):
    np.random.shuffle(data)
    for batch in get_batches(data, batch_size=50):
        params_grad = evaluate_gradient(loss_function, batch, params)
        params = params - learning_rate * params_grad
Listing 3: Code for mini-batch gradient descent update
- Pros:
  - Reduces the variance of the updates.
  - Can exploit matrix multiplication primitives.
- Cons:
  - Mini-batch size is an additional hyperparameter; common sizes are 50-256.
- Typically the algorithm of choice.
- Usually referred to as SGD in deep learning even when mini-batches are used.
Challenges:
- Choosing a learning rate.
- Defining an annealing (learning rate decay) schedule.
- Escaping saddle points and suboptimal local minima.
Gradient descent optimization algorithms addressing these challenges:
- Momentum
- Nesterov accelerated gradient
- Adagrad
- Adadelta
- RMSprop
- Adam
- Adam extensions
SGD has trouble navigating ravines. Momentum [Qian, 1999] helps SGD accelerate: it adds a fraction γ of the update vector of the past step, v_{t-1}, to the current update vector v_t. The momentum term γ is usually set to 0.9.
v_t = γ v_{t-1} + η ∇_θ J(θ)
θ = θ − v_t
Figure: (a) SGD without momentum; (b) SGD with momentum. (Source: Genevieve B. Orr)
Reduces updates for dimensions whose gradients change directions. Increases updates for dimensions whose gradients point in the same directions.
Figure: Optimization with momentum (Source: distill.pub)
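A minimal sketch of the momentum update; params, nb_steps and evaluate_gradient (returning the gradient of J at params) are placeholders, as in the earlier listings:

import numpy as np

gamma, learning_rate = 0.9, 0.01          # typical momentum term and step size
v = np.zeros_like(params)                 # velocity, initialized to zero
for t in range(nb_steps):
    grad = evaluate_gradient(params)
    v = gamma * v + learning_rate * grad  # decaying accumulation of past gradients
    params = params - v                   # step with the velocity, not the raw gradient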
Momentum blindly accelerates down slopes: it first computes the gradient, then makes a big jump. Nesterov accelerated gradient (NAG) [Nesterov, 1983] first makes a big jump in the direction of the previous accumulated gradient, θ − γ v_{t-1}, then measures where it ends up and makes a correction, resulting in the complete update vector.
v_t = γ v_{t-1} + η ∇_θ J(θ − γ v_{t-1})
θ = θ − v_t
Figure: Nesterov update (Source: G. Hinton’s lecture 6c)
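Continuing the momentum sketch above, NAG only changes where the gradient is evaluated: at the look-ahead point θ − γv rather than at θ:

v = np.zeros_like(params)
for t in range(nb_steps):
    lookahead = params - gamma * v        # where momentum alone would take us
    grad = evaluate_gradient(lookahead)   # gradient measured at the look-ahead point
    v = gamma * v + learning_rate * grad
    params = params - v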
Previous methods use the same learning rate η for all parameters θ. Adagrad [Duchi et al., 2011] adapts the learning rate to the parameters (large updates for infrequent parameters, small updates for frequent parameters).
SGD update: θ_{t+1} = θ_t − η · g_t, where g_t = ∇_{θ_t} J(θ_t)
Adagrad divides the learning rate by the square root of the sum of squares of historic gradients.
Adagrad update: θ_{t+1} = θ_t − (η / √(G_t + ε)) ⊙ g_t
G_t ∈ R^{d×d}: diagonal matrix where each diagonal element (i, i) is the sum of the squares of the gradients w.r.t. θ_i up to time step t
ε: smoothing term to avoid division by zero
⊙: element-wise multiplication
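A minimal sketch of the Adagrad update, keeping only the diagonal of G_t as a per-parameter vector of accumulated squared gradients; placeholders as in the earlier sketches:

import numpy as np

eps, learning_rate = 1e-8, 0.01
G = np.zeros_like(params)                 # per-parameter sum of squared gradients
for t in range(nb_steps):
    grad = evaluate_gradient(params)
    G = G + grad ** 2                     # accumulate squared gradients
    params = params - learning_rate * grad / np.sqrt(G + eps)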
- Pros:
  - Well-suited for dealing with sparse data.
  - Significantly improves the robustness of SGD.
  - Less need to manually tune the learning rate.
- Cons:
  - Accumulates squared gradients in the denominator.
  - Causes the learning rate to shrink and eventually become infinitesimally small.
Adadelta [Zeiler, 2012] restricts the window of accumulated past gradients to a fixed size.
SGD update: ∆θ_t = −η · g_t, θ_{t+1} = θ_t + ∆θ_t
Define a running average of squared gradients E[g²]_t at time t:
E[g²]_t = γ E[g²]_{t-1} + (1 − γ) g²_t
γ: a fraction similar to the momentum term, around 0.9
Adagrad update: ∆θ_t = −(η / √(G_t + ε)) ⊙ g_t
Preliminary Adadelta update: ∆θ_t = −(η / √(E[g²]_t + ε)) g_t
The denominator is just the root mean squared (RMS) error of the gradient: ∆θ_t = −(η / RMS[g]_t) g_t
Note: the hypothetical units of this update do not match those of the parameters. Define a running average of squared parameter updates and its RMS:
E[∆θ²]_t = γ E[∆θ²]_{t-1} + (1 − γ) ∆θ²_t
RMS[∆θ]_t = √(E[∆θ²]_t + ε)
Approximate with RMS[∆θ]_{t-1} and replace η for the final Adadelta update:
∆θ_t = −(RMS[∆θ]_{t-1} / RMS[g]_t) g_t
θ_{t+1} = θ_t + ∆θ_t
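A minimal sketch of the Adadelta update with per-parameter running averages; γ and ε are the quantities defined above, the other names are placeholders:

import numpy as np

gamma, eps = 0.9, 1e-6
Eg2 = np.zeros_like(params)     # running average of squared gradients E[g^2]
Edx2 = np.zeros_like(params)    # running average of squared updates E[dx^2]
for t in range(nb_steps):
    grad = evaluate_gradient(params)
    Eg2 = gamma * Eg2 + (1 - gamma) * grad ** 2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad  # note: no learning rate
    Edx2 = gamma * Edx2 + (1 - gamma) * dx ** 2
    params = params + dx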
RMSprop (an unpublished method from Geoff Hinton's Coursera class, Lecture 6e) likewise divides the learning rate by a running average of squared gradients:
E[g²]_t = γ E[g²]_{t-1} + (1 − γ) g²_t
θ_{t+1} = θ_t − (η / √(E[g²]_t + ε)) g_t
γ: decay parameter; typically set to 0.9
η: learning rate; a good default value is 0.001
Adam (Adaptive Moment Estimation) [Kingma and Ba, 2015] keeps exponentially decaying running averages of past gradients and past squared gradients:
m_t = β1 m_{t-1} + (1 − β1) g_t
v_t = β2 v_{t-1} + (1 − β2) g²_t
m_t: first moment (mean) of gradients
v_t: second moment (uncentered variance) of gradients
β1, β2: decay rates
Since m_t and v_t are initialized at zero, they are biased towards zero; bias-corrected estimates are used instead:
m̂_t = m_t / (1 − β1^t), v̂_t = v_t / (1 − β2^t)
Adam update: θ_{t+1} = θ_t − (η / (√v̂_t + ε)) m̂_t
Adam extensions:
- AdaMax: Adam with the ℓ∞ norm
- Nadam [Dozat, 2016]: Adam with Nesterov accelerated gradient
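A minimal sketch of the Adam update with bias correction; the constants are the defaults suggested in [Kingma and Ba, 2015], the other names are placeholders:

import numpy as np

beta1, beta2, eps, learning_rate = 0.9, 0.999, 1e-8, 0.001
m = np.zeros_like(params)   # first moment estimate
v = np.zeros_like(params)   # second moment estimate
for t in range(1, nb_steps + 1):
    grad = evaluate_gradient(params)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    params = params - learning_rate * m_hat / (np.sqrt(v_hat) + eps)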
Summary of update equations (with θ_{t+1} = θ_t + ∆θ_t and g_t = ∇_{θ_t} J(θ_t)):
SGD:      ∆θ_t = −η · g_t
Momentum: ∆θ_t = −γ v_{t-1} − η g_t
NAG:      ∆θ_t = −γ v_{t-1} − η ∇_θ J(θ − γ v_{t-1})
Adagrad:  ∆θ_t = −(η / √(G_t + ε)) ⊙ g_t
Adadelta: ∆θ_t = −(RMS[∆θ]_{t-1} / RMS[g]_t) g_t
RMSprop:  ∆θ_t = −(η / √(E[g²]_t + ε)) g_t
Adam:     ∆θ_t = −(η / (√v̂_t + ε)) m̂_t
Figure: (a) SGD optimization on loss surface contours; (b) SGD optimization on a saddle point. (Source and full animations: Alec Radford)
- Adaptive learning rate methods (Adagrad, Adadelta, RMSprop, Adam) are particularly useful for sparse features.
- Adagrad, Adadelta, RMSprop, and Adam work well in similar circumstances.
- [Kingma and Ba, 2015] show that bias-correction helps Adam slightly outperform RMSprop towards the end of optimization.
- Hogwild! [Niu et al., 2011]
  - Parallel SGD updates on CPU
  - Shared memory access without parameter locking
  - Only works well for sparse input data
- Downpour SGD [Dean et al., 2012]
  - Multiple replicas of the model run in parallel on subsets of the training data
  - Updates are sent to a parameter server, which is split across machines; each shard updates a fraction of the model parameters
- Delay-tolerant algorithms for SGD [Mcmahan and Streeter, 2014]
  - Methods that also adapt to update delays
- TensorFlow [Abadi et al., 2015]
  - The computation graph is split into a subgraph for every device
  - Communication takes place using Send/Receive node pairs
- Elastic Averaging SGD [Zhang et al., 2015]
  - Links parameters elastically to a center variable stored by the parameter server
- Shuffling and curriculum learning [Bengio et al., 2009]
  - Shuffle the training data after every epoch to break biases
  - Curriculum learning: order training examples to solve progressively harder problems; infrequently used in practice
- Batch normalization [Ioffe and Szegedy, 2015]
  - Re-normalizes every mini-batch to zero mean and unit variance
  - A must-use for computer vision
- Early stopping
  - “Early stopping (is) beautiful free lunch” (Geoff Hinton)
- Gradient noise [Neelakantan et al., 2015]
  - Add Gaussian noise to each gradient
  - Makes models more robust to poor initialization
  - Helps escape saddle points and local optima
- Many recent papers use SGD with learning rate annealing.
- SGD with a tuned learning rate and momentum is competitive with Adam [Zhang et al., 2017b].
- Adam converges faster, but oscillates and may underperform SGD on some tasks, e.g. machine translation [Wu et al., 2016].
- Adam with restarts and SGD-style annealing converges faster and can outperform SGD with annealing [Denkowski and Neubig, 2017].
- Increasing the batch size may have the same effect as decaying the learning rate [Smith et al., 2017].
References
- [Abadi et al., 2015] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Shlens, J., Steiner, B., Sutskever, I., Tucker, P., Vanhoucke, V., Vasudevan, V., Vinyals, O., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.
- [Bello et al., 2017] Bello, I., Zoph, B., Vasudevan, V., and Le, Q. V. (2017). Neural Optimizer Search with Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning.
- [Bengio et al., 2009] Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum Learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48.
- [Dean et al., 2012] Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., Mao, M. Z., Ranzato, M. A., Senior, A., Tucker, P., Yang, K., and Ng, A. Y. (2012). Large Scale Distributed Deep Networks. In NIPS 2012, pages 1–11.
- [Denkowski and Neubig, 2017] Denkowski, M. and Neubig, G. (2017). Stronger Baselines for Trustable Results in Neural Machine Translation. In Workshop on Neural Machine Translation (WNMT).
- [Dinh et al., 2017] Dinh, L., Pascanu, R., Bengio, S., and Bengio, Y. (2017). Sharp Minima Can Generalize For Deep Nets. In Proceedings of the 34th International Conference on Machine Learning.
- [Dozat, 2016] Dozat, T. (2016). Incorporating Nesterov Momentum into Adam. ICLR Workshop, (1):2013–2016.
- [Dozat and Manning, 2017] Dozat, T. and Manning, C. D. (2017). Deep Biaffine Attention for Neural Dependency Parsing. In ICLR 2017.
- [Duchi et al., 2011] Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12:2121–2159.
- [Huang et al., 2017] Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J. E., and Weinberger, K. Q. (2017). Snapshot Ensembles: Train 1, Get M for Free. In Proceedings of ICLR 2017.
- [Ioffe and Szegedy, 2015] Ioffe, S. and Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint arXiv:1502.03167.
- [Kingma and Ba, 2015] Kingma, D. P. and Ba, J. L. (2015). Adam: a Method for Stochastic Optimization. In Proceedings of ICLR 2015.
- [Ruder, 2016] Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
- [Nesterov, 1983] Nesterov, Y. (1983). A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Math. Docl.), 269:543–547.
- [Niu et al., 2011] Niu, F., Recht, B., Ré, C., and Wright, S. J. (2011). Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. pages 1–22.
- [Qian, 1999] Qian, N. (1999). On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1):145–151.
- [Zeiler, 2012] Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. arXiv preprint arXiv:1212.5701.
- [Zhang et al., 2015] Zhang, S., Choromanska, A., and LeCun, Y. (2015). Deep Learning with Elastic Averaging SGD. In NIPS 2015, pages 1–24.