

SLIDE 1: Lecture 4: Optimization

Justin Johnson, September 16, 2019

SLIDE 2: Waitlist Update

We will open the course for enrollment later today / tomorrow.

SLIDE 3: Reminder: Assignment 1

Was due yesterday! (But you do have late days…)

SLIDE 4: Assignment 2

  • Will be released today
  • Use SGD to train linear classifiers and fully-connected networks
  • After today, you can do the linear classifiers section
  • After Wednesday, you can do the fully-connected networks section
  • If you have a hard time computing derivatives, wait for next Monday's lecture on backprop
  • Due Monday, September 30, 11:59pm (two weeks from today)

SLIDE 5: Course Update

  • A1: 10%
  • A2: 10%
  • A3: 10%
  • A4: 10%
  • A5: 10%
  • A6: 10%
  • Midterm: 20%
  • Final: 20%

SLIDE 6: Course Update: No Final Exam

Old weighting:
  • A1: 10%
  • A2: 10%
  • A3: 10%
  • A4: 10%
  • A5: 10%
  • A6: 10%
  • Midterm: 20%
  • Final: 20%

New weighting:
  • A1: 10%
  • A2: 13%
  • A3: 13%
  • A4: 13%
  • A5: 13%
  • A6: 13%
  • Midterm: 25%
  • Final: none

Expect A5 and A6 to be longer than the other homework.

SLIDE 7: Last Time: Linear Classifiers

f(x, W) = Wx

  • Algebraic viewpoint: f(x, W) = Wx
  • Visual viewpoint: one template per class
  • Geometric viewpoint: hyperplanes cutting up space

SLIDE 8: Last Time: Loss Functions Quantify Preferences

  • We have some dataset of (x, y)
  • We have a score function: f(x, W) = Wx (linear classifier)
  • We have a loss function (e.g. Softmax or SVM), plus a full loss over the whole dataset

SLIDE 9: Last Time: Loss Functions Quantify Preferences

  • We have some dataset of (x, y)
  • We have a score function: f(x, W) = Wx (linear classifier)
  • We have a loss function (e.g. Softmax or SVM), plus a full loss over the whole dataset

Q: How do we find the best W?
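
The score and loss formulas on these slides were rendered as images; as a reconstruction of the standard setup (the regularization weight λ and regularizer R are assumptions about the notation):

$$ s = f(x; W) = Wx, \qquad L(W) = \frac{1}{N}\sum_{i=1}^{N} L_i\big(f(x_i; W),\, y_i\big) + \lambda R(W) $$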

SLIDE 10: Optimization

SLIDE 11-12: (Figure slides)

Walking man image is CC0 1.0 public domain. This image is CC0 1.0 public domain.

SLIDE 13-15: Idea #1: Random Search (bad idea!)

Randomly sample many values of W and keep whichever gives the lowest loss.

15.5% accuracy! Not bad! (SOTA is ~95%)
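
The slide's code was not captured; here is a minimal sketch of what random search over W might look like. The names loss_fn, X, and y are assumptions: a hypothetical loss function and CIFAR-10-shaped data (X with a bias column, 10 classes).

```python
import numpy as np

def random_search(loss_fn, X, y, num_classes=10, num_trials=1000):
    """Try many random weight matrices and keep the one with the lowest loss."""
    best_loss, best_W = float('inf'), None
    for _ in range(num_trials):
        W = np.random.randn(X.shape[1], num_classes) * 1e-4  # random guess
        loss = loss_fn(X, y, W)
        if loss < best_loss:
            best_loss, best_W = loss, W
    return best_W, best_loss
```

Evaluating the best W found this way on the test set is what produced the 15.5% accuracy quoted above.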

SLIDE 16-17: Idea #2: Follow the Slope

In 1-dimension, the derivative of a function gives the slope.

SLIDE 18: Idea #2: Follow the Slope

In 1-dimension, the derivative of a function gives the slope.

In multiple dimensions, the gradient is the vector of partial derivatives along each dimension. The slope in any direction is the dot product of the direction with the gradient. The direction of steepest descent is the negative gradient.
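
In symbols (a restatement of the text above, not taken from the slide images):

$$ \nabla_W L = \Big(\tfrac{\partial L}{\partial W_1}, \ldots, \tfrac{\partial L}{\partial W_d}\Big), \qquad \text{slope along a unit direction } \mathbf{u}: \ \nabla_W L \cdot \mathbf{u}, \qquad \text{steepest descent direction: } -\nabla_W L $$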

SLIDE 19-26: Numeric gradient example

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …], loss 1.25347
gradient dL/dW: [?, ?, ?, ?, ?, ?, ?, ?, ?, …]

W + h (first dim): [0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …], loss 1.25322
dL/dW (first dim) ≈ (1.25322 - 1.25347) / 0.0001 = -2.5

W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …], loss 1.25353
dL/dW (second dim) ≈ (1.25353 - 1.25347) / 0.0001 = 0.6

W + h (third dim): [0.34, -1.11, 0.78 + 0.0001, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …], loss 1.25347
dL/dW (third dim) ≈ (1.25347 - 1.25347) / 0.0001 = 0.0

gradient dL/dW so far: [-2.5, 0.6, 0.0, ?, ?, ?, ?, ?, ?, …]

Numeric Gradient (see the code sketch below):
  • Slow: O(#dimensions)
  • Approximate

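
A minimal sketch of the one-sided finite-difference procedure illustrated above, treating the loss as a black-box f(W); the function name and h = 1e-4 mirror the slides' example.

```python
import numpy as np

def numeric_gradient(f, W, h=1e-4):
    """Approximate dL/dW one dimension at a time: (f(W + h*e_i) - f(W)) / h."""
    grad = np.zeros_like(W)
    loss_0 = f(W)                                 # e.g. 1.25347 in the example above
    it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h                          # bump one dimension by h
        grad[idx] = (f(W) - loss_0) / h           # one-sided finite difference
        W[idx] = old                              # restore W
        it.iternext()
    return grad
```

This is why the numeric gradient is slow: it needs one extra loss evaluation per dimension of W.
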
SLIDE 27-28: Loss is a Function of W: Analytic Gradient

The loss is a function of W, and we want dL/dW. Use calculus to compute an analytic gradient.

(Both images on these slides are in the public domain.)

SLIDE 29-30:

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …], loss 1.25347
gradient dL/dW: [-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, …]

dL/dW = ... (some function of the data and W)

(In practice we will compute dL/dW using backpropagation; see Lecture 6.)

SLIDE 31: Computing Gradients

  • Numeric gradient: approximate, slow, easy to write
  • Analytic gradient: exact, fast, error-prone

In practice: Always use the analytic gradient, but check your implementation with the numeric gradient. This is called a gradient check (see the sketch below).
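
A sketch of such a gradient check, comparing a precomputed analytic gradient against centered finite differences at a few random coordinates; the helper name and tolerances are assumptions.

```python
import numpy as np

def gradient_check(f, analytic_grad, W, num_checks=10, h=1e-6):
    """Compare analytic_grad (precomputed dL/dW) to numeric estimates at random entries of W."""
    for _ in range(num_checks):
        idx = tuple(np.random.randint(d) for d in W.shape)   # pick a random entry of W
        old = W[idx]
        W[idx] = old + h; f_plus = f(W)
        W[idx] = old - h; f_minus = f(W)
        W[idx] = old                                         # restore W
        numeric = (f_plus - f_minus) / (2 * h)               # centered finite difference
        rel_error = (abs(numeric - analytic_grad[idx])
                     / max(abs(numeric) + abs(analytic_grad[idx]), 1e-12))
        print('numeric %f, analytic %f, relative error %e'
              % (numeric, analytic_grad[idx], rel_error))
```
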


SLIDE 35: Gradient Descent

Iteratively step in the direction of the negative gradient (the direction of local steepest descent).

Hyperparameters (see the code sketch below):
  • Weight initialization method
  • Number of steps
  • Learning rate

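
The slide's code was an image; a minimal sketch of vanilla gradient descent, assuming a hypothetical compute_gradient(w) that returns dL/dw over the full dataset.

```python
import numpy as np

def gradient_descent(compute_gradient, w_init, learning_rate=1e-2, num_steps=100):
    """Vanilla gradient descent: repeatedly step along the negative gradient."""
    w = w_init.copy()                       # weight initialization method
    for _ in range(num_steps):              # number of steps
        dw = compute_gradient(w)            # full-batch gradient of the loss
        w -= learning_rate * dw             # step in the direction of steepest descent
    return w
```
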
SLIDE 36: Gradient Descent

(Figure: loss contours over two weights W_1, W_2; starting from the original W, each step moves in the negative gradient direction.)

SLIDE 38: Batch Gradient Descent

The full sum over all N training examples is expensive when N is large!
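
The formulas on this slide were images; reconstructed from the standard full-batch objective (λ and R are the regularization notation assumed earlier):

$$ L(W) = \frac{1}{N}\sum_{i=1}^{N} L_i(x_i, y_i, W) + \lambda R(W), \qquad \nabla_W L(W) = \frac{1}{N}\sum_{i=1}^{N} \nabla_W L_i(x_i, y_i, W) + \lambda \nabla_W R(W) $$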

SLIDE 39: Stochastic Gradient Descent (SGD)

The full sum is expensive when N is large! Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common minibatch sizes.

Hyperparameters (see the code sketch below):
  • Weight initialization
  • Number of steps
  • Learning rate
  • Batch size
  • Data sampling

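
A minimal SGD sketch, assuming hypothetical arrays X, y and a loss_gradient(X_batch, y_batch, w) helper; the batch size of 64 is one of the common choices mentioned above.

```python
import numpy as np

def sgd(loss_gradient, X, y, w_init, learning_rate=1e-2, batch_size=64, num_steps=1000):
    """Stochastic gradient descent: approximate the full sum with a minibatch each step."""
    w = w_init.copy()
    N = X.shape[0]
    for _ in range(num_steps):
        batch = np.random.choice(N, batch_size, replace=False)   # data sampling
        dw = loss_gradient(X[batch], y[batch], w)                 # minibatch gradient
        w -= learning_rate * dw
    return w
```
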
SLIDE 40: Stochastic Gradient Descent (SGD)

Think of the loss as an expectation over the full data distribution p_data, and approximate the expectation via sampling.
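
In symbols (a reconstruction; the slide's formulas were images), with minibatch samples drawn from p_data:

$$ L(W) = \mathbb{E}_{(x,y)\sim p_{\text{data}}}\big[L(x, y, W)\big] + \lambda R(W) \;\approx\; \frac{1}{B}\sum_{i=1}^{B} L(x_i, y_i, W) + \lambda R(W) $$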


SLIDE 42: Interactive Web Demo

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/

SLIDE 43: Problems with SGD

What if the loss changes quickly in one direction and slowly in another? What does gradient descent do?

The loss function has a high condition number: the ratio of the largest to smallest singular value of the Hessian matrix is large.
SLIDE 44: Problems with SGD

What if the loss changes quickly in one direction and slowly in another? What does gradient descent do? Very slow progress along the shallow dimension, jitter along the steep direction.

The loss function has a high condition number: the ratio of the largest to smallest singular value of the Hessian matrix is large.
SLIDE 45: Problems with SGD

What if the loss function has a local minimum or saddle point?

(Figures: a local minimum; a saddle point)

SLIDE 46: Problems with SGD

What if the loss function has a local minimum or saddle point? Zero gradient; gradient descent gets stuck.

(Figures: a local minimum; a saddle point)

SLIDE 47: Problems with SGD

Our gradients come from minibatches, so they can be noisy!

SLIDE 48: SGD

SLIDE 49: SGD + Momentum

  • Build up "velocity" as a running mean of gradients
  • Rho gives "friction"; typically rho = 0.9 or 0.99

SGD vs SGD+Momentum (see the code sketch below).

Sutskever et al, "On the importance of initialization and momentum in deep learning", ICML 2013
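
The two update rules on the slide were images; a sketch of both, with grad_fn standing in for the (minibatch) gradient computation.

```python
import numpy as np

def sgd_step(w, dw, learning_rate):
    """Plain SGD update."""
    return w - learning_rate * dw

def sgd_momentum(grad_fn, w, learning_rate=1e-2, rho=0.9, num_steps=100):
    """SGD + Momentum: build up velocity as a running mean of gradients; rho acts as friction."""
    v = np.zeros_like(w)
    for _ in range(num_steps):
        dw = grad_fn(w)
        v = rho * v + dw                  # running mean of gradients
        w = w - learning_rate * v         # step along the velocity
    return w
```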

SLIDE 50: SGD + Momentum

You may see SGD+Momentum formulated in different ways, but they are equivalent: they give the same sequence of x.

Sutskever et al, "On the importance of initialization and momentum in deep learning", ICML 2013

SLIDE 51: SGD + Momentum

(Figure: SGD vs SGD+Momentum trajectories on cases with local minima, saddle points, gradient noise, and poor conditioning.)

Sutskever et al, "On the importance of initialization and momentum in deep learning", ICML 2013

SLIDE 52: SGD + Momentum

Momentum update: combine the gradient at the current point with the velocity to get the step used to update the weights.

(Figure: gradient, velocity, and actual step vectors.)

Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k^2)", 1983
Nesterov, "Introductory lectures on convex optimization: a basic course", 2004
Sutskever et al, "On the importance of initialization and momentum in deep learning", ICML 2013

SLIDE 53: Nesterov Momentum

Momentum update: combine the gradient at the current point with the velocity to get the step used to update the weights.

Nesterov Momentum: "look ahead" to the point where updating using the velocity would take us; compute the gradient there and mix it with the velocity to get the actual update direction.

(Figure: gradient, velocity, and actual step vectors for both updates.)

Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k^2)", 1983
Nesterov, "Introductory lectures on convex optimization: a basic course", 2004
Sutskever et al, "On the importance of initialization and momentum in deep learning", ICML 2013

SLIDE 54: Nesterov Momentum

"Look ahead" to the point where updating using the velocity would take us; compute the gradient there and mix it with the velocity to get the actual update direction.

(Figure: gradient, velocity, and actual step vectors.)

SLIDE 55: Nesterov Momentum

"Look ahead" to the point where updating using the velocity would take us; compute the gradient there and mix it with the velocity to get the actual update direction.

Annoying: usually we want the update in terms of x_t and the gradient at x_t.

SLIDE 56: Nesterov Momentum

Annoying: usually we want the update in terms of x_t and the gradient at x_t. Change of variables and rearrange (see the code sketch below).
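
A sketch of the rearranged Nesterov update in terms of the current iterate; the (1 + rho) form follows the change-of-variables trick mentioned above, and grad_fn is a hypothetical gradient helper.

```python
import numpy as np

def nesterov_momentum(grad_fn, w, learning_rate=1e-2, rho=0.9, num_steps=100):
    """Nesterov momentum, rewritten so the gradient is evaluated at the current w."""
    v = np.zeros_like(w)
    for _ in range(num_steps):
        dw = grad_fn(w)
        old_v = v
        v = rho * v - learning_rate * dw
        w = w - rho * old_v + (1 + rho) * v   # the "look ahead" folded into the w update
    return w
```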

SLIDE 57: Nesterov Momentum

(Figure: trajectories of SGD, SGD+Momentum, and Nesterov.)

SLIDE 58: AdaGrad

Add element-wise scaling of the gradient based on the historical sum of squares in each dimension: "per-parameter learning rates" or "adaptive learning rates" (see the code sketch below).

Duchi et al, "Adaptive subgradient methods for online learning and stochastic optimization", JMLR 2011
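
A sketch of the AdaGrad update; the 1e-7 term avoids division by zero and is a common choice, not taken from the slide, and grad_fn is a hypothetical gradient helper.

```python
import numpy as np

def adagrad(grad_fn, w, learning_rate=1e-2, num_steps=100):
    """AdaGrad: scale each dimension by the historical sum of squared gradients."""
    grad_squared = np.zeros_like(w)
    for _ in range(num_steps):
        dw = grad_fn(w)
        grad_squared += dw * dw                                  # per-dimension history
        w = w - learning_rate * dw / (np.sqrt(grad_squared) + 1e-7)
    return w
```

Because grad_squared only grows, steps along "steep" directions shrink over time, which is the behavior the next slides ask about.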

SLIDE 59: AdaGrad

Duchi et al, "Adaptive subgradient methods for online learning and stochastic optimization", JMLR 2011

SLIDE 60: AdaGrad

Q: What happens with AdaGrad?

SLIDE 61: AdaGrad

Q: What happens with AdaGrad? Progress along "steep" directions is damped; progress along "flat" directions is accelerated.

SLIDE 62: RMSProp: "Leaky AdaGrad"

AdaGrad vs RMSProp (see the code sketch below).

Tieleman and Hinton, 2012
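
A sketch of the contrast: RMSProp replaces AdaGrad's running sum with a leaky (exponentially decaying) average; decay_rate is the extra hyperparameter, and grad_fn is again a hypothetical gradient helper.

```python
import numpy as np

def rmsprop(grad_fn, w, learning_rate=1e-2, decay_rate=0.99, num_steps=100):
    """RMSProp ("leaky AdaGrad"): decaying average of squared gradients instead of a running sum."""
    grad_squared = np.zeros_like(w)
    for _ in range(num_steps):
        dw = grad_fn(w)
        grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dw * dw
        w = w - learning_rate * dw / (np.sqrt(grad_squared) + 1e-7)
    return w
```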

SLIDE 63: RMSProp

(Figure: trajectories of SGD, SGD+Momentum, and RMSProp.)

SLIDE 64: Adam (almost): RMSProp + Momentum

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015

SLIDE 65: Adam (almost): RMSProp + Momentum

(Formula figure: the Adam first-moment update, annotated "Momentum", shown alongside SGD+Momentum.)

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015

SLIDE 66: Adam (almost): RMSProp + Momentum

(Formula figure: the Adam update annotated with "Momentum" and "AdaGrad / RMSProp" terms, shown alongside RMSProp.)

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015

SLIDE 67: Adam (almost): RMSProp + Momentum

Q: What happens at t = 0? (Assume beta2 = 0.999)

(Formula figure annotated with "Momentum", "AdaGrad / RMSProp", and "Bias correction" terms.)

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015

SLIDE 68: Adam (almost): RMSProp + Momentum

Bias correction: corrects for the fact that the first and second moment estimates start at zero (see the code sketch below).

(Formula figure annotated with "Momentum", "AdaGrad / RMSProp", and "Bias correction" terms.)

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015
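
A sketch of the full Adam update with bias correction, using the hyperparameter names from the slides (beta1, beta2); grad_fn is a hypothetical gradient helper.

```python
import numpy as np

def adam(grad_fn, w, learning_rate=1e-3, beta1=0.9, beta2=0.999, num_steps=100):
    """Adam: momentum-style first moment, RMSProp-style second moment, plus bias correction."""
    moment1 = np.zeros_like(w)
    moment2 = np.zeros_like(w)
    for t in range(1, num_steps + 1):                         # t starts at 1 for bias correction
        dw = grad_fn(w)
        moment1 = beta1 * moment1 + (1 - beta1) * dw          # Momentum
        moment2 = beta2 * moment2 + (1 - beta2) * dw * dw     # AdaGrad / RMSProp
        m1_hat = moment1 / (1 - beta1 ** t)                   # Bias correction
        m2_hat = moment2 / (1 - beta2 ** t)
        w = w - learning_rate * m1_hat / (np.sqrt(m2_hat) + 1e-7)
    return w
```

Without the bias correction, moment2 is close to zero on the very first step (with beta2 = 0.999), so the update would divide by a tiny number and could blow up, which is the answer to the question on the previous slide.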

SLIDE 69: Adam (almost): RMSProp + Momentum

Bias correction corrects for the fact that the first and second moment estimates start at zero.

Adam with beta1 = 0.9, beta2 = 0.999, and learning_rate = 1e-3, 5e-4, or 1e-4 is a great starting point for many models!

Kingma and Ba, "Adam: A method for stochastic optimization", ICLR 2015

SLIDE 70: Adam: Very Common in Practice!

Adam with beta1 = 0.9, beta2 = 0.999, and learning_rate = 1e-3, 5e-4, or 1e-4 is a great starting point for many models!

Gkioxari, Malik, and Johnson, ICCV 2019; Zhu, Kaplan, Johnson, and Fei-Fei, ECCV 2018; Johnson, Gupta, and Fei-Fei, CVPR 2018; Gupta, Johnson, et al, CVPR 2018; Bakhtin, van der Maaten, Johnson, Gustafson, and Girshick, NeurIPS 2019

SLIDE 71: Adam

(Figure: trajectories of SGD, SGD+Momentum, RMSProp, and Adam.)

SLIDE 72: Optimization Algorithm Comparison

Algorithm      | Tracks first moments (Momentum) | Tracks second moments (Adaptive learning rates) | Leaky second moments | Bias correction for moment estimates
SGD            | ✗                               | ✗                                               | ✗                    | ✗
SGD+Momentum   | ✓                               | ✗                                               | ✗                    | ✗
Nesterov       | ✓                               | ✗                                               | ✗                    | ✗
AdaGrad        | ✗                               | ✓                                               | ✗                    | ✗
RMSProp        | ✗                               | ✓                                               | ✓                    | ✗
Adam           | ✓                               | ✓                                               | ✓                    | ✓

SLIDE 73: So far: First-Order Optimization

(Figure: loss as a function of w1.)

SLIDE 74: So far: First-Order Optimization

  • 1. Use the gradient to make a linear approximation
  • 2. Step to minimize the approximation

(Figure: loss as a function of w1, with a linear approximation.)

SLIDE 75: Second-Order Optimization

  • 1. Use the gradient and Hessian to make a quadratic approximation
  • 2. Step to minimize the approximation

(Figure: loss as a function of w1, with a quadratic approximation.)

SLIDE 76: Second-Order Optimization

  • 1. Use the gradient and Hessian to make a quadratic approximation
  • 2. Step to minimize the approximation

Take bigger steps in areas of low curvature.
SLIDE 77: Second-Order Optimization

Second-order Taylor expansion; solving for the critical point, we obtain the Newton parameter update (reconstructed below):
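
The expansion and update on this slide were images; the standard forms they refer to are:

$$ L(w) \approx L(w_0) + (w - w_0)^\top \nabla_w L(w_0) + \tfrac{1}{2}(w - w_0)^\top \mathbf{H}_w L(w_0)\,(w - w_0) $$

$$ w^{*} = w_0 - \mathbf{H}_w L(w_0)^{-1}\, \nabla_w L(w_0) $$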

SLIDE 78: Second-Order Optimization

Second-order Taylor expansion; solving for the critical point, we obtain the Newton parameter update.

Q: Why is this impractical?

SLIDE 79: Second-Order Optimization

Second-order Taylor expansion; solving for the critical point, we obtain the Newton parameter update.

Q: Why is this impractical? The Hessian has O(N^2) elements, inverting it takes O(N^3), and N is (tens or hundreds of) millions.

SLIDE 80: Second-Order Optimization

  • Quasi-Newton methods (BFGS most popular): instead of inverting the Hessian (O(n^3)), approximate the inverse Hessian with rank-1 updates over time (O(n^2) each).
  • L-BFGS (Limited memory BFGS): does not form/store the full inverse Hessian.

SLIDE 81: Second-Order Optimization: L-BFGS

  • Usually works very well in full-batch, deterministic mode: if you have a single, deterministic f(x), then L-BFGS will probably work very nicely.
  • Does not transfer very well to the mini-batch setting; it gives bad results. Adapting second-order methods to the large-scale, stochastic setting is an active area of research.

Le et al, "On optimization methods for deep learning", ICML 2011
Ba et al, "Distributed second-order optimization using Kronecker-factored approximations", ICLR 2017

SLIDE 82: In Practice

  • Adam is a good default choice in many cases; SGD+Momentum can outperform Adam but may require more tuning.
  • If you can afford to do full-batch updates, then try out L-BFGS (and don't forget to disable all sources of noise); see the sketch below.
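
As an illustration only (not from the slides): SciPy's off-the-shelf L-BFGS can be used for a full-batch, deterministic objective. The loss_and_grad argument is a hypothetical helper returning the scalar loss and the flattened gradient.

```python
import numpy as np
from scipy.optimize import minimize

def fit_full_batch_lbfgs(loss_and_grad, w_init, max_iter=100):
    """Full-batch, deterministic optimization with SciPy's L-BFGS-B.

    loss_and_grad(w_flat) must return (loss, gradient) for flattened weights.
    """
    result = minimize(loss_and_grad, w_init.ravel(), jac=True,
                      method='L-BFGS-B', options={'maxiter': max_iter})
    return result.x.reshape(w_init.shape)   # best weights found by L-BFGS
```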

SLIDE 83: Summary

  • 1. Use Linear Models for image classification problems
  • 2. Use Loss Functions (e.g. Softmax, SVM) to express preferences over different choices of weights
  • 3. Use Stochastic Gradient Descent to minimize our loss functions and train the model

SLIDE 84: Next Time: Neural Networks