Lecture 3: Loss Functions and Optimization
Fei-Fei Li & Justin Johnson & Serena Yeung, CS231n, April 10, 2018


SLIDE 1

Lecture 3: Loss Functions and Optimization

SLIDE 2

Administrative: Live Questions

We’ll use Zoom to take questions from remote students live-streaming the lecture. Check Piazza for instructions and the meeting ID: https://piazza.com/class/jdmurnqexkt47x?cid=108

SLIDE 3

Administrative: Office Hours

Office hours started this week; the schedule is on the course website: http://cs231n.stanford.edu/office_hours.html
Areas of expertise for all TAs are posted on Piazza: https://piazza.com/class/jdmurnqexkt47x?cid=155

SLIDE 4

Administrative: Assignment 1

Assignment 1 is released: http://cs231n.github.io/assignments2018/assignment1/
Due Wednesday April 18, 11:59pm


SLIDE 5

Administrative: Google Cloud

You should have received an email yesterday about claiming a coupon for Google Cloud; make a private post on Piazza if you didn’t get it.
There was a problem with @cs.stanford.edu emails; this is resolved.
If you have problems with coupons: post on Piazza. DO NOT email me, DO NOT email Prof. Phil Levis.

SLIDE 6

Administrative: SCPD Tutors

This year the SCPD office has hired tutors specifically for SCPD students taking CS231N; you should have received an email about this yesterday (4/9/2018).

SLIDE 7

Administrative: Poster Session

Poster session will be Tuesday June 12 (our final exam slot). Attendance is mandatory for non-SCPD students; if you don’t have a legitimate reason for skipping it then you forfeit the points for the poster presentation.

SLIDE 8

Recall from last time: Challenges of recognition

[Figure: example images illustrating the challenges of recognition: viewpoint, illumination, deformation, occlusion, clutter, intraclass variation. Images CC0 1.0 public domain or CC-BY 2.0 (Umberto Salvagnin, jonsson).]

SLIDE 9

Recall from last time: data-driven approach, kNN

[Figures: 1-NN vs. 5-NN classifier decision boundaries; train / validation / test splits of the data.]

SLIDE 10

Recall from last time: Linear Classifier


f(x,W) = Wx + b

SLIDE 11

Recall from last time: Linear Classifier

TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)

Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain

SLIDE 12

Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) = Wx + b are:

        [cat]   [car]   [frog]
cat      3.2     1.3     2.2
car      5.1     4.9     2.5
frog    -1.7     2.0    -3.1

(Rows are classes; columns are the three training images: a cat, a car, a frog.)

SLIDE 13

Scores as on Slide 12.

A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}_{i=1}^N, where x_i is an image and y_i is an (integer) label, the loss over the dataset is a sum of the per-example losses:

L = (1/N) Σ_i L_i(f(x_i, W), y_i)

SLIDE 14

Scores as on Slide 12. Multiclass SVM loss:

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the vector of scores, the SVM loss has the form:

L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)

SLIDE 15

Scores and SVM-loss setup as on Slide 14.

“Hinge loss”: each term max(0, s_j - s_{y_i} + 1) is zero once the correct-class score s_{y_i} exceeds s_j by the margin of 1, and grows linearly as s_{y_i} decreases.

SLIDE 16

Scores and SVM-loss setup as on Slide 14.

SLIDE 17

Scores and SVM-loss setup as on Slide 14. For the first (cat) image:

L_1 = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
    = max(0, 2.9) + max(0, -3.9)
    = 2.9 + 0 = 2.9

Losses: 2.9

SLIDE 18

Scores and SVM-loss setup as on Slide 14. For the second (car) image:

L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0 = 0

Losses: 2.9, 0

SLIDE 19

Scores and SVM-loss setup as on Slide 14. For the third (frog) image:

L_3 = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
    = max(0, 6.3) + max(0, 6.6)
    = 6.3 + 6.6 = 12.9

Losses: 2.9, 0, 12.9

SLIDE 20

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9. The loss over the full dataset is the average:

L = (2.9 + 0 + 12.9) / 3 = 5.27

SLIDE 21

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q: What happens to the loss if the car scores change a bit?

SLIDE 22

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q2: What is the min/max possible loss?

SLIDE 23

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q3: At initialization W is small, so all s ≈ 0. What is the loss?

SLIDE 24

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q4: What if the sum was over all classes (including j = y_i)?

SLIDE 25

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q5: What if we used a mean instead of a sum?

SLIDE 26

Scores and SVM-loss setup as on Slide 14; losses: 2.9, 0, 12.9.

Q6: What if we used a squared hinge instead, L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)^2 ?

SLIDE 27

Multiclass SVM Loss: Example code

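The example code from this slide did not survive the transcription. Below is a numpy sketch consistent with the half-vectorized version shown in the lecture, computing the multiclass SVM loss for a single example (x is a column of pixel data, y an integer label, W one row of weights per class):

import numpy as np

def L_i_vectorized(x, y, W):
    scores = W.dot(x)                                # class scores s = f(x, W)
    margins = np.maximum(0, scores - scores[y] + 1)  # hinge for every class
    margins[y] = 0                                   # the j == y term is excluded
    return np.sum(margins)

Called on the cat image from Slide 12 (scores 3.2, 5.1, -1.7) with y = 0, the margins work out to max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9, matching the hand computation on Slide 17.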

SLIDE 28

E.g. Suppose that we found a W such that L = 0. Is this W unique?

SLIDE 29

E.g. Suppose that we found a W such that L = 0. Is this W unique? No! 2W also has L = 0!

SLIDE 30

Suppose: 3 training examples, 3 classes. With some W the scores are as on Slide 12.

For the car image, before:

L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0 = 0

With W twice as large:

L_2 = max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
    = max(0, -6.2) + max(0, -4.8)
    = 0 + 0 = 0

SLIDE 31

E.g. Suppose that we found a W such that L = 0. Is this W unique? No! 2W also has L = 0! How do we choose between W and 2W?

SLIDE 32

Regularization

Data loss: Model predictions should match training data.

SLIDE 33

Regularization

Data loss: Model predictions should match training data.
Regularization: Prevent the model from doing too well on training data.

SLIDE 34

Regularization

Data loss: Model predictions should match training data. Regularization: Prevent the model from doing too well on training data.

L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)

λ = regularization strength (hyperparameter)

SLIDE 35

Regularization

L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W), with λ = regularization strength (hyperparameter).

Simple examples:
L2 regularization: R(W) = Σ_k Σ_l W_{k,l}^2
L1 regularization: R(W) = Σ_k Σ_l |W_{k,l}|
Elastic net (L1 + L2): R(W) = Σ_k Σ_l (β W_{k,l}^2 + |W_{k,l}|)
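Not from the slides: a minimal numpy sketch of the full L2-regularized SVM objective, assuming X stores one example per column and y holds integer labels:

import numpy as np

def svm_loss_regularized(W, X, y, lam):
    # W: (C, D) weights; X: (D, N) data; y: (N,) integer labels
    n = X.shape[1]
    scores = W.dot(X)                              # (C, N) class scores
    correct = scores[y, np.arange(n)]              # correct-class score per example
    margins = np.maximum(0, scores - correct + 1)  # hinge on every class
    margins[y, np.arange(n)] = 0                   # skip j == y_i
    data_loss = margins.sum() / n                  # average data loss
    reg_loss = lam * np.sum(W * W)                 # L2 regularization: lambda * R(W)
    return data_loss + reg_loss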

SLIDE 36

Regularization

Objective and simple examples (L2, L1, elastic net) as on Slide 35.

More complex regularizers: dropout, batch normalization, stochastic depth, fractional pooling, etc.

SLIDE 37

Regularization

Data loss and regularization as on Slide 34. Why regularize?

  • Express preferences over weights
  • Make the model simple so it works on test data
  • Improve optimization by adding curvature
SLIDE 38

Regularization: Expressing Preferences

L2 Regularization

x = [1, 1, 1, 1]
w1 = [1, 0, 0, 0]
w2 = [0.25, 0.25, 0.25, 0.25]

Both give the same score: w1ᵀx = w2ᵀx = 1.

SLIDE 39

Regularization: Expressing Preferences

L2 Regularization (same example as Slide 38): w1 and w2 give the same score, but L2 regularization prefers w2; it likes to “spread out” the weights.

SLIDE 40

Regularization: Prefer Simpler Models

[Figure: training points in the (x, y) plane.]

SLIDE 41

Regularization: Prefer Simpler Models

[Figure: two fits to the training points, f1 (a curve through every point) and f2 (a simple line).]

SLIDE 42

Regularization: Prefer Simpler Models

[Figure: f1 fits every training point; f2 is a simple line.]

Regularization pushes against fitting the data too well, so we don't fit noise in the data.

SLIDE 43

Softmax Classifier (Multinomial Logistic Regression)

Scores for the cat image: cat 3.2, car 5.1, frog -1.7.

Want to interpret raw classifier scores as probabilities.

SLIDE 44

Softmax Classifier (Multinomial Logistic Regression). Scores for the cat image: cat 3.2, car 5.1, frog -1.7. Want to interpret raw classifier scores as probabilities.

Softmax function: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}, with s = f(x_i; W).

SLIDE 45

Softmax setup as on Slide 44.

exp: scores (3.2, 5.1, -1.7) → unnormalized probabilities (24.5, 164.0, 0.18). Probabilities must be >= 0.

SLIDE 46

Softmax setup as on Slide 44.

exp: scores (3.2, 5.1, -1.7) → unnormalized probabilities (24.5, 164.0, 0.18); probabilities must be >= 0.
normalize: → probabilities (0.13, 0.87, 0.00); probabilities must sum to 1.

SLIDE 47

Softmax pipeline as on Slide 46. The raw scores (3.2, 5.1, -1.7) are unnormalized log-probabilities / logits.

SLIDE 48

Softmax pipeline as on Slide 46: logits (3.2, 5.1, -1.7) → exp → (24.5, 164.0, 0.18) → normalize → (0.13, 0.87, 0.00).

L_i = -log P(Y = y_i | X = x_i) = -log(0.13) = 2.04
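Not from the slides: a quick numpy check of these numbers (subtracting the max score before exponentiating is a standard stability trick and does not change the result):

import numpy as np

scores = np.array([3.2, 5.1, -1.7])    # logits for cat, car, frog
exps = np.exp(scores - scores.max())   # shift by max for numerical stability
probs = exps / exps.sum()              # softmax probabilities
loss = -np.log(probs[0])               # cross-entropy loss, true class = cat
print(probs.round(2), round(loss, 2))  # [0.13 0.87 0.  ] 2.04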

SLIDE 49

Softmax pipeline as on Slide 48; L_i = -log(0.13) = 2.04.

Maximum Likelihood Estimation: choose probabilities to maximize the likelihood of the observed data. (See CS 229 for details.)

SLIDE 50

Softmax pipeline as on Slide 48. Compare the computed probabilities (0.13, 0.87, 0.00) against the correct probabilities (1.00, 0.00, 0.00).

SLIDE 51

Comparison as on Slide 50, measured with the Kullback–Leibler divergence:

D_KL(P ‖ Q) = Σ_y P(y) log( P(y) / Q(y) )

SLIDE 52

Comparison as on Slide 50, measured with the cross-entropy:

H(P, Q) = H(P) + D_KL(P ‖ Q)

SLIDE 53

Softmax Classifier (Multinomial Logistic Regression). Scores for the cat image: cat 3.2, car 5.1, frog -1.7. Want to interpret raw classifier scores as probabilities: apply the softmax function and maximize the probability of the correct class. Putting it all together:

L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )

SLIDE 54

Softmax setup as on Slide 53.

Q: What is the min/max possible loss L_i?

SLIDE 55

Softmax setup as on Slide 53.

Q: What is the min/max possible loss L_i? A: min 0, max infinity.

SLIDE 56

Softmax setup as on Slide 53.

Q2: At initialization all s will be approximately equal; what is the loss?

SLIDE 57

Softmax setup as on Slide 53.

Q2: At initialization all s will be approximately equal; what is the loss? A: log(C), e.g. log(10) ≈ 2.3.

SLIDE 58

Softmax vs. SVM:

Softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )
SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)

SLIDE 59

Softmax vs. SVM

SLIDE 60

Softmax vs. SVM. Assume scores [10, -2, 3], [10, 9, 9], [10, -100, -100], with the correct class y_i = 0 (the first score) in each case.

Q: Suppose I take a datapoint and I jiggle it a bit (changing its score slightly). What happens to the loss in both cases?

SLIDE 61

Recap

  • We have some dataset of (x, y)
  • We have a score function: s = f(x; W) = Wx
  • We have a loss function, e.g.:

Softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )
SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
Full loss: L = (1/N) Σ_i L_i + R(W)

SLIDE 62

Recap (as on Slide 61).

How do we find the best W?

SLIDE 63

Optimization

SLIDE 64

[Figure: a hilly landscape standing in for the loss surface; image is CC0 1.0 public domain.]

SLIDE 65

[Figure: walking downhill on the loss landscape; walking man image is CC0 1.0 public domain.]

SLIDE 66

Strategy #1: A first very bad idea solution: Random search
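The code from this slide is not preserved in the transcription. A sketch in the spirit of the course notes, assuming CIFAR-10 training data X_train (3073 x 50,000, bias folded in), labels Y_train, and a loss function L over the whole training set are already defined:

import numpy as np

bestloss = float("inf")
for num in range(1000):
    W = np.random.randn(10, 3073) * 0.0001  # try a random parameter setting
    loss = L(X_train, Y_train, W)           # loss over the full training set
    if loss < bestloss:                     # keep the best W seen so far
        bestloss = loss
        bestW = W
    print('in attempt %d the loss was %f, best %f' % (num, loss, bestloss))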

SLIDE 67

Let's see how well this works on the test set... 15.5% accuracy! Not bad! (SOTA is ~95%)

SLIDE 68

Strategy #2: Follow the slope

SLIDE 69

Strategy #2: Follow the slope

In 1 dimension, the derivative of a function:

df(x)/dx = lim_{h→0} (f(x + h) - f(x)) / h

In multiple dimensions, the gradient is the vector of partial derivatives along each dimension. The slope in any direction is the dot product of the direction with the gradient. The direction of steepest descent is the negative gradient.

SLIDE 70

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347

gradient dW: [?, ?, ?, ?, ?, ?, ?, ?, ?, …]

SLIDE 71

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (first dim): [0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25322

gradient dW: [?, ?, ?, ?, ?, ?, ?, ?, ?, …]

SLIDE 72

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (first dim): [0.34 + 0.0001, -1.11, …]  →  loss 1.25322

gradient dW: [-2.5, ?, ?, ?, ?, ?, ?, ?, ?, …]
(1.25322 - 1.25347) / 0.0001 = -2.5

SLIDE 73

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, …]  →  loss 1.25353

gradient dW: [-2.5, ?, ?, ?, ?, ?, ?, ?, ?, …]

SLIDE 74

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (second dim): [0.34, -1.11 + 0.0001, 0.78, …]  →  loss 1.25353

gradient dW: [-2.5, 0.6, ?, ?, ?, ?, ?, ?, ?, …]
(1.25353 - 1.25347) / 0.0001 = 0.6

SLIDE 75

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (third dim): [0.34, -1.11, 0.78 + 0.0001, 0.12, …]  →  loss 1.25347

gradient dW: [-2.5, 0.6, ?, ?, ?, ?, ?, ?, ?, …]

SLIDE 76

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347
W + h (third dim): [0.34, -1.11, 0.78 + 0.0001, 0.12, …]  →  loss 1.25347

gradient dW: [-2.5, 0.6, 0, ?, ?, ?, ?, ?, ?, …]
(1.25347 - 1.25347) / 0.0001 = 0

SLIDE 77

gradient dW: [-2.5, 0.6, 0, ?, ?, ?, ?, ?, ?, …], filled in one dimension at a time as on Slides 71-76.

Numeric Gradient:

  • Slow! Need to loop over all dimensions
  • Approximate

(A finite-difference sketch follows below.)
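Not from the slides: a minimal numpy sketch of the forward-difference procedure walked through above, assuming f maps a weight array to a scalar loss:

import numpy as np

def numerical_gradient(f, W, h=1e-4):
    # approximate dL/dW one dimension at a time: (f(W + h) - f(W)) / h
    grad = np.zeros_like(W)
    base = f(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h               # bump a single dimension
        grad[idx] = (f(W) - base) / h
        W[idx] = old                   # restore
        it.iternext()
    return grad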
SLIDE 78

This is silly. The loss is just a function of W:

L = (1/N) Σ_i L_i + Σ_k W_k^2
L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
s = f(x; W) = Wx

want ∇_W L

SLIDE 79

This is silly. The loss is just a function of W: we want ∇_W L.

Use calculus to compute an analytic gradient.

SLIDE 80

current W: [0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …]  →  loss 1.25347

dW = ... (some function of data and W)

gradient dW: [-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, …]

SLIDE 81

In summary:

  • Numerical gradient: approximate, slow, easy to write
  • Analytic gradient: exact, fast, error-prone

In practice: Always use analytic gradient, but check implementation with numerical gradient. This is called a gradient check (see the sketch below).
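Not from the slides: a common-practice sketch of a gradient check, comparing a precomputed analytic gradient against centered finite differences at a few random coordinates:

import numpy as np

def grad_check(f, analytic_grad, W, n_checks=10, h=1e-5):
    for _ in range(n_checks):
        idx = tuple(np.random.randint(d) for d in W.shape)
        old = W[idx]
        W[idx] = old + h; fp = f(W)   # loss at W + h in one coordinate
        W[idx] = old - h; fm = f(W)   # loss at W - h
        W[idx] = old                  # restore
        num = (fp - fm) / (2 * h)     # centered finite difference
        ana = analytic_grad[idx]
        rel = abs(num - ana) / max(abs(num) + abs(ana), 1e-12)
        print('numeric %f analytic %f relative error %.2e' % (num, ana, rel))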

SLIDE 82

Gradient Descent
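The update-loop code from this slide is not preserved. A runnable toy sketch of vanilla gradient descent, with a hypothetical quadratic loss standing in for the classifier loss:

import numpy as np

def loss_fun(W):                     # hypothetical stand-in loss
    return float(np.sum((W - 1.0) ** 2))

def evaluate_gradient(W):            # its analytic gradient
    return 2.0 * (W - 1.0)

W = np.random.randn(5)               # initialize parameters
step_size = 0.1                      # learning rate (hyperparameter)
for _ in range(100):
    weights_grad = evaluate_gradient(W)
    W += -step_size * weights_grad   # step along the negative gradient
print(W.round(3))                    # approaches [1. 1. 1. 1. 1.]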

SLIDE 83

[Figure: contour plot of the loss over (W_1, W_2); starting from the original W, repeated steps follow the negative gradient direction toward the minimum.]

SLIDE 84

SLIDE 85

Stochastic Gradient Descent (SGD)

The full sum L(W) = (1/N) Σ_{i=1}^N L_i(x_i, y_i, W) + λ R(W), and its gradient, are expensive to compute when N is large! Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common minibatch sizes.
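Not from the slides: a runnable toy sketch of minibatch SGD, with a hypothetical least-squares problem standing in for the classifier loss:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50000, 10))                # toy data, N = 50,000 examples
true_w = np.arange(10.0)
y = X @ true_w + 0.1 * rng.normal(size=50000)   # noisy targets

W = np.zeros(10)
step_size = 0.05
for _ in range(1000):
    idx = rng.integers(0, X.shape[0], size=128)  # sample a minibatch of 128
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ W - yb) / len(idx)   # gradient on the minibatch only
    W += -step_size * grad                       # parameter update
print(W.round(2))                                # close to [0. 1. 2. ... 9.]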

SLIDE 86

Interactive Web Demo time....

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/

SLIDE 87

Interactive Web Demo time....

SLIDE 88

Aside: Image Features

Image → f(x) = Wx → Class scores

SLIDE 89

Aside: Image Features

Image → Feature Representation → f(x) = Wx → Class scores

SLIDE 90

Image Features: Motivation

[Figure: red and blue points in the (x, y) plane.] Cannot separate the red and blue points with a linear classifier.

SLIDE 91

Image Features: Motivation

f(x, y) = (r(x, y), θ(x, y))

Cannot separate the red and blue points with a linear classifier in the original (x, y) space; after applying the polar feature transform, the points can be separated by a linear classifier.
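Not from the slides: the same transform as a short numpy sketch:

import numpy as np

def polar_features(x, y):
    # Cartesian (x, y) -> polar (r, theta)
    return np.hypot(x, y), np.arctan2(y, x)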

SLIDE 92

Example: Color Histogram

[Figure: each pixel's hue is quantized into a bin, adding +1 to that bin; the resulting color histogram is the feature vector.]
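Not from the slides: a hypothetical sketch of such a color-histogram feature, assuming a per-pixel hue array with values in [0, 1):

import numpy as np

def color_histogram(hue, n_bins=8):
    # quantize each pixel's hue into a bin, then count pixels per bin
    bins = np.minimum((hue * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), minlength=n_bins)
    return hist / hist.sum()   # normalized histogram = feature vector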

SLIDE 93

Example: Histogram of Oriented Gradients (HoG)

Divide the image into 8x8 pixel regions. Within each region, quantize the edge direction into 9 bins. Example: a 320x240 image gets divided into 40x30 bins; each bin has 9 numbers, so the feature vector has 40*30*9 = 10,800 numbers.

Lowe, “Object recognition from local scale-invariant features”, ICCV 1999. Dalal and Triggs, “Histograms of oriented gradients for human detection”, CVPR 2005.

SLIDE 94

Example: Bag of Words

Step 1 (build codebook): extract random patches, then cluster the patches to form a “codebook” of “visual words”.
Step 2 (encode images): represent each image using the codebook.

Fei-Fei and Perona, “A bayesian hierarchical model for learning natural scene categories”, CVPR 2005

SLIDE 95

Aside: Image Features


SLIDE 96

Image features vs ConvNets

[Figure: top, hand-designed feature extraction feeding a trained classifier f that outputs 10 numbers giving scores for classes; bottom, a ConvNet trained end-to-end, also outputting 10 numbers giving scores for classes.]

SLIDE 97

Next time:

Introduction to neural networks
Backpropagation