Lecture 3: Loss Functions and Optimization


  1. Lecture 3: Loss Functions and Optimization. Fei-Fei Li, Justin Johnson & Serena Yeung. April 11, 2017.

  2. Administrative: Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday, April 20, 11:59pm on Canvas (due date extended since the assignment was released late).

  3. Administrative: Check out project ideas on Piazza. The office hours schedule is on the course website. TA specialties are posted on Piazza.

  4. Administrative: Details about redeeming Google Cloud credits should go out today and will be posted on Piazza: $100 per student to use for homeworks and projects.

  5. Recall from last time: challenges of recognition include viewpoint, illumination, occlusion, deformation, clutter, and intraclass variation. [Slide illustrates each challenge with a CC-licensed example image.]

  6. Recall from last time: the data-driven approach and kNN. [Slide compares a 1-NN classifier with a 5-NN classifier, and a train/test split with a train/validation/test split.]

  7. Recall from last time: linear classifier, f(x, W) = Wx + b.

  8. Recall from last time: linear classifier. TODO:
     1. Define a loss function that quantifies our unhappiness with the scores across the training data.
     2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).

  9. Suppose: 3 training examples, 3 classes. With some W the scores f(x, W) = Wx + b are:

              cat image   car image   frog image
      cat        3.2         1.3         2.2
      car        5.1         4.9         2.5
      frog      -1.7         2.0        -3.1

  10. A loss function tells how good our current classifier is. Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^N$, where $x_i$ is an image and $y_i$ is an (integer) label, the loss over the dataset is the average of the losses over the examples: $L = \frac{1}{N} \sum_i L_i(f(x_i, W), y_i)$.

  11. Multiclass SVM loss: given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the vector of scores, the SVM loss has the form: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$.
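Read literally, this formula is just a loop over the incorrect classes. A minimal unvectorized Python sketch (the function and variable names are illustrative, not from the slides):

    def svm_loss_unvectorized(scores, y):
        """Multiclass SVM loss for one example.
        scores: list of class scores s = f(x_i, W); y: integer label y_i."""
        loss = 0.0
        for j in range(len(scores)):
            if j == y:
                continue  # the sum skips the correct class
            # hinge term: nonzero only when s_j comes within 1 of s_{y_i}
            loss += max(0.0, scores[j] - scores[y] + 1.0)
        return loss

For the cat column of the score table, svm_loss_unvectorized([3.2, 5.1, -1.7], 0) returns 2.9 (up to float rounding), matching the worked example on the next slides.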

  12. The max(0, ·) term is called the "hinge loss." [Slide shows a plot of the hinge loss as a function of the correct-class score.]

  13. Using the scores above, we now compute the SVM loss for each of the three training examples.

  14. Loss for the cat example (correct class: cat):
      L = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
        = max(0, 2.9) + max(0, -3.9)
        = 2.9 + 0
        = 2.9
      Losses: 2.9

  15. Loss for the car example (correct class: car):
      L = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
        = max(0, -2.6) + max(0, -1.9)
        = 0 + 0
        = 0
      Losses: 2.9, 0

  16. Loss for the frog example (correct class: frog):
      L = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
        = max(0, 6.3) + max(0, 6.6)
        = 6.3 + 6.6
        = 12.9
      Losses: 2.9, 0, 12.9

  17. The loss over the full dataset is the average of the per-example losses: L = (2.9 + 0 + 12.9)/3 = 5.27.
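The whole table-plus-average computation fits in a few lines of numpy. A sketch with the slide's scores hardcoded (the rows-are-classes, columns-are-examples layout is an assumption made for illustration):

    import numpy as np

    # Rows: cat/car/frog class scores; columns: cat/car/frog training images.
    scores = np.array([[ 3.2, 1.3,  2.2],
                       [ 5.1, 4.9,  2.5],
                       [-1.7, 2.0, -3.1]])
    y = np.array([0, 1, 2])              # correct class index for each column
    correct = scores[y, np.arange(3)]    # s_{y_i} for each example
    margins = np.maximum(0, scores - correct + 1)
    margins[y, np.arange(3)] = 0         # drop the j == y_i terms
    losses = margins.sum(axis=0)
    print(losses)                        # approx. [ 2.9  0.  12.9]
    print(losses.mean())                 # approx. 5.27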

  18. Q: What happens to the loss if the car scores change a bit?

  19. Q2: What are the minimum and maximum possible values of the loss?

  20. Q3: At initialization W is small, so all s ≈ 0. What is the loss?
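This question doubles as a useful debugging sanity check: with all scores near zero, each hinge term is max(0, 0 - 0 + 1) = 1 and the sum runs over the C - 1 incorrect classes, so every example's loss is C - 1 (here, 2). A quick check in plain Python:

    C = 3                  # number of classes
    s = [0.0] * C          # all scores ~0 at initialization
    y = 0
    loss = sum(max(0.0, s[j] - s[y] + 1) for j in range(C) if j != y)
    print(loss)            # 2.0, i.e. C - 1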

  21. Q4: What if the sum were over all classes (including j = y_i)?

  22. Q5: What if we used a mean instead of a sum?

  23. Q6: What if we used a squared hinge instead: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$?
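Squaring each term gives the squared hinge (L2-SVM) loss, which penalizes large violations much more heavily, so it is genuinely a different loss and can prefer a different W. A comparison on the frog example's two violated margins from slide 16:

    frog_margins = [6.3, 6.6]                 # from the frog example
    print(sum(frog_margins))                  # 12.9  (hinge)
    print(sum(m ** 2 for m in frog_margins))  # 83.25 (squared hinge)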

  24. Multiclass SVM loss: example code.
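The code image on this slide did not survive the export; below is a half-vectorized numpy sketch consistent with the loss formula above (names are illustrative):

    import numpy as np

    def L_i_vectorized(x, y, W):
        """SVM loss for one example, vectorized over classes.
        x: input vector, y: integer label, W: weight matrix."""
        scores = W.dot(x)
        margins = np.maximum(0, scores - scores[y] + 1)
        margins[y] = 0       # zero out the j == y_i term
        return np.sum(margins)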

  25. E.g., suppose we found a W such that L = 0. Is this W unique?
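It is not: if L = 0, each correct-class score beats every other score by at least 1, and scaling the weights, say to 2W, only widens those score gaps, so the loss stays 0. A numeric check on the car example, the one example above with zero loss:

    import numpy as np

    s = np.array([1.3, 4.9, 2.0])   # car image scores; correct class y = 1
    for scale in (1, 2):            # scores under W and under 2W
        m = np.maximum(0, scale * s - scale * s[1] + 1)
        m[1] = 0
        print(m.sum())              # 0.0 both times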
