CS 4803 / 7643: Deep Learning
Zsolt Kira, Georgia Tech
Topics:
– Linear Classifiers
– Loss Functions
Administrativia
– Office hours started this week
– For now: CCB commons area for TAs
– CCB 222 for instructor
– Any changes will be announced on Piazza
– http://ripl.cc.gatech.edu/classes/AY2019/cs7643_spring/
– Due: 01/18 11:55pm
(C) Dhruv Batra and Zsolt Kira
– Linear scoring functions
– Multi-class hinge loss
– Softmax cross-entropy loss
Neural Network: built out of linear classifiers
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Convolution Layer + Non-Linearity → Pooling Layer → Convolution Layer + Non-Linearity → Pooling Layer → Fully-Connected MLP
4096-dim
VQA example: Image → Embedding (VGGNet); Question ("How many horses are in this image?") → Embedding (LSTM); combined → Neural Network → Softmax
50,000 training images, each 32x32x3; 10,000 test images.
f(x, W): from image to class scores

Image: array of 32x32x3 numbers (3072 numbers total), stretched into a column x (3072x1)
W: parameters (weights), 10x3072; b: bias, 10x1
Output f(x, W) = Wx + b: 10 numbers giving class scores (10x1)
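The shape bookkeeping above can be sketched in plain Python. The random numbers are placeholders; only the shapes and the f(x, W) = Wx + b computation follow the slide.

```python
import random

random.seed(0)
# One 32x32x3 image stretched into a 3072-long column vector.
x = [random.random() for _ in range(3072)]
# W: 10x3072 parameters, b: 10 biases (random placeholders here).
W = [[random.gauss(0, 0.01) for _ in range(3072)] for _ in range(10)]
b = [0.0] * 10
# f(x, W) = Wx + b: ten numbers giving class scores.
scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
          for row, b_k in zip(W, b)]
print(len(scores))  # 10
```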
Reality
AlexNet:
Input → 11x11 conv, 96 → Pool → 5x5 conv, 256 → Pool → 3x3 conv, 384 → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax
Reality
Input (HxWx3) → FC → Softmax: Multi-class Logistic Regression
Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Stretch pixels into a column: x = [56, 231, 24, 2]

f(x, W) = Wx + b (Algebraic Viewpoint)

W (one row per class) and b:
cat:   0.2    …    0.1   2.0    b: 1.1
dog:   1.5   1.3   2.1   0.0    b: 3.2
ship:   …   0.25   0.2    …     b: …

Scores f(x, W): cat …, dog 437.9, ship 61.95
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
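A quick re-computation of this example in plain Python. The entries marked "assumed" in the comments were lost in extraction and are filled in so that the two scores visible on the slide (437.9 and 61.95) are reproduced; treat them as illustrative, not authoritative.

```python
x = [56, 231, 24, 2]                 # pixels stretched into a column

# W and b from the slide; "assumed" entries were dropped in extraction
# and are chosen so the visible dog/ship scores come out.
W = [
    [0.2, -0.5, 0.1, 2.0],           # cat row (-0.5 assumed)
    [1.5, 1.3, 2.1, 0.0],            # dog row
    [0.0, 0.25, 0.2, -0.3],          # ship row (0.0 and -0.3 assumed)
]
b = [1.1, 3.2, 0.0]                  # ship bias assumed

# f(x, W) = Wx + b
scores = [sum(w * xi for w, xi in zip(row, x)) + bk
          for row, bk in zip(W, b)]
print([round(s, 2) for s in scores])  # [-96.8, 437.9, 61.95]
```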
Cat image by Nikita is licensed under CC-BY 2.0 Plot created using Wolfram Cloud
Hard cases for a linear classifier:
– Class 1: first and third quadrants; Class 2: second and fourth quadrants
– Class 1: 1 <= L2 norm <= 2; Class 2: everything else
– Class 1: three modes; Class 2: everything else
f(x, W) = Wx
– Algebraic Viewpoint
– Visual Viewpoint: one template per class
– Geometric Viewpoint: hyperplanes cutting up space
Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain
f(x, W) = Wx + b
Example class scores for 3 images for some W: how can we tell whether this W is good or bad?
TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Supervised learning setup:
– Input x (images, text, emails…) and output y (spam or non-spam…)
– f: X → Y (the "true" mapping / reality)
– Training data: (x1,y1), (x2,y2), …, (xN,yN)
– Model / hypothesis class: {h: X → Y}, e.g. y = h(x) = sign(w^T x)
– Loss function: how good is a model w.r.t. my data D?
– Learning: find the best h in the model class.
Suppose: 3 training examples, 3 classes (cat, car, frog). With some W the scores are:

        img 1   img 2   img 3
cat:     3.2     1.3     2.2
car:     5.1     4.9     2.5
frog:   -1.7     2.0    -3.1

A loss function tells how good our current classifier is. Given a dataset of examples (x_i, y_i), i = 1..N, where x_i is an image and y_i is an (integer) label, the loss over the dataset is a sum of the loss over examples:

L = (1/N) Σ_i L_i(f(x_i, W), y_i)

Multiclass SVM loss: using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:

L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)
This is the "hinge loss". For the first (cat) example:

L_1 = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
    = max(0, 2.9) + max(0, -3.9)
    = 2.9 + 0 = 2.9
For the second (car) example:

L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0 = 0
For the third (frog) example:

L_3 = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
    = max(0, 6.3) + max(0, 6.6)
    = 6.3 + 6.6 = 12.9
Loss over the full dataset is the average:

L = (2.9 + 0 + 12.9) / 3 = 5.27
Questions about the multiclass SVM loss:
– Q: What happens to the loss if the car image's scores change a bit?
– Q2: What is the min/max possible loss?
– Q3: At initialization W is small so all s ≈ 0. What is the loss?
– Q4: What if the sum was over all classes (including j = y_i)?
– Q5: What if we used mean instead of sum?
– Q6: What if we used a squared hinge, max(0, s_j - s_{y_i} + 1)^2?
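The hinge-loss arithmetic above is easy to check in a few lines of Python; the class order (cat, car, frog) and correct labels are taken from the worked examples.

```python
def svm_loss(scores, correct, margin=1.0):
    """Multiclass SVM (hinge) loss: sum over j != y_i of max(0, s_j - s_y + margin)."""
    s_y = scores[correct]
    return sum(max(0.0, s - s_y + margin)
               for j, s in enumerate(scores) if j != correct)

# Scores for the three training examples (columns of the table above).
examples = [
    ([3.2, 5.1, -1.7], 0),   # correct class: cat
    ([1.3, 4.9,  2.0], 1),   # correct class: car
    ([2.2, 2.5, -3.1], 2),   # correct class: frog
]
losses = [svm_loss(s, y) for s, y in examples]
print([round(l, 2) for l in losses])          # [2.9, 0.0, 12.9]
print(round(sum(losses) / len(losses), 2))    # 5.27
```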
Suppose we found a W that gives zero loss on an example. Is it unique? No: with W twice as large, the car example still has loss 0.

Before:
= max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
= max(0, -2.6) + max(0, -1.9)
= 0 + 0 = 0

With W twice as large:
= max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
= max(0, -6.2) + max(0, -4.8)
= 0 + 0 = 0
Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities. The softmax function:

P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}

The scores s are unnormalized log-probabilities (logits).
– exp: gives unnormalized probabilities (probabilities must be >= 0)
– normalize: divide by the sum (probabilities must sum to 1)

For the cat image (scores 3.2, 5.1, -1.7):
exp → 24.5, 164.0, 0.18; normalize → probabilities 0.13, 0.87, 0.00

The loss is the negative log-probability of the correct class:

L_i = -log P(Y = y_i | X = x_i); here L_i = -log(0.13) = 2.04

Maximum Likelihood Estimation: choose the probabilities to maximize the likelihood of the observed data.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
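The cat-image loss above can be re-derived in a few lines; the scores are taken from the earlier table.

```python
import math

def softmax_loss(scores, correct):
    """Cross-entropy loss: L_i = -log( e^{s_y} / sum_j e^{s_j} )."""
    exps = [math.exp(s) for s in scores]      # unnormalized probabilities (>= 0)
    probs = [e / sum(exps) for e in exps]     # normalize so they sum to 1
    return -math.log(probs[correct])

scores = [3.2, 5.1, -1.7]   # cat, car, frog logits for the cat image
print(round(softmax_loss(scores, 0), 2))   # 2.04
```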
To turn this into a training objective, compare the computed probabilities with the correct probabilities, which put all mass on the true class (one-hot):

– Kullback–Leibler divergence: D_KL(P || Q) = Σ_y P(y) log ( P(y) / Q(y) )
– Cross Entropy: H(P, Q) = H(P) + D_KL(P || Q)

For a one-hot P, H(P) = 0, so minimizing the cross-entropy and minimizing the KL divergence to the correct distribution coincide.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
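A numerical check that, for a one-hot target, cross-entropy reduces to -log of the probability on the true class. The probabilities are rounded values from the cat example.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_y P(y) log Q(y); terms with P(y) = 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

probs = [0.13, 0.87, 0.00]   # softmax output for the cat image (rounded)
one_hot = [1.0, 0.0, 0.0]    # correct probs: all mass on the true class

# With a one-hot target, cross-entropy is exactly -log q[y_true]:
print(abs(cross_entropy(one_hot, probs) - (-math.log(0.13))) < 1e-12)  # True
```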
Softmax Classifier, putting it all together:

L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )

Q: What is the min/max possible loss L_i? A: min 0, max infinity.
Q2: At initialization all s will be approximately equal; what is the loss? A: log(C), e.g. log(10) ≈ 2.3.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
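The log(C) answer doubles as a debugging check: if the loss at the first training iteration is not roughly log(C), something is wrong. A quick sketch:

```python
import math

C = 10                        # number of classes
scores = [0.0] * C            # at initialization, all scores roughly equal
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]   # uniform: 1/C each
loss = -math.log(probs[0])
print(round(loss, 2), round(math.log(C), 2))   # 2.3 2.3
```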
Q: Suppose I take a datapoint and jiggle it a bit (changing its score slightly). What happens to the loss in each case (SVM vs. Softmax)?
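A small perturbation of the car example shows the difference; the 2.3 nudge below is an arbitrary illustration. The SVM loss is flat once every margin is satisfied, while the softmax loss always responds.

```python
import math

def svm_loss(scores, y):
    # Multiclass hinge loss: sum over j != y of max(0, s_j - s_y + 1).
    return sum(max(0.0, s - scores[y] + 1.0)
               for j, s in enumerate(scores) if j != y)

def softmax_loss(scores, y):
    # Cross-entropy loss on softmax probabilities.
    exps = [math.exp(s) for s in scores]
    return -math.log(exps[y] / sum(exps))

scores = [1.3, 4.9, 2.0]     # car image; correct class index 1 (car)
jiggled = [1.3, 4.9, 2.3]    # nudge the frog score a bit

# SVM: margins still satisfied, loss stays exactly 0.
print(svm_loss(scores, 1), svm_loss(jiggled, 1))            # 0.0 0.0
# Softmax: the loss moves, however small the change.
print(softmax_loss(scores, 1) != softmax_loss(jiggled, 1))  # True
```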
e.g. Softmax vs. SVM, with full loss over the dataset L = (1/N) Σ_i L_i

How do we find the best W?