CS 4803 / 7643: Deep Learning. Topics: Linear Classifiers, Loss Functions.


  1. CS 4803 / 7643: Deep Learning Topics: – Linear Classifiers – Loss Functions Zsolt Kira Georgia Tech

  2. Administrativia • Office hours started this week – For now: CCB commons area for TAs, CCB 222 for instructor – Any changes will be announced on Piazza • Notes and readings on class webpage – http://ripl.cc.gatech.edu/classes/AY2019/cs7643_spring/ • HW0 Reminder – Due: 01/18 11:55pm (C) Dhruv Batra and Zsolt Kira

  3. Plan for Today • Linear Classifiers – Linear scoring functions • Loss Functions – Multi-class hinge loss – Softmax cross-entropy loss (C) Dhruv Batra and Zsolt Kira

  4. Linear Classification

  5. Neural Network Linear classifiers This image is CC0 1.0 public domain Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  6. Visual Question Answering: a neural network combining an image embedding (VGGNet: convolution and pooling layers with non-linearities, producing a 4096-dim feature) and a question embedding (LSTM over a question such as "How many horses are in this image?") through a fully-connected MLP, followed by a softmax over the top K answers. (C) Dhruv Batra and Zsolt Kira

  7. Recall CIFAR10: 50,000 training images, each 32x32x3; 10,000 test images. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  8. Parametric Approach: an image (an array of 32x32x3 = 3072 numbers) goes through f(x, W) to produce 10 numbers giving class scores; W holds the parameters or weights. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  9. Parametric Approach: Linear Classifier. f(x,W) = Wx. An image (an array of 32x32x3 = 3072 numbers) goes through f(x, W) to produce 10 numbers giving class scores; W holds the parameters or weights. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  10. Parametric Approach: Linear Classifier. f(x,W) = Wx, where x is 3072x1 (the flattened 32x32x3 image), W is 10x3072 (the parameters or weights), and the output is 10x1: 10 numbers giving class scores. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  11. Parametric Approach: Linear Classifier. f(x,W) = Wx + b, where x is 3072x1 (the flattened 32x32x3 image), W is 10x3072 (the parameters or weights), b is a 10x1 bias, and the output is 10x1: 10 numbers giving class scores. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
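A minimal numpy sketch of this scoring function (my addition, not from the slides; the random W and b stand in for learned parameters):

    import numpy as np

    # CIFAR-10 setup: a 32x32x3 image flattens to 3072 numbers, 10 classes
    x = np.random.rand(3072)               # stand-in for a flattened image, 3072x1
    W = 0.01 * np.random.randn(10, 3072)   # weights, 10x3072
    b = np.zeros(10)                       # bias, 10x1

    scores = W @ x + b                     # f(x, W) = Wx + b
    print(scores.shape)                    # (10,): one score per class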

  12. Error Decomposition: AlexNet vs. Reality. The AlexNet stack, input to output: Input → 11x11 conv, 96 → 5x5 conv, 256 → Pool → 3x3 conv, 384 → Pool → 3x3 conv, 384 → 3x3 conv, 256 → Pool → FC 4096 → FC 4096 → FC 1000 → Softmax. (C) Dhruv Batra and Zsolt Kira

  13. Error Decomposition: Reality vs. multi-class logistic regression, which is just: HxWx3 Input → FC → Softmax. (C) Dhruv Batra and Zsolt Kira
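A sketch of that multi-class logistic regression block (one fully-connected layer followed by a softmax; my addition, with random stand-in parameters):

    import numpy as np

    def softmax(s):
        e = np.exp(s - s.max())   # shift by the max for numerical stability
        return e / e.sum()

    x = np.random.rand(32 * 32 * 3)          # flattened HxWx3 input (stand-in)
    W = 0.01 * np.random.randn(10, x.size)   # FC layer weights
    b = np.zeros(10)
    probs = softmax(W @ x + b)               # class probabilities, sum to 1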

  14. Example with an image with 4 pixels and 3 classes (cat/dog/ship). Stretch pixels into a column: x = [56, 231, 24, 2]. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  15. Example with an image with 4 pixels and 3 classes (cat/dog/ship). Stretch pixels into a column x = [56, 231, 24, 2] and compute Wx + b:

          [ 0.2  -0.5   0.1   2.0 ] [  56 ]   [ 1.1 ]   [ -96.8 ]  cat score
          [ 1.5   1.3   2.1   0.0 ] [ 231 ] + [ 3.2 ] = [ 437.9 ]  dog score
          [ 0.0   0.25  0.2  -0.3 ] [  24 ]   [-1.2 ]   [ 61.95 ]  ship score
                                    [   2 ]

      Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  16. Example with an image with 4 pixels and 3 classes (cat/dog/ship). Algebraic Viewpoint: f(x,W) = Wx. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  17. Example with an image with 4 pixels and 3 classes (cat/dog/ship). Algebraic Viewpoint: f(x,W) = Wx, with W = [[0.2, -0.5, 0.1, 2.0], [1.5, 1.3, 2.1, 0.0], [0.0, 0.25, 0.2, -0.3]], b = [1.1, 3.2, -1.2], giving scores -96.8, 437.9, 61.95. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
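The slide's numbers can be checked directly; a small numpy verification (my addition):

    import numpy as np

    # Values copied from the 4-pixel, 3-class example above
    x = np.array([56, 231, 24, 2], dtype=float)
    W = np.array([[0.2, -0.5,  0.1,  2.0],   # cat row
                  [1.5,  1.3,  2.1,  0.0],   # dog row
                  [0.0,  0.25, 0.2, -0.3]])  # ship row
    b = np.array([1.1, 3.2, -1.2])

    print(W @ x + b)  # approx. [-96.8, 437.9, 60.75]
    # Cat and dog match the slide; Wx alone gives 61.95 for ship,
    # so the slide's ship score appears to be computed without the bias.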

  18. Interpreting a Linear Classifier. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  19. Interpreting a Linear Classifier: Visual Viewpoint. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  20. Interpreting a Linear Classifier: Geometric Viewpoint. f(x,W) = Wx + b over an array of 32x32x3 numbers (3072 numbers total). Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  21. Hard cases for a linear classifier: (1) Class 1 = first and third quadrants, Class 2 = second and fourth quadrants; (2) Class 1 = 1 <= L2 norm <= 2, Class 2 = everything else; (3) Class 1 = three modes, Class 2 = everything else. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
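For the first case (an XOR-like pattern), a short argument (my addition, not from the slides) shows why no linear classifier can succeed:

\[
\begin{aligned}
&\text{Suppose } f(x) = w_1 x_1 + w_2 x_2 + b \text{ separated the classes. Class 1 points } (1,1),\ (-1,-1) \text{ require} \\
&\quad w_1 + w_2 + b > 0 \ \text{ and }\ -(w_1 + w_2) + b > 0 \ \Rightarrow\ b > |w_1 + w_2| \ge 0, \\
&\text{while Class 2 points } (1,-1),\ (-1,1) \text{ require} \\
&\quad w_1 - w_2 + b < 0 \ \text{ and }\ -(w_1 - w_2) + b < 0 \ \Rightarrow\ b < -|w_1 - w_2| \le 0, \\
&\text{a contradiction: no hyperplane } w^\top x + b = 0 \text{ separates the quadrants.}
\end{aligned}
\]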

  22. Linear Classifier: Three Viewpoints. Algebraic: f(x,W) = Wx. Visual: one template per class. Geometric: hyperplanes cutting up space. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  23. So far: defined a (linear) score function f(x,W) = Wx + b. Example class scores for 3 images for some W: how can we tell whether this W is good or bad? Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  24. So far: defined a (linear) score function. TODO: 1. Define a loss function that quantifies our unhappiness with the scores across the training data. 2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization). Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  25. Supervised Learning • Input: x (images, text, emails…) • Output: y (spam or non-spam…) • (Unknown) Target Function – f: X → Y (the “true” mapping / reality) • Data – (x_1, y_1), (x_2, y_2), …, (x_N, y_N) • Model / Hypothesis Class – {h: X → Y} – e.g. y = h(x) = sign(w^T x) • Loss Function – How good is a model w.r.t. my data D? • Learning = Search in hypothesis space – Find the best h in the model class. (C) Dhruv Batra and Zsolt Kira

  26. Loss Functions

  27. Suppose: 3 training examples, 3 classes. With some W, the scores for the three images are:

          cat:   3.2   1.3   2.2
          car:   5.1   4.9   2.5
          frog: -1.7   2.0  -3.1

      Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  28. A loss function tells how good our current classifier is. Given a dataset of examples {(x_i, y_i)}_{i=1}^N, where x_i is an image and y_i is its (integer) label, the loss over the dataset is an average of the per-example losses: L = \frac{1}{N} \sum_i L_i(f(x_i, W), y_i). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  29. Multiclass SVM loss: given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form: L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + 1). This is the “hinge loss”. Scores as above (cat: 3.2, 1.3, 2.2; car: 5.1, 4.9, 2.5; frog: -1.7, 2.0, -3.1). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  30. Applying the multiclass SVM loss to the first image (a cat, so y_i = cat): L_i = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = max(0, 2.9) + max(0, -3.9) = 2.9 + 0 = 2.9. Losses so far: 2.9. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  31. For the second image (a car, so y_i = car): L_i = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = max(0, -2.6) + max(0, -1.9) = 0 + 0 = 0. Losses so far: 2.9, 0. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
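A compact numpy version of this loss (my addition) reproduces the per-example numbers, plus the dataset average from slide 28; the frog column's loss simply follows the same formula:

    import numpy as np

    # Rows: cat/car/frog scores; columns: the three training images
    scores = np.array([[ 3.2, 1.3,  2.2],
                       [ 5.1, 4.9,  2.5],
                       [-1.7, 2.0, -3.1]])
    labels = np.array([0, 1, 2])  # true class of each image

    def multiclass_svm_loss(s, y, margin=1.0):
        # L_i = sum over j != y of max(0, s_j - s_y + margin)
        margins = np.maximum(0, s - s[y] + margin)
        margins[y] = 0.0              # the true class contributes nothing
        return margins.sum()

    per_example = [multiclass_svm_loss(scores[:, i], labels[i]) for i in range(3)]
    print(per_example)           # approx. [2.9, 0.0, 12.9]
    print(np.mean(per_example))  # dataset loss L = (1/N) sum_i L_i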
