CS 4803 / 7643: Deep Learning


  1. CS 4803 / 7643: Deep Learning. Topics: Regularization, Neural Networks, Optimization, Computing Gradients. Zsolt Kira, Georgia Tech

  2. Administrativia • HW0 Reminder – Due: 01/18, 11:55pm • Plagiarism – No Tolerance • Office hours have started (one for every day!) – CCB 222 for instructor – CCB 345 for TAs • Sign up for Piazza if you haven’t! (C) Dhruv Batra

  3. Computing • Major bottleneck – GPUs • Options – Google Colaboratory allows free TPU access!! • https://colab.research.google.com/notebooks/welcome.ipynb – Google Cloud Credits • courtesy Google – details forthcoming for next HW – PACE-ICE • https://pace.gatech.edu/sites/default/files/pace-ice_orientation_1.pdf (C) Dhruv Batra and Zsolt Kira

  4. Recap from last time (C) Dhruv Batra and Zsolt Kira

  5. Parametric Approach: Linear Classifier. f(x, W) = Wx + b. The input image is an array of 32x32x3 numbers (3072 numbers total), stretched into a 3072x1 column x; W is a 10x3072 array of parameters (weights); b is a 10x1 bias; f(x, W) is 10x1, i.e. 10 numbers giving the class scores. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
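
To make the shapes concrete, here is a minimal numpy sketch of this linear classifier; the random weights and input are placeholders, not values from the slides:

    import numpy as np

    # CIFAR-10-style setup: a 32x32x3 image flattened into a 3072-dim column.
    x = np.random.rand(3072, 1)           # image, stretched into a 3072x1 column
    W = np.random.randn(10, 3072) * 0.01  # weights: one 3072-dim template per class
    b = np.zeros((10, 1))                 # bias: one offset per class

    scores = W @ x + b                    # f(x, W) = Wx + b
    print(scores.shape)                   # (10, 1): 10 class scores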

  6. Error Decomposition [Figure: an HxWx3 input feeding a fully-connected (FC) layer and softmax, i.e. multi-class logistic regression, compared against reality] (C) Dhruv Batra and Zsolt Kira

  7. Example with an image with 4 pixels, and 3 classes (cat/dog/ship). Stretch the 2x2 input image’s pixels (56, 231, 24, 2) into a column: x = [56; 231; 24; 2]. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  8. Example with an image with 4 pixels, and 3 classes (cat/dog/ship). Scores = Wx + b with x = [56; 231; 24; 2]:
       cat score:  [0.2, -0.5, 0.1,  2.0] · x + 1.1 = -96.8
       dog score:  [1.5,  1.3, 2.1,  0.0] · x + 3.2 = 437.9
       ship score: [0.0, 0.25, 0.2, -0.3] · x + (-1.2) = 61.95
     Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
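
A minimal numpy sketch reproducing this computation with the slide’s numbers:

    import numpy as np

    x = np.array([56, 231, 24, 2])          # 2x2 image stretched into a column
    W = np.array([[0.2, -0.5, 0.1,  2.0],   # cat template
                  [1.5,  1.3, 2.1,  0.0],   # dog template
                  [0.0, 0.25, 0.2, -0.3]])  # ship template
    b = np.array([1.1, 3.2, -1.2])

    scores = W @ x + b                      # cat, dog, ship scores
    print(scores)                           # [-96.8, 437.9, 60.75]
    # Note: the slide lists the ship score as 61.95, which is the ship row's
    # dot product with x before its -1.2 bias is added; the other two match.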

  9. Linear Classifier: Three Viewpoints. Algebraic viewpoint: f(x, W) = Wx. Visual viewpoint: one template per class. Geometric viewpoint: hyperplanes cutting up space. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  10. Recall from last time: Linear Classifier. TODO: 1. Define a loss function that quantifies our unhappiness with the scores across the training data. 2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization). Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  11. Softmax vs. SVM Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  12. Multiclass SVM loss (“hinge loss”). Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form: L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1). Suppose 3 training examples (cat, car, and frog images) and 3 classes; with some W the scores are:
       class:  cat img  car img  frog img
       cat        3.2      1.3      2.2
       car        5.1      4.9      2.5
       frog      -1.7      2.0     -3.1
     Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  13. Working through this example: L_cat = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9; L_car = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1) = 0; L_frog = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1) = 12.9. Full training loss: L = (2.9 + 0 + 12.9) / 3 = 5.27. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
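
A minimal numpy sketch of this multiclass SVM loss on the slide’s score matrix (rows are classes, columns are the three training images):

    import numpy as np

    # Columns: scores for the cat, car, frog images; rows: cat/car/frog classes.
    scores = np.array([[ 3.2, 1.3,  2.2],
                       [ 5.1, 4.9,  2.5],
                       [-1.7, 2.0, -3.1]])
    labels = np.array([0, 1, 2])  # correct class index for each image

    def svm_loss(s, y, margin=1.0):
        # Hinge loss for one image: sum over j != y of max(0, s_j - s_y + margin).
        margins = np.maximum(0, s - s[y] + margin)
        margins[y] = 0  # the correct class contributes nothing
        return margins.sum()

    losses = [svm_loss(scores[:, i], labels[i]) for i in range(3)]
    print(losses)           # approx [2.9, 0.0, 12.9]
    print(np.mean(losses))  # full loss L = (2.9 + 0 + 12.9) / 3 = 5.27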

  14. Softmax vs. SVM Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  15. Softmax Classifier (Multinomial Logistic Regression). Want to interpret raw classifier scores as probabilities. Softmax function: P(Y = k | X = x_i) = exp(s_k) / sum_j exp(s_j), with s = f(x_i, W). Scores: cat 3.2, car 5.1, frog -1.7. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  16. Softmax Classifier (cont.). Probabilities must be >= 0, so exponentiate the scores: exp gives cat 24.5, car 164.0, frog 0.18 (the unnormalized probabilities). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  17. Softmax Classifier (cont.). Probabilities must also sum to 1, so normalize: cat 0.13, car 0.87, frog 0.00 (the probabilities). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  18. Softmax Classifier (cont.). The raw scores (3.2, 5.1, -1.7) are therefore unnormalized log-probabilities, also called logits. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  19. Softmax Classifier (cont.). The loss is the negative log probability of the correct class: L_i = -log(0.13) = 2.04 for the cat image. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  20. Softmax Classifier (cont.). Maximum Likelihood Estimation: choose the probabilities (i.e. the weights W) to maximize the likelihood of the observed data; minimizing L_i = -log P(correct class) does exactly this. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
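
A minimal numpy sketch of the full softmax pipeline on the slide’s scores:

    import numpy as np

    scores = np.array([3.2, 5.1, -1.7])  # cat, car, frog logits

    # In practice one subtracts scores.max() before exp for numerical
    # stability; omitted here to match the slide's intermediate numbers.
    unnormalized = np.exp(scores)              # >= 0: [24.5, 164.0, 0.18]
    probs = unnormalized / unnormalized.sum()  # sums to 1: [0.13, 0.87, 0.00]

    loss = -np.log(probs[0])  # correct class is cat
    print(probs.round(2))     # [0.13 0.87 0.  ]
    print(round(loss, 2))     # L_i = -log(0.13) = 2.04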

  21. Log-Likelihood / KL-Divergence / Cross-Entropy (C) Dhruv Batra and Zsolt Kira
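
The equations on this slide did not survive extraction; as a sketch of the standard relationships it names, with p the correct (one-hot) distribution and q the softmax output:

    H(p, q) = -\sum_k p_k \log q_k = H(p) + D_{\mathrm{KL}}(p \,\|\, q)

    D_{\mathrm{KL}}(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}

Since p is one-hot, H(p) = 0, so minimizing the cross-entropy H(p, q) equals minimizing the KL divergence, and both reduce to L_i = -log q_{y_i}, the loss from the previous slide.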

  22. Softmax Classifier (cont.). Compare the predicted probabilities (cat 0.13, car 0.87, frog 0.00) with the correct probabilities (cat 1.00, car 0.00, frog 0.00). Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  23. Softmax Classifier (cont.). That comparison is measured by the Kullback–Leibler divergence between the correct and predicted distributions. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  24. Softmax Classifier (cont.). Equivalently, by the cross entropy between them. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  25. [Figure-only slide] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  26. Plan for Today • Regularization • Neural Networks • Optimization • Computing Gradients (C) Dhruv Batra and Zsolt Kira

  27. Regularization. Data loss: model predictions should match training data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  28. Regularization. Data loss: model predictions should match training data. Regularization: prevent the model from doing too well on training data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  29. Regularization. Full loss: L(W) = (1/N) sum_i L_i(f(x_i, W), y_i) + lambda R(W), where lambda is the regularization strength (a hyperparameter). Data loss: model predictions should match training data. Regularization: prevent the model from doing too well on training data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
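
A minimal numpy sketch of this full loss with an L2 penalty as R(W) (a common choice; the softmax data loss and the random data here are placeholders):

    import numpy as np

    def full_loss(W, X, y, lam):
        # Data loss: average softmax cross-entropy over the training set.
        scores = X @ W                               # N x C class scores
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        data_loss = -np.log(probs[np.arange(len(y)), y]).mean()
        # Regularization: L2 penalty R(W) = sum of squared weights.
        return data_loss + lam * np.sum(W * W)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3072))          # 5 flattened training images
    y = rng.integers(0, 10, size=5)         # labels
    W = rng.normal(size=(3072, 10)) * 0.01
    print(full_loss(W, X, y, lam=0.1))      # data loss + lambda * R(W)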

  30. Model Complexity [Figure: data points on x-y axes] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  31. Polynomial Regression [Figure: a polynomial fit f through the data points on x-y axes] Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  32. Regularization: Prefer Simpler Models [Figure: two candidate fits, f1 and f2, through the same points] Regularization pushes against fitting the data too well so we don’t fit noise in the data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  33. Polynomial Regression (C) Dhruv Batra and Zsolt Kira

  34. Polynomial Regression • Demo: https://arachnoid.com/polysolve/ • Data (x, y): (10, 6), (15, 9), (20, 11), (25, 12), (29, 13), (40, 11), (50, 10), (60, 9) (C) Dhruv Batra and Zsolt Kira
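
A minimal numpy sketch of the demo’s idea: fitting the data above with polynomials of increasing degree shows how a high-degree fit chases the data exactly, the overfitting that regularization pushes against (numpy’s Polynomial.fit stands in for the linked web demo):

    import numpy as np
    from numpy.polynomial import Polynomial

    # (x, y) pairs from the slide.
    x = np.array([10, 15, 20, 25, 29, 40, 50, 60])
    y = np.array([ 6,  9, 11, 12, 13, 11, 10,  9])

    for degree in (1, 2, 7):
        p = Polynomial.fit(x, y, degree)  # least-squares polynomial fit
        mse = np.mean((p(x) - y) ** 2)
        # Degree 7 passes through all 8 points (MSE ~ 0) but wiggles between them.
        print(degree, round(float(mse), 3))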

  35. Regularization (recap). Full loss: L(W) = (1/N) sum_i L_i(f(x_i, W), y_i) + lambda R(W), where lambda is the regularization strength (a hyperparameter). Data loss: model predictions should match training data. Regularization: prevent the model from doing too well on training data. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
