

  1. Learning as Loss Minimization (Machine Learning)

  2. Learning as loss minimization
     • The setup
       – Examples x drawn from a fixed, unknown distribution D
       – Hidden oracle classifier f labels examples
       – We wish to find a hypothesis h that mimics f
     • The ideal situation
       – Define a function L that penalizes bad hypotheses
       – Learning: pick a function h ∈ H to minimize expected loss
     • But the distribution D is unknown
       – Instead, minimize empirical loss on the training set
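The substitution of empirical loss for expected loss can be sketched in a few lines of Python. This is an illustrative sketch: the toy data, the weight vector `w`, and the helper names are not from the slides.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    # 1 for a mistake, 0 for a correct prediction
    return float(y_true != y_pred)

def empirical_loss(h, X, y, loss=zero_one_loss):
    # Average loss of hypothesis h over the training set, used as a
    # stand-in for the expected loss under the unknown distribution D
    return sum(loss(yi, h(xi)) for xi, yi in zip(X, y)) / len(y)

# A toy linear hypothesis h(x) = sgn(w^T x)
w = np.array([1.0, -1.0])
h = lambda x: 1 if w @ x >= 0 else -1

X = [np.array([2.0, 1.0]), np.array([0.0, 3.0]), np.array([1.0, 0.5])]
y = [1, -1, -1]  # the third example is misclassified by h
print(empirical_loss(h, X, y))  # -> 0.3333...
```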


  7. Empirical loss minimization
     Learning = minimize empirical loss on the training set
     Is there a problem here?

  8. Empirical loss minimization
     Learning = minimize empirical loss on the training set
     Is there a problem here? Overfitting!
     We need something that biases the learner towards simpler hypotheses
     • Achieved using a regularizer, which penalizes complex hypotheses

  9. Regularized loss minimization
     • Learning: minimize, over h ∈ H, the regularized empirical loss
         (1/m) Σ_i L(h(x_i), y_i) + λ regularizer(h)
     • With linear classifiers:
         minimize over w:  (1/m) Σ_i L(y_i wᵀx_i) + (λ/2) ||w||²
     • What is a loss function?
       – Loss functions should penalize mistakes
       – We are minimizing average loss over the training data
     • What is the ideal loss function for classification?
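A concrete, illustrative version of the regularized objective for a linear classifier, assuming the hinge loss as L and an L2 regularizer (the data and λ value are made up for the example):

```python
import numpy as np

def regularized_loss(w, X, y, loss, lam=0.1):
    # (1/m) * sum_i loss(y_i * w^T x_i)  +  (lam/2) * ||w||^2
    margins = y * (X @ w)                       # y_i * w^T x_i per example
    return np.mean(loss(margins)) + 0.5 * lam * np.dot(w, w)

hinge = lambda m: np.maximum(0.0, 1.0 - m)      # one possible choice of L

X = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 0.5]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([1.0, -1.0])
print(regularized_loss(w, X, y, hinge))
```

Swapping in a different `loss` or a different regularization term changes the learner's preferences without changing the recipe.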


  12. The 0-1 loss
      Penalize classification mistakes between the true label y and the prediction y′
      • For linear classifiers, the prediction is y′ = sgn(wᵀx)
        – Mistake if y wᵀx ≤ 0
      Minimizing the 0-1 loss directly is intractable. We need surrogates.

  13. The 0-1 loss
      [Plot of the 0-1 loss against the margin y wᵀx: the loss is 1 when y wᵀx < 0 (misclassification) and 0 when y wᵀx > 0 (no misclassification).]

  14. Compare to the hinge loss
      [Plot of the hinge loss against the margin y wᵀx, overlaid on the 0-1 loss.]
      • When y wᵀx < 0 (misclassification): the penalty grows as wᵀx moves farther from the separator on the wrong side
      • When y wᵀx > 0 (no misclassification): predictions are still penalized if they are correct but too close to the margin
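The contrast between the two losses can be checked numerically. A small sketch, with margin values chosen arbitrarily for illustration:

```python
import numpy as np

def zero_one(margin):
    # 1 for any mistake (y * w^T x <= 0), 0 otherwise
    return np.where(margin <= 0, 1.0, 0.0)

def hinge(margin):
    # Linear penalty whenever y * w^T x < 1: grows the farther the
    # example is on the wrong side, and still penalizes correct
    # predictions that fall inside the margin
    return np.maximum(0.0, 1.0 - margin)

for m in [-1.0, 0.0, 0.5, 2.0]:
    print(f"margin={m:5.1f}  zero-one={zero_one(m)}  hinge={hinge(m)}")
```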

  15. Support Vector Machines
      • SVM = linear classifier combined with regularization
      • Ideally, we would like to minimize the 0-1 loss
        – But we can't, for computational reasons
      • SVM minimizes the hinge loss instead
        – Variants exist

  16. SVM objective function
      • Regularization term: maximize the margin
        – Imposes a preference over the hypothesis space and pushes for better generalization
        – Can be replaced with other regularization terms which impose other preferences
      • Empirical loss: hinge loss
        – Penalizes weight vectors that make mistakes
        – Can be replaced with other loss functions which impose other preferences

  17. SVM objective function
      • Regularization term: maximize the margin
      • Empirical loss: hinge loss
      • A hyper-parameter controls the tradeoff between a large margin and a small hinge loss

  18. The loss function zoo
      Many loss functions exist
      – Perceptron loss
      – Hinge loss (SVM)
      – Exponential loss (AdaBoost)
      – Logistic loss (logistic regression)
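Every member of the zoo can be written as a function of the margin m = y wᵀx, which puts them all on the same axis for comparison. A sketch (the exact scaling of the logistic loss varies by convention; the unscaled form is used here):

```python
import numpy as np

# Each surrogate written as a function of the margin m = y * w^T x
def zero_one(m):    return np.where(m <= 0, 1.0, 0.0)
def perceptron(m):  return np.maximum(0.0, -m)
def hinge(m):       return np.maximum(0.0, 1.0 - m)    # SVM
def exponential(m): return np.exp(-m)                  # AdaBoost
def logistic(m):    return np.log1p(np.exp(-m))        # logistic regression

margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for name, f in [("zero-one", zero_one), ("perceptron", perceptron),
                ("hinge", hinge), ("exponential", exponential),
                ("logistic", logistic)]:
    print(f"{name:11s}", np.round(f(margins), 3))
```

All surrogates upper-bound (or approximate an upper bound of) the zero-one loss, which is what makes minimizing them a sensible proxy.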

  19. The loss function zoo
      [Plot of the losses as functions of the margin y wᵀx, built up across slides 19–24: zero-one, hinge (SVM), perceptron, exponential (AdaBoost), and logistic (logistic regression).]

  25. Learning via Loss Minimization: Summary
      • Learning via loss minimization
        – Write down a loss function
        – Minimize empirical loss
      • Regularize to avoid overfitting
        – Neural networks use other strategies, such as dropout
      • Widely applicable, with different loss functions and regularizers
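Putting the summary's recipe together (write down a loss, add a regularizer, minimize the empirical version), here is a minimal subgradient-descent sketch for the hinge loss on synthetic data. All names and hyper-parameter values are illustrative, not prescribed by the slides:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Minimize (1/m) * sum_i hinge(y_i * w^T x_i) + (lam/2) * ||w||^2
    # by (sub)gradient descent
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1    # examples with a nonzero hinge subgradient
        grad = -(X[active] * y[active][:, None]).sum(axis=0) / m + lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + X[:, 1])          # linearly separable toy labels
w = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print("training accuracy:", accuracy)
```

Swapping the hinge subgradient for the gradient of the logistic or exponential loss turns the same loop into logistic regression or an AdaBoost-style exponential-loss minimizer.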
