Lecture 7 Recap (I2DL: Prof. Niessner, Prof. Leal-Taixé)


  1. Lecture 7 Recap

  2. Naïve Losses: L2 vs L1
     • L2 Loss: $L_2 = \sum_{i=1}^{n} (y_i - f(x_i))^2$
       – Sum of squared differences (SSD)
       – Prone to outliers
       – Compute-efficient (optimization)
       – Optimum is the mean
     • L1 Loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
       – Sum of absolute differences
       – Robust to outliers
       – Costly to compute
       – Optimum is the median
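A minimal NumPy sketch of both losses for a constant prediction; the data array and the predictions are illustrative, not from the slides:

```python
import numpy as np

def l2_loss(y, y_pred):
    # Sum of squared differences (SSD): prone to outliers, minimized by the mean.
    return np.sum((y - y_pred) ** 2)

def l1_loss(y, y_pred):
    # Sum of absolute differences: robust to outliers, minimized by the median.
    return np.sum(np.abs(y - y_pred))

targets = np.array([1.0, 2.0, 3.0, 100.0])     # one outlier
print(l2_loss(targets, targets.mean()))        # best constant prediction under L2
print(l1_loss(targets, np.median(targets)))    # best constant prediction under L1
```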

  3. Binary Classification: Sigmoid
     • $\sigma(x, \theta) = \frac{1}{1 + e^{-\sum_i \theta_i x_i}}$, with $\sigma(s) = \frac{1}{1 + e^{-s}}$
     • Can be interpreted as a probability $p(y = 1 \mid x, \theta)$
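A small sketch of this sigmoid classifier; the feature and weight values are made up for illustration:

```python
import numpy as np

def sigmoid(s):
    # Squashes a real-valued score into (0, 1).
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([0.5, -1.2, 3.0])       # input features (illustrative)
theta = np.array([0.1, 0.4, -0.3])   # weights (illustrative)
p = sigmoid(theta @ x)               # interpretable as p(y = 1 | x, theta)
print(p, 1.0 - p)                    # probabilities of the two classes
```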

  4. Softmax Formulation
     • What if we have multiple classes?
     • Softmax turns the scores $s_k$ for each class into probabilities for each class:
       $p(y = k \mid x, \theta) = \frac{e^{s_k}}{\sum_j e^{s_j}}$
     • E.g., for three classes: $p(y = 1 \mid x, \theta) = \frac{e^{s_1}}{e^{s_1} + e^{s_2} + e^{s_3}}$, and analogously for classes 2 and 3
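A short sketch of the softmax computation; the max-shift is a standard numerical-stability trick and not part of the slide:

```python
import numpy as np

def softmax(scores):
    # Turns a vector of class scores into probabilities that sum to 1.
    e = np.exp(scores - np.max(scores))   # shift by the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # three class probabilities
```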

  5. Example: Hinge vs Cross-Entropy
     • Hinge Loss: $L_i = \sum_{k \neq y_i} \max(0, s_k - s_{y_i} + 1)$
     • Cross-Entropy Loss: $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$
     • Given the following scores for $x_i$ with true class $y_i = 0$:
       – Model 1: $s = [5, -3, 2]$: hinge $= \max(0, -3-5+1) + \max(0, 2-5+1) = 0$; cross-entropy $= -\ln\frac{e^5}{e^5 + e^{-3} + e^2} \approx 0.05$
       – Model 2: $s = [5, 10, 10]$: hinge $= \max(0, 10-5+1) + \max(0, 10-5+1) = 12$; cross-entropy $= -\ln\frac{e^5}{e^5 + e^{10} + e^{10}} \approx 5.70$
       – Model 3: $s = [5, -20, -20]$: hinge $= \max(0, -20-5+1) + \max(0, -20-5+1) = 0$; cross-entropy $= -\ln\frac{e^5}{e^5 + e^{-20} + e^{-20}} \approx 2.8 \cdot 10^{-11}$
     • Cross-entropy *always* wants to improve (the loss never reaches 0); hinge loss saturates.
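The numbers above can be checked with a few lines of NumPy; this sketch assumes the true class has index 0, as in the slide:

```python
import numpy as np

def hinge_loss(s, y):
    # Multi-class hinge: sum over the wrong classes of max(0, s_k - s_y + 1).
    margins = np.maximum(0.0, s - s[y] + 1.0)
    margins[y] = 0.0
    return margins.sum()

def cross_entropy_loss(s, y):
    # Negative log of the softmax probability of the correct class.
    e = np.exp(s - np.max(s))
    return -np.log(e[y] / e.sum())

for s in ([5.0, -3.0, 2.0], [5.0, 10.0, 10.0], [5.0, -20.0, -20.0]):
    s = np.array(s)
    print(hinge_loss(s, 0), cross_entropy_loss(s, 0))
# Output (approximately): (0, 0.05), (12, 5.70), (0, 2.8e-11)
```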

  6. Sigmoid Activation
     • Forward: $\sigma(s) = \frac{1}{1 + e^{-s}}$
     • Backward: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial s} \frac{\partial s}{\partial w}$, with $\frac{\partial L}{\partial s} = \frac{\partial L}{\partial \sigma} \frac{\partial \sigma}{\partial s}$
     • Saturated neurons kill the gradient flow
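A sketch of the local gradient, illustrating the saturation issue (function names are illustrative):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_backward(s, dL_dsigma):
    # Chain rule: dL/ds = dL/dsigma * dsigma/ds, with dsigma/ds = sigma(s) * (1 - sigma(s)).
    sig = sigmoid(s)
    return dL_dsigma * sig * (1.0 - sig)

# For large |s| the local gradient is ~0, so almost nothing flows back:
print(sigmoid_backward(0.0, 1.0), sigmoid_backward(10.0, 1.0))   # ~0.25 vs ~4.5e-05
```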

  7. TanH Activation
     • Still saturates
     • Zero-centered
     [LeCun et al. 1991] Improving Generalization Performance in Character Recognition

  8. Rectified Linear Units (ReLU)
     • Large and consistent gradients
     • Fast convergence
     • Does not saturate
     • Dead ReLU: what happens if a ReLU outputs zero?
     [Krizhevsky et al., NeurIPS 2012] ImageNet Classification with Deep Convolutional Neural Networks
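A sketch of ReLU and its gradient, illustrating why a unit that only ever sees negative inputs stops learning:

```python
import numpy as np

def relu(s):
    return np.maximum(0.0, s)

def relu_backward(s, dL_dout):
    # Gradient passes through only where the input was positive; a unit whose
    # input is always negative receives no gradient at all ("dead ReLU").
    return dL_dout * (s > 0)

s = np.array([-2.0, 0.5, 3.0])
print(relu(s), relu_backward(s, np.ones_like(s)))
```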

  9. Quick Guide
     • Sigmoid is not really used.
     • ReLU is the standard choice.
     • Second choice: variants of ReLU or Maxout.
     • Recurrent nets will require TanH or similar.

  10. Initialization is Extremely Important
     • Goal: $x^* = \arg\min_x f(x)$
     • Depending on the initialization, optimization is not guaranteed to reach the optimum

  11. Xavier Initialization
     • How to ensure the variance of the output is the same as the input?
     • Achieved by $\mathrm{Var}(w) = \frac{1}{n}$, where $n$ is the number of inputs to the layer

  12. ReLU Kills Half of the Data
     • $\mathrm{Var}(w) = \frac{2}{n}$; it makes a huge difference!
     [He et al., ICCV'15] He Initialization
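A minimal sketch of both initialization rules with illustrative layer sizes, scaling a standard normal draw so that Var(w) matches the formulas above:

```python
import numpy as np

fan_in, fan_out = 512, 256                     # illustrative layer sizes
rng = np.random.default_rng(0)

# Xavier: Var(w) = 1/fan_in keeps the activation variance roughly constant.
w_xavier = rng.standard_normal((fan_out, fan_in)) * np.sqrt(1.0 / fan_in)

# He: Var(w) = 2/fan_in compensates for ReLU zeroing out half of the activations.
w_he = rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

print(w_xavier.var(), w_he.var())              # close to 1/512 and 2/512
```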

  13. Lecture 8

  14. Data Augmentation

  15. Data Augmentation
     • A classifier has to be invariant to a wide variety of transformations

  16. (Figure: the same object varies in pose, appearance, and illumination)

  17. Data Augmentation
     • A classifier has to be invariant to a wide variety of transformations
     • Helping the classifier: synthesize data simulating plausible transformations

  18. Data Augmentation
     [Krizhevsky et al., NIPS'12] ImageNet

  19. Data Augmentation: Brightness
     • Random brightness and contrast changes
     [Krizhevsky et al., NIPS'12] ImageNet
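A possible sketch of such a jitter, assuming images stored as floats in [0, 1]; the ranges are illustrative:

```python
import numpy as np

def random_brightness_contrast(img, rng, max_brightness=0.2, max_contrast=0.2):
    # Shift brightness and scale contrast by small random amounts, then clip.
    brightness = rng.uniform(-max_brightness, max_brightness)
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    return np.clip(contrast * img + brightness, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                  # dummy image in [0, 1]
augmented = random_brightness_contrast(img, rng)
```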

  20. Data Augmentation: Random Crops
     • Training: random crops
       – Pick a random L in [256, 480]
       – Resize the training image, short side L
       – Randomly sample crops of 224x224
     • Testing: fixed set of crops
       – Resize the image at N scales
       – 10 fixed crops of 224x224: (4 corners + 1 center) × 2 flips
     [Krizhevsky et al., NIPS'12] ImageNet
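A sketch of the training-time crop; the resize to a random short side L is assumed to have happened already:

```python
import numpy as np

def random_crop(img, rng, size=224):
    # Training-time augmentation: after resizing so the short side is a random
    # L in [256, 480] (resize not shown), take a random size x size crop.
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
img = rng.random((256, 341, 3))     # stands in for a resized training image
crop = random_crop(img, rng)        # 224 x 224 x 3
```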

  21. Data Augmentation
     • When comparing two networks, make sure to use the same data augmentation!
     • Consider data augmentation a part of your network design

  22. Advanced Regularization

  23. Weight Decay
     • L2 regularization: $\theta^{k+1} = \theta^k - \epsilon \, \nabla_\theta L(\theta^k, x, y) - \lambda \theta^k$
       (learning rate $\epsilon$, gradient of the loss, gradient of the L2 regularizer)
     • Penalizes large weights
     • Improves generalization
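A one-step sketch of the update above; the learning rate and decay factor are illustrative:

```python
import numpy as np

def sgd_step_with_weight_decay(theta, grad, lr=0.1, lam=1e-4):
    # One SGD step with L2 regularization: the extra -lam * theta term
    # shrinks the weights toward zero every update ("weight decay").
    return theta - lr * grad - lam * theta

theta = np.array([1.0, -2.0, 0.5])
grad = np.array([0.1, 0.0, -0.2])    # illustrative gradient of the data loss
theta = sgd_step_with_weight_decay(theta, grad)
```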

  24. Early Stopping
     (Figure: overfitting)

  25. Early Stopping
     • Easy form of regularization
     (Figure: parameter trajectory $\theta_0, \theta_1, \ldots$ over training time; stop at $\theta^*$ before overfitting)
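A sketch of the idea with hypothetical helpers; run_epoch, validation_error and get_params/set_params are stand-ins, not a real API:

```python
def train_with_early_stopping(model, run_epoch, validation_error,
                              max_epochs=100, patience=5):
    # Hypothetical helpers: run_epoch(model) trains for one epoch,
    # validation_error(model) returns the current validation error,
    # model.get_params() / set_params() snapshot and restore the weights.
    best_error, best_params, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        run_epoch(model)
        error = validation_error(model)
        if error < best_error:
            best_error, best_params, bad_epochs = error, model.get_params(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break            # validation error stopped improving: overfitting
    model.set_params(best_params)
    return model
```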

  26. Bagging and Ensemble Methods
     • Train multiple models and average their results
     • E.g., use a different algorithm for optimization or change the objective / loss function
     • If errors are uncorrelated, the expected combined error will decrease linearly with the ensemble size
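A tiny sketch of prediction averaging; predict_proba is a hypothetical per-model method, not from the slides:

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the class probabilities of several independently trained models.
    # Each model is assumed to expose a (hypothetical) predict_proba(x) method.
    return np.mean([m.predict_proba(x) for m in models], axis=0)
```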

  27. Bagging and Ensemble Methods
     • Bagging: uses k different datasets (Training Set 1, Training Set 2, Training Set 3, ...)
     Image source: [Srivastava et al., JMLR'14] Dropout

  28. Dropout

  29. Dropout
     • Disable a random set of neurons (typically 50%)
     (Figure: forward pass)
     [Srivastava et al., JMLR'14] Dropout
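A sketch of the training-time forward pass, where p is the drop probability:

```python
import numpy as np

def dropout_train(activations, rng, p=0.5):
    # Zero each activation independently with probability p; the mask is
    # returned so the backward pass can zero the same entries.
    mask = (rng.random(activations.shape) >= p).astype(activations.dtype)
    return activations * mask, mask

rng = np.random.default_rng(0)
a = rng.random(8)
dropped, mask = dropout_train(a, rng)
```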

  30. Dropout: Intuition
     • Using half the network = half capacity
       – Redundant representations: furry, has two eyes, has a tail, has paws, has two ears
     [Srivastava et al., JMLR'14] Dropout

  31. Dropout: Intuition
     • Using half the network = half capacity
       – Redundant representations
       – Base your scores on more features
     • Consider it as a model ensemble
     [Srivastava et al., JMLR'14] Dropout

  32. Dropout: Intuition
     • Two models in one (Model 1, Model 2)
     [Srivastava et al., JMLR'14] Dropout

  33. Dropout: Intuition
     • Using half the network = half capacity
       – Redundant representations
       – Base your scores on more features
     • Consider it as two models in one
       – Training a large ensemble of models, each on a different set of data (mini-batch) and with SHARED parameters
       – Reduces co-adaptation between neurons
     [Srivastava et al., JMLR'14] Dropout

  34. Dropout: Test Time
     • All neurons are “turned on” – no dropout
     • Conditions at train and test time are not the same
     [Srivastava et al., JMLR'14] Dropout

  35. Dropout: Test Time
     • Dropout probability $p = 0.5$
     • Test: $z = \theta_1 x_1 + \theta_2 x_2$
     • Train, in expectation over the four dropout masks:
       $\mathbb{E}[z] = \frac{1}{4}(0 + 0) + \frac{1}{4}(\theta_1 x_1 + 0) + \frac{1}{4}(0 + \theta_2 x_2) + \frac{1}{4}(\theta_1 x_1 + \theta_2 x_2) = \frac{1}{2}(\theta_1 x_1 + \theta_2 x_2)$
     • Weight scaling inference rule: scale the test-time output accordingly, $z = p\,(\theta_1 x_1 + \theta_2 x_2)$
     [Srivastava et al., JMLR'14] Dropout
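A matching sketch of the test-time rule; the scaling factor equals the keep probability, i.e. 1 − p for drop probability p (0.5 for p = 0.5 as above):

```python
import numpy as np

def dropout_test(activations, p=0.5):
    # Weight scaling inference rule: nothing is dropped at test time, so scale
    # by the keep probability (1 - p) to match the training-time expectation.
    return activations * (1.0 - p)

# "Inverted dropout" is the common alternative: divide by (1 - p) already
# during training, so that test time needs no scaling at all.
```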

  36. Dropout: Verdict
     • Efficient bagging method with parameter sharing
     • Try it!
     • Dropout reduces the effective capacity of a model → larger models, more training time
     [Srivastava et al., JMLR'14] Dropout

  37. Batch Normalization

  38. Our Goal
     • All we want is that our activations do not die out
