Lecture 6 Recap



  1. Lecture 6 Recap (Prof. Leal-Taixé and Prof. Niessner)

  2. Neural Network: Width and Depth

  3. Gradient Descent for Neural Networks. Writing out a two-layer network: hidden units $h_j = A\big(b_{0,j} + \sum_i x_i \, w_{0,j,i}\big)$ and outputs $y_k = A\big(b_{1,k} + \sum_j h_j \, w_{1,k,j}\big)$, with a just-simple activation $A(x) = \max(0, x)$ (ReLU). The slide unrolls the chain rule over these expressions to obtain the gradients with respect to the weights. (A NumPy forward pass is sketched below.)
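A minimal NumPy sketch of this forward pass; the layer sizes and the random weights are assumptions for illustration, not the slide's values:

```python
import numpy as np

# Two-layer forward pass: h = A(b0 + x @ W0), y = A(b1 + h @ W1),
# with A(x) = max(0, x) (ReLU), as written on the slide.
def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # input with 4 features (assumed)
W0, b0 = rng.normal(size=(4, 8)), np.zeros(8)    # layer 1: 4 -> 8
W1, b1 = rng.normal(size=(8, 3)), np.zeros(3)    # layer 2: 8 -> 3

h = relu(b0 + x @ W0)                            # hidden activations
y = relu(b1 + h @ W1)                            # network outputs
print(y.shape)                                   # (3,)
```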

  4. Stochastic Gradient Descent (SGD): $\theta^{k+1} = \theta^k - \alpha \, \nabla_\theta L(\theta^k, x_{\{1..m\}}, y_{\{1..m\}})$, with $\nabla_\theta L = \frac{1}{m}\sum_{i=1}^{m} \nabla_\theta L_i$. Here $k$ now refers to the $k$-th iteration, $m$ is the number of training samples in the current batch, and the gradient is computed for the $k$-th batch. Plus all variations of SGD: momentum, RMSProp, Adam, ... (One update step is sketched below.)
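A minimal sketch of one SGD step; the mean-squared-error stand-in loss, learning rate, and batch size are assumptions for illustration:

```python
import numpy as np

# One SGD step on a mini-batch of m samples, for a linear model with
# squared-error loss: grad = (1/m) * sum_i grad_i.
def sgd_step(theta, x_batch, y_batch, lr=0.01):
    m = x_batch.shape[0]
    residual = x_batch @ theta - y_batch
    grad = (x_batch.T @ residual) / m
    return theta - lr * grad          # theta^{k+1} = theta^k - lr * grad

rng = np.random.default_rng(0)
theta = np.zeros(3)
x, y = rng.normal(size=(32, 3)), rng.normal(size=32)
theta = sgd_step(theta, x, y)         # one iteration on one batch of 32
```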

  5. Importance of Learning Rate

  6. Over- and Underfitting: underfitted, appropriate, overfitted. Figure extracted from Deep Learning by Adam Gibson and Josh Patterson, O'Reilly Media Inc., 2017.

  7. Over- and Underfitting. Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html

  8. Basic Recipe for Machine Learning. Split your data: 60% train, 20% validation, 20% test. Find your hyperparameters on the validation set. (A split sketch follows below.)
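A minimal sketch of the 60/20/20 split; the dataset size and the random seed are stand-in assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
idx = rng.permutation(n)                   # shuffle before splitting
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_idx = idx[:n_train]                  # 60% train
val_idx = idx[n_train:n_train + n_val]     # 20% validation: tune hyperparameters here
test_idx = idx[n_train + n_val:]           # 20% test: touch only once, at the end
```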

  9. Basic Recipe for Machine Learning (continued)

  10. Basically... (deep learning memes)

  11. Fun Things... (deep learning memes)

  12. Fun Things... (deep learning memes)

  13. Fun Things... (deep learning memes)

  14. Going Deep into Neural Networks

  15. Simple Starting Points for Debugging
  • Start simple!
    – First, overfit to a single training sample
    – Second, overfit to several training samples
  • Always try a simple architecture first: it will verify that you are learning something at all
  • Estimate timings (how long does each epoch take?)
  A minimal overfitting sanity check is sketched after this list.
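A sketch of the first debugging step in PyTorch: overfit one random sample. The architecture, optimizer, and dimensions are all placeholder assumptions; the point is only that the loss should drive toward zero.

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup: 10 input features, 3 classes, one hidden layer.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
x = torch.randn(1, 10)             # a single training sample...
y = torch.tensor([2])              # ...with an arbitrary label
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):            # overfit: loss should approach 0
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(loss.item())                 # if this stays far from 0, the pipeline is broken
```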

  16. Neural Network. Problems of going deeper:
  – Vanishing gradients (the chain rule multiplies many small derivatives; see the numeric sketch below)
  – The impact of small decisions (architecture, activation functions, ...)
  – Is my network training correctly?
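A minimal numeric illustration of the vanishing-gradient point, with an assumed depth and pre-activation value: the sigmoid's derivative is at most 0.25, so the chain-rule product across many layers shrinks toward zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
for layer in range(20):                       # 20 sigmoid layers (assumed)
    z = 0.5                                   # a typical pre-activation value (assumed)
    grad *= sigmoid(z) * (1 - sigmoid(z))     # sigmoid'(z) <= 0.25
print(grad)                                   # ~1e-13: effectively no gradient left
```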

  17. Neural Networks: 1) output functions, 2) functions in neurons, 3) input of data

  18. Output Functions

  19. Neural Networks. What is the shape of this function? Prediction, followed by the loss (softmax, hinge).

  20. Sigmoid for Binary Predictions: $\sigma(x) = \frac{1}{1 + e^{-x}}$. (Diagram: inputs $x_0, x_1, x_2$ weighted by $\theta_0, \theta_1, \theta_2$ and summed.) The output lies in $(0, 1)$ and can be interpreted as a probability $p(y_i = 1 \mid x_i, \theta)$; a sketch follows below.
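A minimal sketch of this binary prediction; the weight and input values are assumptions for illustration:

```python
import numpy as np

# sigma(theta . x) squashes the score into (0, 1), so it can be read as
# p(y_i = 1 | x_i, theta).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.0, 2.0])    # [theta_0, theta_1, theta_2] (assumed)
x = np.array([1.0, 0.3, 0.8])         # [x_0, x_1, x_2] (assumed)
p = sigmoid(theta @ x)                # probability of the positive class
print(p)                              # ~0.86
```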

  21. Softmax Formulation. What if we have multiple classes? (Diagram: the weighted sum of inputs $x_0, x_1, x_2$, now feeding multiple outputs $\Pi_i$.)

  22. Softmax Formulation. What if we have multiple classes? (Diagram: one weighted sum per class over the inputs, followed by a softmax layer.)

  23. Softmax Formulation. What if we have multiple classes? For two outputs: $\Pi_1 = \frac{e^{x_i \theta_1}}{e^{x_i \theta_1} + e^{x_i \theta_2}}$ and $\Pi_2 = \frac{e^{x_i \theta_2}}{e^{x_i \theta_1} + e^{x_i \theta_2}}$.

  24. Softmax Formulation. Softmax: exponentiate, then normalize: $p(y_i \mid x, \theta) = \frac{e^{x \theta_i}}{\sum_{k=1}^{n} e^{x \theta_k}}$. Softmax loss (maximum likelihood estimate): $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$. A numerically stable implementation is sketched below.
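A minimal sketch of the softmax and its loss; the score values are assumptions. Subtracting the maximum score before exponentiating is a standard stability trick and does not change the result:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))             # shift for numerical stability
    return e / e.sum()

def softmax_loss(s, y):
    return -np.log(softmax(s)[y])         # L_i = -log(e^{s_y} / sum_k e^{s_k})

s = np.array([3.2, 5.1, -1.7])            # class scores for one example (assumed)
print(softmax(s))                         # [0.13, 0.87, 0.00]
print(softmax_loss(s, y=0))               # 2.04 if the correct class is class 0
```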

  25. Loss Functions

  26. Naïve Losses
  L2 loss: $L_2 = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$
  – Sum of squared differences (SSD)
  – Prone to outliers
  – Compute-efficient (optimization)
  – Optimum is the mean
  L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
  – Sum of absolute differences
  – Robust
  – Costly to compute
  – Optimum is the median
  Worked example (elementwise comparison of two 4x4 grids of values): $L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 66$ and $L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 15$. A minimal implementation follows below.
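A minimal sketch of the two losses; the stand-in arrays reuse only the first row of the slide's example grids, so the totals here cover just those four terms:

```python
import numpy as np

def l2_loss(y, y_hat):
    return np.sum((y - y_hat) ** 2)       # SSD: optimum is the mean

def l1_loss(y, y_hat):
    return np.sum(np.abs(y - y_hat))      # SAD: robust, optimum is the median

y = np.array([12.0, 24.0, 42.0, 23.0])
y_hat = np.array([15.0, 20.0, 40.0, 25.0])
print(l2_loss(y, y_hat))                  # 9 + 16 + 4 + 4 = 33
print(l1_loss(y, y_hat))                  # 3 + 4 + 2 + 2 = 11
```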

  27. Cross-Entropy (Softmax): $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$. Given a function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels), the score function is $s = f(x_i, \theta)$, e.g., $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$. Suppose 3 training examples (columns) and 3 classes (rows), with the correct-class score of each example on the diagonal (bold on the slide):
         example 1   example 2   example 3
  cat       3.2         1.3         2.2
  chair     5.1         4.9         2.5
  car      -1.7         2.0        -3.1

  28. Cross-Entropy (Softmax), continued. Take the first training example, whose score column is $(3.2, 5.1, -1.7)$.

  29. Cross-Entropy (Softmax), continued. Exponentiate the scores: $(3.2, 5.1, -1.7) \rightarrow (24.5, 164.0, 0.18)$.

  30. Cross-Entropy (Softmax), continued. Normalize to probabilities: $(24.5, 164.0, 0.18) \rightarrow (0.13, 0.87, 0.00)$.

  31. Cross-Entropy (Softmax), continued. Apply $-\log(x)$: $(0.13, 0.87, 0.00) \rightarrow (2.04, 0.14, 6.94)$. The loss for this example is the negative log-probability of its correct class (cat): $L_1 = 2.04$.

  32. Cross-Entropy (Softmax), continued. Averaging over the three training examples: $L = \frac{1}{3}\sum_{i=1}^{3} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.04 + 0.079 + 6.156}{3} = 2.76$. The full computation is reproduced in the sketch below.
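A short sketch reproducing the worked example, with columns as the three training examples and rows as the class scores:

```python
import numpy as np

scores = np.array([[ 3.2, 1.3,  2.2],     # cat
                   [ 5.1, 4.9,  2.5],     # chair
                   [-1.7, 2.0, -3.1]])    # car
correct = np.array([0, 1, 2])             # correct class per example (the diagonal)

e = np.exp(scores)                        # exponentiate ...
p = e / e.sum(axis=0)                     # ... then normalize, per column
losses = -np.log(p[correct, np.arange(3)])
print(losses)                             # [2.04, 0.079, 6.156]
print(losses.mean())                      # 2.76
```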

  33. Hinge Loss (SVM Loss). Multiclass SVM loss: $L_i = \sum_{k \neq y_i} \max(0, s_k - s_{y_i} + 1)$.

  34. Hinge Loss (SVM Loss), continued. Given a function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels), the score function is $s = f(x_i, \theta)$, e.g., $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$.

  35. Hinge Loss (SVM Loss), continued. Suppose 3 training examples and 3 classes.

  36. Hinge Loss (SVM Loss), continued. The same score table as before (cat: 3.2, 1.3, 2.2; chair: 5.1, 4.9, 2.5; car: -1.7, 2.0, -3.1). A minimal evaluation on the first example is sketched below.
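A minimal sketch of the multiclass SVM loss, evaluated on the first example's score column with its correct class (cat, index 0):

```python
import numpy as np

def hinge_loss(s, y, margin=1.0):
    # sum over k != y of max(0, s_k - s_y + margin)
    margins = np.maximum(0, s - s[y] + margin)
    margins[y] = 0                        # skip the correct class
    return margins.sum()

s = np.array([3.2, 5.1, -1.7])            # scores of the first example
print(hinge_loss(s, y=0))                 # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```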
