Lecture 6 recap
- Prof. Leal-Taixé and Prof. Niessner
1
Neural Network: Width and Depth
2
!" !# !$ ℎ" ℎ# ℎ$ ℎ& '" '# (" (# ') = +(-#,) + 0
1
ℎ12#,),1) ℎ1 = +(-",1 + 0
4
!42",1,4) 5) = ') − () $ 78,9: ;,< (2) = =: =2","," … … =: =2?,@,A … … =: =-?,@ Just simple: + ! = max(0, !)
3
!"#$ = !" − '()*(!", -{$..0}, 2{$..0}) ()* =
$ 0 ∑56$ 0 ()*5
+ all variations of SGD: momentum, RMSProp, Adam, …
7 now refers to 7-th iteration 8 training samples in the current batch Gradient for the 7-th batch
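As a sanity check, here is a minimal NumPy sketch of this update rule on a toy least-squares problem; `sgd_step`, `grad_fn`, and the toy data are illustrative stand-ins, not the lecture's network.

```python
import numpy as np

def sgd_step(theta, x_batch, y_batch, grad_fn, lr=0.05):
    """One vanilla SGD update: theta^{k+1} = theta^k - lr * grad L (batch mean)."""
    # Average the per-sample gradients over the m samples in the current batch.
    grads = np.stack([grad_fn(theta, x, y) for x, y in zip(x_batch, y_batch)])
    return theta - lr * grads.mean(axis=0)

# Toy model (an assumption for illustration): linear prediction theta . x with
# per-sample loss L_i = (theta . x_i - y_i)^2, so grad_i = 2 (theta . x_i - y_i) x_i.
def grad_fn(theta, x, y):
    return 2.0 * (theta @ x - y) * x

rng = np.random.default_rng(0)
theta = rng.normal(size=3)
x_batch = rng.normal(size=(8, 3))             # m = 8 samples in this batch
y_batch = x_batch @ np.array([1.0, -2.0, 3.0])  # targets from a "true" theta
for k in range(200):                          # k-th iteration
    theta = sgd_step(theta, x_batch, y_batch, grad_fn)
print(theta)  # approaches [1, -2, 3]
```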
4
5
[Figure: model fits ranging from underfitted to appropriate to overfitted]
Figure extracted from Deep Learning by Adam Gibson, Josh Patterson, O'Reilly Media Inc., 2017
6
Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html
7
Find your hyperparameters: split the data into 60% train, 20% validation, 20% test.
8
9
10
Deep learning memes
11
Deep learning memes
12
Deep learning memes
13
Deep learning memes
14
– First, overfit to a single training sample
– Second, overfit to several training samples
– It will verify that you are learning something
15
– Vanishing gradients (multiplication in the chain rule, saturating activation functions...)
16
17
1) Output functions
2) Functions in neurons
3) Input of data
18
What is the shape of this function? [Pipeline: Prediction → Loss (Softmax, Hinge)]
19
[Figure: inputs $x_0, x_1, x_2$ weighted by $\theta_0, \theta_1, \theta_2$ and passed through a sigmoid]

$\sigma(x) = \frac{1}{1 + e^{-x}}$

The output can be interpreted as a probability: $p(y_i = 1 \mid x_i, \theta)$
20
[Figure: the same network with multiple outputs $\Pi_i$]
21
[Figure: two-output network followed by a Softmax layer]
22
Softmax with two classes:

$\Pi_1 = \frac{e^{x_i \theta_1}}{e^{x_i \theta_1} + e^{x_i \theta_2}}, \qquad \Pi_2 = \frac{e^{x_i \theta_2}}{e^{x_i \theta_1} + e^{x_i \theta_2}}$
23
General case:

$p(y_i \mid x, \theta) = \frac{e^{x \theta_i}}{\sum_{k=1}^{n} e^{x \theta_k}}$

Softmax loss: $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$ (exponentiate, then normalize)
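A minimal NumPy sketch of these two formulas; the shift by max(s) is a standard numerical-stability trick (not shown on the slide), and the function names are mine.

```python
import numpy as np

def softmax(s):
    """p_k = exp(s_k) / sum_j exp(s_j); shifting by max(s) avoids overflow."""
    e = np.exp(s - np.max(s))
    return e / e.sum()

def softmax_loss(s, y):
    """L_i = -log(e^{s_y} / sum_k e^{s_k})."""
    return -np.log(softmax(s)[y])

print(softmax(np.array([3.2, 5.1, -1.7])))           # ~ [0.13, 0.87, 0.00]
print(softmax_loss(np.array([3.2, 5.1, -1.7]), 0))   # ~ 2.04
```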
24
25
L2 loss: $L_2 = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2$

L1 loss: $L_1 = \sum_{i=1}^{n} \left|y_i - f(x_i)\right|$

f(x_i): 12 24 42 23 34 32 5 2 12 31 12 31 31 64 5 13
y_i:    15 20 40 25 34 32 5 2 12 31 12 31 31 64 5 13

$L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 11$
$L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 33$
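Assuming the table above is recovered correctly, both losses can be checked in a few NumPy lines:

```python
import numpy as np

# Predictions f(x_i) and targets y_i from the table above.
y_hat = np.array([12, 24, 42, 23, 34, 32, 5, 2, 12, 31, 12, 31, 31, 64, 5, 13])
y     = np.array([15, 20, 40, 25, 34, 32, 5, 2, 12, 31, 12, 31, 31, 64, 5, 13])

l1 = np.abs(y - y_hat).sum()    # 3 + 4 + 2 + 2 = 11
l2 = ((y - y_hat) ** 2).sum()   # 9 + 16 + 4 + 4 = 33
print(l1, l2)
```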
26
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
27
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
3.2 5.1
28
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
3.2 5.1
24.5 164.0 0.18
exp
29
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
3.2 5.1
24.5 164.0 0.18 0.13 0.87 0.00
exp normalize
30
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss 2.0 .04 0. 0.14 6. 6.94 scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
3.2 5.1
24.5 164.0 0.18 0.13 0.87 0.00
exp
2.04 0.14 6.94
normalize
31
Softmax !" = − log( )*+,
∑, )*.)
Suppose: 3 training examples and 3 classes 1.3 4. 4.9 2.0 3. 3.2 5.1
2.2 2.5
3.1 cat chair “car” Loss 2.0 .04 0. 0.07 079 6. 6.156 scores Score function 0 = 1(2", 4) e.g., 1(2", 4) = 4 ⋅ 26, 27, … , 29 :
Given a function with weights 4, Training pairs [2"; ="] (input and labels)
3.2 5.1
24.5 164.0 0.18 0.13 0.87 0.00
exp
2.0 .04 0.14 6.94
normalize
! = 1 @ A
"B7 9
!" = = !7 + !D + !E 3 = = 2.04 + 0.079 + 6.156 3 = = O. PQ
32
Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Given a function with weights $\theta$, training pairs $[x_i; y_i]$ (input and labels), and score function $s = f(x_i, \theta)$, e.g., $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_n]^\top$.

Suppose: 3 training examples and 3 classes, with the same scores as before:

        image 1 (cat)   image 2 (car)   image 3 (chair)
cat          3.2             1.3              2.2
car          5.1             4.9              2.5
chair       -1.7             2.0             -3.1

$L_1 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9 + 0 = 2.9$

$L_2 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0 + 0 = 0$

$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 6.3 + 6.6 = 12.9$

Full loss (over all pairs): $L = \frac{1}{n} \sum_{i=1}^{n} L_i = \frac{2.9 + 0 + 12.9}{3} = 5.27$
40
Softmax: !" = − log( )*+,
∑, )*.)
Hinge loss: !" = ∑012, max(0, 8
0 − 82, + 1)
41
Softmax: !" = − log( )*+,
∑, )*.)
Given the following scores: 0 = [5, −3, 2] 0 = [5, 10, 10] 0 = [5, −20, −20] 9" = 0 Hinge loss: !" = ∑:;<, max(0, 0
: − 0<, + 1)
42
Softmax: !" = − log( )*+,
∑, )*.)
Given the following scores: 0 = [5, −3, 2] 0 = [5, 10, 10] 0 = [5, −20, −20] 9" = 0 Hinge loss:
max(0, −3 − 5 + 1) + max 0, 2 − 5 + 1 = 0 max(0, 10 − 5 + 1) + max 0, 10 − 5 + 1 = 12 max(0, −20 − 5 + 1) + max 0, −20 − 5 + 1 = 0
Hinge loss: !" = ∑>?@, max(0, 0
> − 0@, + 1)
43
Softmax: !" = − log( )*+,
∑, )*.)
Given the following scores: 0 = [5, −3, 2] 0 = [5, 10, 10] 0 = [5, −20, −20] 9" = 0 Hinge loss: Softmax loss:
max(0, −3 − 5 + 1) + max 0, 2 − 5 + 1 = 0 max(0, 10 − 5 + 1) + max 0, 10 − 5 + 1 = 12 max(0, −20 − 5 + 1) + max 0, −20 − 5 + 1 = 0
Google…
0.05 05 Google…
5.70 Google…
.e-11 11
Hinge loss: !" = ∑>?@, max(0, 0
> − 0@, + 1)
44
Softmax: !" = − log( )*+,
∑, )*.)
Given the following scores: 0 = [5, −3, 2] 0 = [5, 10, 10] 0 = [5, −20, −20] 9" = 0 Hinge loss: Softmax loss:
max(0, −3 − 5 + 1) + max 0, 2 − 5 + 1 = 0 max(0, 10 − 5 + 1) + max 0, 10 − 5 + 1 = 12 max(0, −20 − 5 + 1) + max 0, −20 − 5 + 1 = 0
Google…
0.05 05 Google…
5.70 Google…
.e-11 11
Softmax *always* wants to improve! Hinge Loss saturates Hinge loss: !" = ∑>?@, max(0, 0
> − 0@, + 1)
45
! "# $("#, !) SVM (# = ∑+,-. max(0, 3
+ − 3-. + 1)
7# (
score function regularization loss data losses
Softmax (# = − log(
;<=. ∑. ;<>)
Full Loss ( =
? @ ∑#A? @
(# + BC(!) e.g., (C-reg: BC ! = ∑#A?
@
D#
C
labeled data input data Given a function with weights !, Training pairs ["#; 7#] (input and labels)
Score function 3 = $("#, !) e.g., $("#, !) = ! ⋅ "I, "?, … , "@ K
46
SVM !" = ∑%&'( max(0, /
% − /'( + 1)
Softmax !" = − log( 789(
∑( 78:)
Full Loss ! =
; < ∑"=; <
!" + >?(@) e.g., !?-reg: >? @ = ∑"=;
<
A"
?
Score function / = B(C", @) e.g., B(C", @) = @ ⋅ CE, C;, … , C< G
47
SVM !" = ∑%&'( max(0, /
% − /'( + 1)
Softmax !" = − log( 789(
∑( 78:)
Full Loss ! =
; < ∑"=; <
!" + >?(@) e.g., !?-reg: >? @ = ∑"=;
<
A"
?
Score function / = B(C", @) e.g., B(C", @) = @ ⋅ CE, C;, … , C< G Want to find optimal @. I.e., weights are unknowns of
Compute gradient w.r.t. @. Gradient HI! is computed via backpropagation
48
Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, f(x_i; \theta)_j - f(x_i; \theta)_{y_i} + 1)$

$L_2$-reg: $R_2(\theta) = \sum_k \theta_k^2$
$L_1$-reg: $R_1(\theta) = \sum_k |\theta_k|$

Full loss: $L = \frac{1}{n} \sum_{i=1}^{n} \sum_{j \neq y_i} \max(0, f(x_i; \theta)_j - f(x_i; \theta)_{y_i} + 1) + \lambda R(\theta)$
49
Example: $x = [1, 1, 1, 1]$, $\theta_1 = [1, 0, 0, 0]$, $\theta_2 = [0.25, 0.25, 0.25, 0.25]$

$\theta_1^\top x = \theta_2^\top x = 1$, so both weight vectors give the same score (same data loss).

$R_2(\theta_1) = 1$, while $R_2(\theta_2) = 0.25^2 + 0.25^2 + 0.25^2 + 0.25^2 = 0.25$: the $L_2$ regularizer prefers the spread-out weights $\theta_2$.
50
51
52
53
What is the shape of this function? [Pipeline: Prediction → Loss (Softmax, Hinge)]
54
[Figure: inputs $x_0, x_1, x_2$ with weights $w_0, w_1, w_2$ feeding a sigmoid neuron]
55
$\sigma(x) = \frac{1}{1 + e^{-x}}$, with $y_i \in \{0, 1\}$

The output can be interpreted as a probability.
56
Forward: $\sigma(x) = \frac{1}{1 + e^{-x}}$; backward: $\frac{\partial L}{\partial x} = \frac{\partial \sigma}{\partial x} \cdot \frac{\partial L}{\partial \sigma}$
57
Forward: $\sigma(x) = \frac{1}{1 + e^{-x}}$; backward: $\frac{\partial L}{\partial x} = \frac{\partial \sigma}{\partial x} \cdot \frac{\partial L}{\partial \sigma}$

At, e.g., $x = 6$, $\sigma(x) \approx 1$ and $\frac{\partial \sigma}{\partial x} \approx 0$: saturated neurons kill the gradient flow.
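A quick numerical illustration of the saturation effect, using the identity $\frac{d\sigma}{dx} = \sigma(x)(1 - \sigma(x))$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # d sigma / dx

for x in [0.0, 2.0, 6.0]:
    print(x, sigmoid(x), sigmoid_grad(x))
# x = 0: grad = 0.25 (active region)
# x = 6: grad ~ 0.0025 -> the upstream gradient dL/dsigma is multiplied
#        by nearly zero: the neuron is saturated
```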
58
The active region for gradient descent is around $x = 0$, where $\frac{\partial \sigma}{\partial x}$ is large.
59
$\sigma(x) = \frac{1}{1 + e^{-x}}$: the output is always positive.
60
[Figure: neuron computing $z = \sum_i w_i x_i + b$, followed by $f(z)$]

We want to compute the gradient w.r.t. the weights.
61
[Figure: neuron computing $z = \sum_i w_i x_i + b$, followed by $f(z)$]

We want to compute the gradient w.r.t. the weights:
$\frac{\partial z}{\partial w_i} = x_i > 0$ (if all inputs are positive, e.g., sigmoid outputs of the previous layer)
62
Since $\frac{\partial L}{\partial w_i} = \frac{\partial f}{\partial z} \cdot x_i$ with $x_i > 0$, the gradient is going to be either positive or negative for all weights at once, depending only on the sign of $\frac{\partial f}{\partial z}$.
63
[Figure: zig-zag gradient updates in the $(w_1, w_2)$ plane]

More on zero-mean data later.
64
tanh [LeCun 1991]:
– Zero-centered
– Still saturates
65
ReLU [Krizhevsky 2012]: $\sigma(x) = \max(0, x)$
– Large and consistent gradients
– Does not saturate
– Fast convergence
66
Large and consistent gradients, does not saturate, fast convergence. But what happens if a ReLU outputs zero? Dead ReLU: for $x < 0$ the gradient is zero, so the neuron may never activate again.
67
Initializing the bias with a small positive value (e.g., 0.1) makes it likely that ReLUs stay active for most inputs.

$f\left(\sum_i w_i x_i + b\right)$
68
Leaky ReLU [Maas 2013]: $\sigma(x) = \max(0.01x, x)$
– Does not die
69
Parametric ReLU [He 2015]: $\sigma(x) = \max(\alpha x, x)$
– Does not die
– One more parameter to backprop into
70
Maxout [Goodfellow 2013]

[Figure: inputs $x_0, x_1, x_2$ feeding two linear units with weights $w_{01}, w_{02}, w_{11}, w_{12}, w_{21}, w_{22}$, combined by a max]
71
!"#$%& = max(,-
.# + 0-, ,2 .# + 02)
Piecewise linear approximation of a convex function with N pieces
Goodfellow 2013
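For concreteness, minimal NumPy sketches of the activations discussed here; the two-piece maxout weights below are made-up example values:

```python
import numpy as np

def relu(x):       return np.maximum(0, x)          # Krizhevsky 2012
def leaky_relu(x): return np.maximum(0.01 * x, x)   # Maas 2013
def prelu(x, a):   return np.maximum(a * x, x)      # He 2015; a is learned

def maxout(x, W, b):
    """Maxout over N pieces: max_n (w_n . x + b_n); W is (N, d), b is (N,)."""
    return np.max(W @ x + b, axis=0)

x = np.array([1.0, -2.0, 0.5])
W = np.array([[0.1, 0.2, -0.3],    # two pieces -> max(w1.x + b1, w2.x + b2)
              [-0.5, 0.4, 0.2]])
b = np.array([0.0, 0.1])
print(relu(x), leaky_relu(x), prelu(x, 0.1), maxout(x, W, b))
```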
72
Maxout:
– Generalization (linear regimes)
– Does not die
– Does not saturate
– Increases the number of parameters
73
74
Data pre-processing? [Pipeline: Prediction → Loss (Softmax, Hinge)]
75
For images: subtract the mean image (AlexNet) or the per-channel mean (VGG-Net).
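A minimal sketch of both conventions, on random stand-in data; in practice the mean is computed on the training set and reused at test time:

```python
import numpy as np

# Stand-in for a training set: (N, H, W, C) float images.
images = np.random.rand(100, 32, 32, 3).astype(np.float32)

mean_image   = images.mean(axis=0)          # (H, W, C): AlexNet-style mean image
channel_mean = images.mean(axis=(0, 1, 2))  # (C,): VGG-style per-channel mean

centered_alexnet = images - mean_image      # subtract the full mean image
centered_vgg     = images - channel_mean    # broadcast-subtract per channel
```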
76
77
78
– Regularization in the optimization
– Regularization in the architecture
– Handling limited training data
– No tutorial due to holiday!
– More about training neural networks: regularization, batch norm, dropout, etc.
– Followed by CNNs
79