

slide-1
SLIDE 1

Lecture #03 – Multi-layer Perceptrons

Aykut Erdem // Hacettepe University // Spring 2020

CMP784

DEEP LEARNING

Image: Jose-Luis Olivares

slide-2
SLIDE 2

Breaking news!

  • Practical 1 is out!

— Learning neural word embeddings
— Due Friday, Mar. 26, 23:59:59

  • Paper presentations and quizzes will start

in two weeks!

− Choose your papers and your roles

2
slide-3
SLIDE 3

Previously on CMP784

  • Learning problem
  • Parametric vs. non-parametric models
  • Nearest-neighbor classifier
  • Linear classification
  • Linear regression
  • Capacity
  • Hyperparameter
  • Underfitting
  • Overfitting
  • Variance-Bias tradeoff
  • Model selection
  • Cross-validation
3

Puppy or bagel? // Karen Zack

slide-4
SLIDE 4

Lecture overview

  • the perceptron
  • the multi-layer perceptron
  • stochastic gradient descent
  • backpropagation
  • shallow yet very powerful: word2vec
  • Disclaimer: Much of the material and slides for this lecture were borrowed from

— Hugo Larochelle's Neural networks slides
— Nick Locascio's MIT 6.S191 slides
— Efstratios Gavves and Max Welling's UvA deep learning class
— Leonid Sigal's CPSC532L class
— Richard Socher's CS224d class
— Dan Jurafsky's CS124 class

4
slide-5
SLIDE 5

A Brief History of Neural Networks

5 Image: VUNI Inc.

[Figure: timeline of the history of neural networks, ending at today]

slide-6
SLIDE 6 6

The Perceptron

slide-7
SLIDE 7

The Perceptron

7

[Diagram: perceptron — inputs x₀…xₙ (plus constant 1), weights w₀…wₙ, bias b, weighted sum Σ, non-linearity]

slide-8
SLIDE 8

Perceptron Forward Pass

  • Neuron pre-activation

(or input activation)

  • Neuron output activation:

where

w are the weights (parameters) b is the bias term g(·) is called the activation function

8
a(x) = b + Σᵢ wᵢxᵢ = b + wᵀx

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

[Diagram: perceptron — inputs x₀…xₙ, weights w₀…wₙ, bias b, weighted sum Σ, non-linearity]

slide-9
SLIDE 9

Output Activation of The Neuron

9

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

The bias b only changes the position of the ridge; the range is determined by g(·).

Image credit: Pascal Vincent

[Diagram: perceptron — inputs, weights, bias, sum, non-linearity]

slide-10
SLIDE 10

Linear Activation Function

10
h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

g(a) = a

  • No nonlinear transformation
  • No input squashing

[Diagram: perceptron with linear activation]
slide-11
SLIDE 11

Sigmoid Activation Function

11

[Diagram: perceptron with sigmoid non-linearity]

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

g(a) = sigm(a) = 1 / (1 + exp(−a))

  • Squashes the neuron's output between 0 and 1
  • Always positive
  • Bounded
  • Strictly increasing

slide-12
SLIDE 12

Perceptron Forward Pass

12

Inputs: 2, 3, −1, 5 (and constant 1 for the bias)
Weights: 0.1, 0.5, 2.5, 0.2; bias: 3.0

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

[Diagram: perceptron — inputs, weights, bias, sum, non-linearity]

slide-13
SLIDE 13

Perceptron Forward Pass

13

Inputs: 2, 3, −1, 5 (and constant 1 for the bias)
Weights: 0.1, 0.5, 2.5, 0.2; bias: 3.0

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

a(x) = (2 × 0.1) + (3 × 0.5) + (−1 × 2.5) + (5 × 0.2) + (1 × 3.0) = 3.2

slide-14
SLIDE 14

Perceptron Forward Pass

14

Inputs: 2, 3, −1, 5 (and constant 1 for the bias)
Weights: 0.1, 0.5, 2.5, 0.2; bias: 3.0

h(x) = g(3.2) = σ(3.2) = 1 / (1 + e^(−3.2)) ≈ 0.96
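As a quick check, the forward pass above can be reproduced in a few lines of NumPy. This is a minimal sketch (function and variable names are ours, not from the slides), using the example's inputs, weights, and bias:

```python
import numpy as np

# Perceptron forward pass for the slide's example values.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([2.0, 3.0, -1.0, 5.0])   # inputs
w = np.array([0.1, 0.5, 2.5, 0.2])    # weights
b = 3.0                                # bias

a = b + w @ x        # pre-activation: 3.2
h = sigmoid(a)       # output activation: ~0.96
print(a, h)
```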

slide-15
SLIDE 15

Hyperbolic Tangent (tanh) Activation Function

15

[Diagram: perceptron with tanh non-linearity]

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

g(a) = tanh(a) = (exp(a) − exp(−a)) / (exp(a) + exp(−a)) = (exp(2a) − 1) / (exp(2a) + 1)

  • Squashes the neuron's output between −1 and 1
  • Can be positive or negative
  • Bounded
  • Strictly increasing

slide-16
SLIDE 16

Rectified Linear (ReLU) Activation Function

16

[Diagram: perceptron with ReLU non-linearity]

h(x) = g(a(x)) = g(b + Σᵢ wᵢxᵢ)

g(a) = reclin(a) = max(0, a)

  • Bounded below by 0 (always non-negative)
  • Not upper bounded
  • Strictly increasing
  • Tends to produce units with sparse activities
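For reference, the activation functions discussed on the last few slides can be written compactly in NumPy. A minimal sketch (names are ours):

```python
import numpy as np

# The four activation functions from the slides.
def linear(a):
    return a

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))    # output in (0, 1)

def tanh(a):
    return np.tanh(a)                   # output in (-1, 1)

def relu(a):
    return np.maximum(0.0, a)           # bounded below by 0, not above

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for g in (linear, sigmoid, tanh, relu):
    print(g.__name__, g(a))
```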
slide-17
SLIDE 17

Decision Boundary of a Neuron

  • Could do binary classification:

— With sigmoid, one can interpret the neuron as estimating p(y = 1 | x)
— Also known as the logistic regression classifier
— If the activation is greater than 0.5, predict 1
— Otherwise predict 0

Same idea can be applied to a tanh activation.

17

Image credit: Pascal Vincent

The decision boundary is linear.

slide-18
SLIDE 18

Capacity of Single Neuron

  • Can solve linearly separable problems
18

[Figure: linear decision boundaries realizing OR(x₁, x₂) and AND(x₁, x₂)]

slide-19
SLIDE 19

Capacity of Single Neuron

  • Can not solve non-linearly separable problems
  • Need to transform the input into a better representation
  • Remember basis functions!
19

[Figure: XOR(x₁, x₂) is not linearly separable, unlike the AND(x₁, x₂)-style problems shown before]

slide-20
SLIDE 20

Perceptron Diagram Simplified

20

[Diagram: perceptron — inputs x₀…xₙ, weights w₀…wₙ, bias b, weighted sum Σ, non-linearity]

slide-21
SLIDE 21

Perceptron Diagram Simplified

21

[Simplified diagram: inputs x₀…xₙ → output]
slide-22
SLIDE 22

Multi-Output Perceptron

  • Remember multi-way classification

— We need multiple outputs (1 output per class)
— We need to estimate the conditional probability p(y = c | x)
— Discriminative learning

  • Softmax activation function at the output

— Strictly positive
— Sums to one

  • Predict class with the highest estimated class conditional probability.
22

[Simplified diagram: inputs x₀…xₙ → multiple outputs, one per class]

o(a) = softmax(a) = [ exp(a₁)/Σ_c exp(a_c), …, exp(a_C)/Σ_c exp(a_c) ]ᵀ
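A minimal sketch of the softmax output activation in NumPy (names are ours); the outputs are strictly positive and sum to one:

```python
import numpy as np

# Softmax output layer: class-conditional probabilities.
def softmax(a):
    e = np.exp(a - np.max(a))   # subtract max for numerical stability
    return e / e.sum()

a = np.array([1.0, 2.0, 0.5])
p = softmax(a)
print(p, p.sum())               # strictly positive, sums to 1
print(np.argmax(p))             # predict class with highest probability
```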

slide-23
SLIDE 23 23

Multi-Layer Perceptron

slide-24
SLIDE 24

Single Hidden Layer Neural Network

  • Hidden layer pre-activation:
  • Hidden layer activation:
  • Output layer activation:
24

[Diagram: inputs x₀…xₙ → hidden layer h₁…hₙ → output layer]

a(x) = b⁽¹⁾ + W⁽¹⁾x,   with a(x)ᵢ = bᵢ⁽¹⁾ + Σⱼ W⁽¹⁾ᵢⱼ xⱼ

h(x) = g(a(x))

f(x) = o( b⁽²⁾ + w⁽²⁾ᵀ h⁽¹⁾(x) )
slide-25
SLIDE 25

Multi-Layer Perceptron (MLP)

  • Consider a network with L hidden

layers.

—layer pre-activation for k>0 —hidden layer activation from 1 to L: —output layer activation (k=L+1)

25

[Diagram: inputs → hidden layers → output layer]

a⁽ᵏ⁾(x) = b⁽ᵏ⁾ + W⁽ᵏ⁾ h⁽ᵏ⁻¹⁾(x)   (with h⁽⁰⁾(x) = x)

h⁽ᵏ⁾(x) = g(a⁽ᵏ⁾(x)),  k = 1, …, L

h⁽ᴸ⁺¹⁾(x) = o(a⁽ᴸ⁺¹⁾(x)) = f(x)
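The recursion above is easy to write down directly. A minimal NumPy sketch of the forward pass of an MLP with L hidden layers (shapes, names, and the chosen activations are ours, for illustration only):

```python
import numpy as np

# MLP forward pass: h^(0)=x, a^(k)=b^(k)+W^(k) h^(k-1), h^(k)=g(a^(k)), f(x)=o(a^(L+1)).
def forward(x, weights, biases, g, o):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = g(b + W @ h)                         # hidden layers 1..L
    return o(biases[-1] + weights[-1] @ h)       # output layer (k = L+1)

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                             # input, two hidden layers, output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

softmax = lambda a: np.exp(a - a.max()) / np.exp(a - a.max()).sum()
y = forward(rng.normal(size=4), weights, biases, np.tanh, softmax)
print(y, y.sum())
```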
slide-26
SLIDE 26

Deep Neural Network

26

[Diagram: deep neural network — inputs, several hidden layers, output layer]

slide-27
SLIDE 27

Capacity of Neural Networks

  • Consider a single layer neural network
27

Image credit: Pascal Vincent

[Figure: a single hidden layer network — inputs x₁, x₂; hidden units y₁, y₂ with weights wⱼᵢ; output z with weights wₖⱼ; bias units. Labels translated from French: entrée = input, cachée = hidden, sortie = output, biais = bias. (from Pascal Vincent's slides)]

slide-28
SLIDE 28

Capacity of Neural Networks

  • Consider a single layer neural network
28

Image credit: Pascal Vincent

[Figure: hidden unit activations y₁…y₄ and the output z₁ carving regions of the (x₁, x₂) input space (from Pascal Vincent's slides)]

slide-29
SLIDE 29

Capacity of Neural Networks

  • Consider a single layer neural network
29

Image credit: Pascal Vincent

slide-30
SLIDE 30

Universal Approximation

  • Universal Approximation Theorem (Hornik, 1991):

—“a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units’’

  • This applies for sigmoid, tanh and many other activation functions.
  • However, this does not mean that there is a learning algorithm that can find the necessary parameter values.

30
slide-31
SLIDE 31 31

Applying Neural Networks

slide-32
SLIDE 32

Example Problem: Will my flight be delayed?

32
  • Temperature: -20 F

Wind Speed: 45 mph

slide-33
SLIDE 33

Example Problem: Will my flight be delayed?

33
  • Predicted: 0.05

Input: [−20, 45]

[Diagram: two inputs x₀, x₁ → hidden units h₀, h₁, h₂ → predicted output]

slide-34
SLIDE 34

Example Problem: Will my flight be delayed?

34
  • Predicted: 0.05

Actual: 1

Input: [−20, 45]

[Diagram: two inputs x₀, x₁ → hidden units h₀, h₁, h₂ → predicted output]

slide-35
SLIDE 35

Quantifying Loss

35

Predicted: 0.05   Actual: 1

Input: [−20, 45]

[Diagram: network output compared against the actual label]

The loss ℓ(f(x⁽ⁱ⁾; θ), y⁽ⁱ⁾) measures the discrepancy between the predicted value f(x⁽ⁱ⁾; θ) and the actual label y⁽ⁱ⁾.

slide-36
SLIDE 36

Total Loss

36

Inputs: [ [−20, 45], [80, 0], [4, 15], [45, 60], … ]

[Diagram: network producing one prediction per input]

J(θ) = (1/N) Σᵢ ℓ(f(x⁽ⁱ⁾; θ), y⁽ⁱ⁾)

Predicted: [ 0.05, 0.02, 0.96, 0.35 ]   Actual: [ 1 1 1 ]

slide-37
SLIDE 37

Total Loss

37

Inputs: [ [−20, 45], [80, 0], [4, 15], [45, 60], … ]

[Diagram: network producing one prediction per input]

J(θ) = (1/N) Σᵢ ℓ(f(x⁽ⁱ⁾; θ), y⁽ⁱ⁾)

Predicted: [ 0.05, 0.02, 0.96, 0.35 ]   Actual: [ 1 1 1 ]

slide-38
SLIDE 38

Binary Cross Entropy Loss

38

Inputs: [ [−20, 45], [80, 0], [4, 15], [45, 60], … ]

[Diagram: network producing one prediction per input]

Predicted: [ 0.05, 0.02, 0.96, 0.35 ]   Actual: [ 1 1 1 ]

J_cross-entropy(θ) = −(1/N) Σᵢ [ y⁽ⁱ⁾ log f(x⁽ⁱ⁾; θ) + (1 − y⁽ⁱ⁾) log(1 − f(x⁽ⁱ⁾; θ)) ]

  • For classification problems with a softmax output layer.
  • Maximize log-probability of the correct class given an input
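A minimal NumPy sketch of the binary cross-entropy loss above, using the slide's example predictions (we assume all actual labels are 1, as listed; the clipping constant is ours, to guard log(0)):

```python
import numpy as np

# Binary cross-entropy: -(1/N) sum_i [y log f + (1-y) log(1-f)].
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

y_pred = np.array([0.05, 0.02, 0.96, 0.35])
y_true = np.array([1.0, 1.0, 1.0, 1.0])     # assumed labels for illustration
print(binary_cross_entropy(y_true, y_pred))
```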
slide-39
SLIDE 39

Mean Squared Error Loss

39

Inputs: [ [−20, 45], [80, 0], [4, 15], [45, 60], … ]

[Diagram: network producing one prediction per input]

Predicted: [ 0.05, 0.02, 0.96, 0.35 ]   Actual: [ 1 1 1 ]

J_MSE(θ) = (1/N) Σᵢ ( f(x⁽ⁱ⁾; θ) − y⁽ⁱ⁾ )²

slide-40
SLIDE 40 40

Training Neural Networks

slide-41
SLIDE 41

Training

  • Learning is cast as optimization:

arg min_θ  (1/T) Σₜ ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾) + λ Ω(θ)
              (loss function)         (regularizer)

— For classification problems, we would like to minimize classification error
— The loss function can sometimes be viewed as a surrogate for what we actually want to optimize (e.g., an upper bound)

41
slide-42
SLIDE 42

Loss is a function of the model’s parameters

42
slide-43
SLIDE 43

How to minimize loss?

43
  • Start at random point
slide-44
SLIDE 44

How to minimize loss?

44

Compute:

slide-45
SLIDE 45

How to minimize loss?

45

Move in the direction opposite of the gradient to a new point
slide-46
SLIDE 46

How to minimize loss?

46

Move in the direction opposite of the gradient to a new point
slide-47
SLIDE 47

How to minimize loss?

47

Repeat!

slide-48
SLIDE 48

This is called Stochastic Gradient Descent (SGD)

48

Repeat!

slide-49
SLIDE 49

Stochastic Gradient Descent (SGD)

  • Initialize θ randomly
  • For N Epochs
  • For each training example (x, y):
  • Compute Loss Gradient:
  • Update θ with update rule:
49
  • θₜ₊₁ ← θₜ − α ∇_θ ℒ
slide-50
SLIDE 50

Why is it Stochastic Gradient Descent?

  • Initialize θ randomly
  • For N Epochs
  • For each training example (x, y):
  • Compute Loss Gradient:
  • Update θ with update rule:
50
  • Only an estimate of

true gradient!

θₜ₊₁ ← θₜ − α ∇_θ ℒ

slide-51
SLIDE 51

θₜ₊₁ ← θₜ − α ∇_θ ℒ

  • Why is it Stochastic Gradient Descent?
  • Initialize θ randomly
  • For N Epochs
  • For each training batch {(x0, y0),…, (xB, yB)}:
  • Compute Loss Gradient:
  • Update θ with update rule:
51
  • More accurate

estimate!

  • Advantages:
  • More accurate estimation of gradient

⎯ Smoother convergence
⎯ Allows for larger learning rates

  • Minibatches lead to fast training!

⎯ Can parallelize computation + achieve significant speed increases on GPUs
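A minimal sketch of a minibatch SGD training loop (all names are ours; the least-squares loss in the usage example is only there to make the sketch runnable end to end):

```python
import numpy as np

# Minibatch SGD: theta_{t+1} = theta_t - lr * grad, one epoch = pass over all examples.
def sgd(theta, X, y, loss_and_grad, lr=0.1, batch_size=32, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                     # shuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            _, grad = loss_and_grad(theta, X[batch], y[batch])
            theta = theta - lr * grad                # gradient step on the minibatch
    return theta

# Usage example: least-squares regression, grad = X^T (X theta - y) / m.
def loss_and_grad(theta, Xb, yb):
    err = Xb @ theta - yb
    return 0.5 * np.mean(err ** 2), Xb.T @ err / len(Xb)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
print(sgd(np.zeros(3), X, y, loss_and_grad, lr=0.1, epochs=50))   # ~[1, -2, 0.5]
```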

slide-52
SLIDE 52

Stochastic Gradient Descent (SGD)

  • Algorithm that performs updates after each example
— initialize θ ≡ {W⁽¹⁾, b⁽¹⁾, …, W⁽ᴸ⁺¹⁾, b⁽ᴸ⁺¹⁾}
— for N iterations
  — for each training example (x⁽ᵗ⁾, y⁽ᵗ⁾) (or batch):
      Δ = −∇_θ ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾) − λ ∇_θ Ω(θ)
      θ ← θ + α Δ

Training epoch = iteration over all examples

  • To apply this algorithm to neural network training, we need:
— the loss function ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾)
— a procedure to compute the parameter gradients ∇_θ ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾)
— the regularizer Ω(θ) (and its gradient ∇_θ Ω(θ))

52
slide-53
SLIDE 53

Stochastic Gradient Descent (SGD)

  • Algorithm that performs updates after each example
— initialize θ ≡ {W⁽¹⁾, b⁽¹⁾, …, W⁽ᴸ⁺¹⁾, b⁽ᴸ⁺¹⁾}
— for N iterations
  — for each training example (x⁽ᵗ⁾, y⁽ᵗ⁾) (or batch):
      Δ = −∇_θ ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾) − λ ∇_θ Ω(θ)
      θ ← θ + α Δ

Training epoch = iteration over all examples

  • To apply this algorithm to neural network training, we need:
— the loss function ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾)
— a procedure to compute the parameter gradients ∇_θ ℓ(f(x⁽ᵗ⁾; θ), y⁽ᵗ⁾)
— the regularizer Ω(θ) (and its gradient ∇_θ Ω(θ))

53
slide-54
SLIDE 54

What is a neural network again?

  • A family of parametric, non-linear and hierarchical representation learning

functions

  • ⎯ x: input, θl: parameters for layer l, al = hl(x, θl): (non-)linear function
  • Given training corpus {X, Y} find optimal parameters
54

a_L(x; θ₁,…,L) = h_L( h_{L−1}( … h₁(x, θ₁) …, θ_{L−1} ), θ_L )

θ* ← arg min_θ  Σ_{(x,y)∈(X,Y)} ℓ( y, a_L(x; θ₁,…,L) )

slide-55
SLIDE 55

Neural network models

  • A neural network model is a series of hierarchically connected functions
  • The hierarchy can be very, very complex
55

[Diagram: Input → h₁(xᵢ; θ) → h₂(xᵢ; θ) → h₃(xᵢ; θ) → h₄(xᵢ; θ) → h₅(xᵢ; θ) → Loss]

Forward connections (feedforward architecture)

slide-56
SLIDE 56

Neural network models

  • A neural network model is a series of hierarchically connected functions
  • The hierarchy can be very, very complex
56

[Diagram: Input → modules h₁…h₅ → Loss, with additional skip connections between non-adjacent modules]

Interweaved connections (Directed Acyclic Graph architecture – DAGNN)

slide-57
SLIDE 57

Neural network models

  • A neural network model is a series of hierarchically connected functions
  • The hierarchy can be very, very complex
57

[Diagram: Input → modules h₁…h₅ → Loss, with feedback edges between modules]

Loopy connections (recurrent architecture, special care needed)

slide-58
SLIDE 58

Neural network models

  • A neural network model is a series of hierarchically connected functions
  • The hierarchy can be very, very complex
58

Functions → Modules

[Diagram: the feedforward, DAG, and recurrent architectures above, with each function hₗ(xᵢ; θ) drawn as a module]

slide-59
SLIDE 59

What is a module

  • A module is a building block for our network
  • Each module is an object/function a = h(x; θ) that

⎯ contains trainable parameters θ
⎯ receives as an argument an input x
⎯ and returns an output a based on the activation function h(…)

  • The activation function should be (at least) first-order differentiable (almost) everywhere

  • For easier/more efficient backpropagation → store the module input

⎯ easy to get the module output fast
⎯ easy to compute derivatives

59

[Diagram: a DAG of modules h₁…h₅ feeding the Loss]
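A minimal sketch of modules as described above: each one stores its input (or output) during the forward pass so that the backward pass is cheap. Class and variable names are ours, not the course's reference implementation:

```python
import numpy as np

class Sigmoid:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))   # store output for backward
        return self.out

    def backward(self, grad_out):
        # d sigmoid(x)/dx = sigmoid(x) * (1 - sigmoid(x))
        return grad_out * self.out * (1.0 - self.out)

class Linear:
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                              # store input for backward
        return self.W @ x + self.b

    def backward(self, grad_out):
        self.dW = np.outer(grad_out, self.x)    # gradient w.r.t. parameters
        self.db = grad_out
        return self.W.T @ grad_out              # gradient w.r.t. the input

layers = [Linear(3, 4), Sigmoid(), Linear(4, 1)]
x = np.array([0.2, -1.0, 0.5])
for layer in layers:                            # forward: visit modules in order
    x = layer.forward(x)
grad = np.ones(1)
for layer in reversed(layers):                  # backward: traverse reversed connections
    grad = layer.backward(grad)
print(x, grad)
```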

slide-60
SLIDE 60

Anything goes or do special constraints exist?

  • A neural network is a composition of modules (building blocks)
  • Any architecture works
  • If the architecture is a feedforward cascade, no special care is needed
  • If it is acyclic, there is a right order for computing the forward computations
  • If there are loops, these form recurrent connections (revisited later)

60
slide-61
SLIDE 61

What is a module

  • Simply compute the activation of each module in the network
  • We need to know the precise function behind each module hₘ(…)

  • Recursive operations
  • One module’s output is another’s input
  • Steps
  • Visit modules one by one starting from the data input
  • Some modules might have several inputs from multiple modules
  • Compute modules activations with the right order
  • Make sure all the inputs computed at the right time
61

[Diagram: a DAG of modules h₁…h₅ feeding the Loss]

a_l = h_l(x_l; θ_l), where one module's output is the next module's input: x_l = a_{l−1}
slide-62
SLIDE 62

What is a module

  • Simply compute the gradients of each module for our data
  • We need to know the gradient formulation of each module ∂hₘ(xₘ; θₘ) w.r.t. their inputs xₘ and parameters θₘ

  • We need the forward computations first
  • Their result is the sum of losses for our input data
  • Then take the reverse network (reverse connections)

and traverse it backwards

  • Instead of using the activation functions, we use

their gradients

  • The whole process can be described very neatly and concisely

with the backpropagation algorithm

62

[Diagram: the reversed module graph, traversed backwards starting from dLoss(Input)]

slide-63
SLIDE 63

Again, what is a neural network again?

  • A neural network is a family of parametric, non-linear and hierarchical representation learning functions:

⎯ x: input, θl: parameters for layer l, al = hl(x, θl): (non-)linear function

  • Given training corpus {X, Y} find optimal parameters
  • To use any gradient descent based optimization

we need the gradients

  • How to compute the gradients for such a complicated function enclosing other functions, like a_L(…)?

63

a_L(x; θ₁,…,L) = h_L( h_{L−1}( … h₁(x, θ₁) …, θ_{L−1} ), θ_L )

θ* ← arg min_θ  Σ_{(x,y)∈(X,Y)} ℓ( y, a_L(x; θ₁,…,L) )

θₜ₊₁ = θₜ − ηₜ ∂ℒ/∂θₜ

∂ℒ/∂θ_l,  l = 1, …, L

slide-64
SLIDE 64

Again, what is a neural network again?

  • A neural network is a family of parametric, non-linear and hierarchical representation learning functions:

⎯ x: input, θl: parameters for layer l, al = hl(x, θl): (non-)linear function

  • Given training corpus {X, Y} find optimal parameters
  • To use any gradient descent based optimization we need the gradients
  • How to compute the gradients for such a complicated function enclosing other functions, like a_L(…)?

64

a_L(x; θ₁,…,L) = h_L( h_{L−1}( … h₁(x, θ₁) …, θ_{L−1} ), θ_L )

θ* ← arg min_θ  Σ_{(x,y)∈(X,Y)} ℓ( y, a_L(x; θ₁,…,L) )

θₜ₊₁ = θₜ − ηₜ ∂ℒ/∂θₜ

∂ℒ/∂θ_l,  l = 1, …, L

slide-65
SLIDE 65

How do we compute gradients?

  • Numerical Differentiation
  • Symbolic Differentiation
  • Automatic Differentiation (AutoDiff)
65
slide-66
SLIDE 66

Numerical Differentiation

  • We can approximate the gradient numerically, using:
66 slide adopted from T. Chen, H. Shen, A. Krishnamurthy

∂f(x)/∂xᵢ = lim_{h→0} [ f(x + h·1ᵢ) − f(x) ] / h  ≈  [ f(x + h·1ᵢ) − f(x) ] / h  for small h

1ᵢ — vector of all zeros, except for one 1 in the i-th location

slide-67
SLIDE 67

Numerical Differentiation

  • We can approximate the gradient numerically, using:
  • Even better, we can use central differencing:
67 slide adopted from T. Chen, H. Shen, A. Krishnamurthy

∂f(x)/∂xᵢ ≈ [ f(x + h·1ᵢ) − f(x) ] / h

∂f(x)/∂xᵢ ≈ [ f(x + h·1ᵢ) − f(x − h·1ᵢ) ] / 2h   (central differencing)

1ᵢ — vector of all zeros, except for one 1 in the i-th location

slide-68
SLIDE 68

Numerical Differentiation

  • We can approximate the gradient numerically, using:
  • Even better, we can use central differencing:
  • However, both of these suffer from rounding errors and are not good enough

for learning (they are very good tools for checking the correctness of implementation though, e.g., use h = 0.000001).

68 slide adopted from T. Chen, H. Shen, A. Krishnamurthy

∂f(x)/∂xᵢ ≈ [ f(x + h·1ᵢ) − f(x) ] / h

∂f(x)/∂xᵢ ≈ [ f(x + h·1ᵢ) − f(x − h·1ᵢ) ] / 2h   (central differencing)

1ᵢ — vector of all zeros, except for one 1 in the i-th location
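A minimal sketch of gradient checking with central differences, as suggested above (names are ours); the numerical gradient should match the analytic one to roughly the step size:

```python
import numpy as np

# Central-difference gradient: (f(x + h*1_i) - f(x - h*1_i)) / (2h) for each i.
def numerical_grad(f, x, h=1e-6):
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h                                 # the 1_i vector scaled by h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

f = lambda x: np.sum(x ** 3)                     # analytic gradient: 3 x^2
x = np.array([1.0, -2.0, 0.5])
print(numerical_grad(f, x))
print(3 * x ** 2)                                # should agree to ~1e-6
```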

slide-69
SLIDE 69

Numerical Differentiation

  • We can approximate the gradient numerically, using:
  • Even better, we can use central differencing:
  • However, both of these suffer from rounding errors and are not good enough

for learning (they are very good tools for checking the correctness of implementation though, e.g., use h = 0.000001).

69 slide adopted from T. Chen, H. Shen, A. Krishnamurthy

1ᵢ — vector of all zeros, except for one 1 in the i-th location; 1ᵢⱼ — matrix of all zeros, except for one 1 in the (i,j)-th location

∂L(W, b)/∂wᵢⱼ ≈ [ L(W + h·1ᵢⱼ, b) − L(W, b) ] / h
∂L(W, b)/∂bⱼ ≈ [ L(W, b + h·1ⱼ) − L(W, b) ] / h

∂L(W, b)/∂wᵢⱼ ≈ [ L(W + h·1ᵢⱼ, b) − L(W − h·1ᵢⱼ, b) ] / 2h
∂L(W, b)/∂bⱼ ≈ [ L(W, b + h·1ⱼ) − L(W, b − h·1ⱼ) ] / 2h

slide-70
SLIDE 70

Symbolic Differentiation

  • The input function is represented as a computational graph (a symbolic tree)

  • Implements differentiation rules for composite functions:
70

Implements differentiation rules for composite functions:

  Sum rule:     d(f(x) + g(x))/dx = df(x)/dx + dg(x)/dx
  Product rule: d(f(x)·g(x))/dx = (df(x)/dx)·g(x) + f(x)·(dg(x)/dx)
  Chain rule:   d(f(g(x)))/dx = (df(g)/dg)·(dg(x)/dx)

[Figure: the expression below drawn as a symbolic tree / computational graph with nodes v₀…v₆]

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

Problem: For complex functions, expressions can be exponentially large; it is also difficult to deal with piecewise functions (creates many symbolic cases).

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)
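For illustration, a computer algebra system can differentiate this expression symbolically. A minimal sketch with SymPy (assuming SymPy is available; the variable names are ours):

```python
import sympy as sp

# Symbolic differentiation of the running example y = ln(x1) + x1*x2 - sin(x2).
x1, x2 = sp.symbols('x1 x2')
y = sp.ln(x1) + x1 * x2 - sp.sin(x2)

print(sp.diff(y, x1))                          # x2 + 1/x1
print(sp.diff(y, x2))                          # x1 - cos(x2)
print(sp.diff(y, x1).subs({x1: 2, x2: 5}))     # 11/2 = 5.5 at (2, 5)
```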

slide-71
SLIDE 71

Automatic Differentiation (AutoDiff)

  • Intuition: Interleave symbolic differentiation and simplification
  • Key Idea: Apply symbolic differentiation at the elementary operation level, evaluate and keep intermediate results
71

The success of deep learning owes A LOT to the success of AutoDiff algorithms (and also to advances in parallel architectures, large datasets, …)

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-72
SLIDE 72

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
72

[Figure: computational graph of the expression below, with nodes v₀…v₆ and output y]

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-73
SLIDE 73

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
73

[Figure: computational graph with nodes v₀…v₆]

The computational graph is governed by these equations:
v₀ = x₁,  v₁ = x₂,  v₂ = ln(v₀),  v₃ = v₀·v₁,  v₄ = sin(v₁),  v₅ = v₂ + v₃,  v₆ = v₅ − v₄,  y = v₆

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-74
SLIDE 74

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
74

[Figure: computational graph with nodes v₀…v₆]
v₀ = x₁,  v₁ = x₂,  v₂ = ln(v₀),  v₃ = v₀·v₁,  v₄ = sin(v₁),  v₅ = v₂ + v₃,  v₆ = v₅ − v₄,  y = v₆

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

Let's see how we can evaluate a function using the computational graph (DNN inference):

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-75
SLIDE 75

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
75

[Figure: computational graph with nodes v₀…v₆]
Let's see how we can evaluate a function using the computational graph (DNN inference):
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace:
y = f(x₁, x₂) = ln(x₁) + x₁x₂ − sin(x₂)

slide-76
SLIDE 76

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
76

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-77
SLIDE 77

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
77

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-78
SLIDE 78

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
78

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-79
SLIDE 79

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
79

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693, v₃ = 2 × 5 = 10

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-80
SLIDE 80

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
80

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693, v₃ = 2 × 5 = 10, v₄ = sin(5) = −0.959

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-81
SLIDE 81

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
81

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693, v₃ = 2 × 5 = 10, v₄ = sin(5) = −0.959, v₅ = 0.693 + 10 = 10.693

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-82
SLIDE 82

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
82

[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace so far: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693, v₃ = 2 × 5 = 10, v₄ = sin(5) = −0.959, v₅ = 0.693 + 10 = 10.693, v₆ = 10.693 + 0.959 = 11.652

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

slide-83
SLIDE 83

Automatic Differentiation (AutoDiff)

  • Each node is an input, intermediate, or output variable
  • Computational graph (a DAG) with variable ordering from topological sort
83

y = f(x₁, x₂) = ln(x₁) + x₁x₂ − sin(x₂)
[Figure: computational graph with nodes v₀…v₆]
Evaluate f(2, 5) with v₀ = x₁, v₁ = x₂, v₂ = ln(v₀), v₃ = v₀·v₁, v₄ = sin(v₁), v₅ = v₂ + v₃, v₆ = v₅ − v₄, y = v₆

Forward evaluation trace: v₀ = 2, v₁ = 5, v₂ = ln(2) = 0.693, v₃ = 2 × 5 = 10, v₄ = sin(5) = −0.959, v₅ = 0.693 + 10 = 10.693, v₆ = 10.693 + 0.959 = 11.652, y = 11.652

slide-84
SLIDE 84

Automatic Differentiation (AutoDiff)

84

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

[Figure: computational graph with nodes v₀…v₆]

Forward evaluation trace for f(2, 5):
v₀ = x₁ = 2
v₁ = x₂ = 5
v₂ = ln(v₀) = ln(2) = 0.693
v₃ = v₀ · v₁ = 2 × 5 = 10
v₄ = sin(v₁) = sin(5) = −0.959
v₅ = v₂ + v₃ = 0.693 + 10 = 10.693
v₆ = v₅ − v₄ = 10.693 + 0.959 = 11.652
y = v₆ = 11.652
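A minimal Python sketch of this forward evaluation trace for f(2, 5) (variable names mirror the graph nodes):

```python
import math

# Forward evaluation of y = ln(x1) + x1*x2 - sin(x2) at (2, 5).
x1, x2 = 2.0, 5.0
v0 = x1                      # 2
v1 = x2                      # 5
v2 = math.log(v0)            # ln(2)  = 0.693
v3 = v0 * v1                 # 2 * 5  = 10
v4 = math.sin(v1)            # sin(5) = -0.959
v5 = v2 + v3                 # 10.693
v6 = v5 - v4                 # 11.652
y = v6
print(y)                     # ~11.652
```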

slide-85
SLIDE 85

Automatic Differentiation (AutoDiff)

85

y = f(x₁, x₂) = ln(x₁) + x₁x₂ − sin(x₂)

We now want  ∂f(x₁, x₂)/∂x₁ |_(x₁=2, x₂=5).

We will do this with forward mode first, by introducing a derivative of each variable node with respect to the input variable.

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

slide-86
SLIDE 86

Automatic Differentiation (AutoDiff)

86

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time (the complete trace appears on Slide 98).

slide-87
SLIDE 87

Automatic Differentiation (AutoDiff)

87

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time (the complete trace appears on Slide 98).

slide-88
SLIDE 88

Automatic Differentiation (AutoDiff)

88

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time (the complete trace appears on Slide 98).

slide-89
SLIDE 89

Automatic Differentiation (AutoDiff)

89

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time (the complete trace appears on Slide 98).

slide-90
SLIDE 90

Automatic Differentiation (AutoDiff)

90

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time (the complete trace appears on Slide 98).

slide-91
SLIDE 91

Automatic Differentiation (AutoDiff)

91

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain rule (the complete trace appears on Slide 98).

slide-92
SLIDE 92

Automatic Differentiation (AutoDiff)

92

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain rule (the complete trace appears on Slide 98).

slide-93
SLIDE 93

Automatic Differentiation (AutoDiff)

93

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain rule (the complete trace appears on Slide 98).

slide-94
SLIDE 94

Automatic Differentiation (AutoDiff)

94

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain and product rules (the complete trace appears on Slide 98).

slide-95
SLIDE 95

Automatic Differentiation (AutoDiff)

95

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain and product rules (the complete trace appears on Slide 98).

slide-96
SLIDE 96

Automatic Differentiation (AutoDiff)

96

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain and product rules (the complete trace appears on Slide 98).

slide-97
SLIDE 97

Automatic Differentiation (AutoDiff)

97

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace: built up one node at a time, applying the chain and product rules (the complete trace appears on Slide 98).

slide-98
SLIDE 98

Automatic Differentiation (AutoDiff)

98

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Forward derivative trace:
∂v₀/∂x₁ = 1
∂v₁/∂x₁ = 0
∂v₂/∂x₁ = (1/v₀) · ∂v₀/∂x₁ = 1/2 × 1 = 0.5   (chain rule)
∂v₃/∂x₁ = (∂v₀/∂x₁) · v₁ + v₀ · (∂v₁/∂x₁) = 1×5 + 2×0 = 5   (product rule)
∂v₄/∂x₁ = (∂v₁/∂x₁) · cos(v₁) = 0 × cos(5) = 0
∂v₅/∂x₁ = ∂v₂/∂x₁ + ∂v₃/∂x₁ = 0.5 + 5 = 5.5
∂v₆/∂x₁ = ∂v₅/∂x₁ − ∂v₄/∂x₁ = 5.5 − 0 = 5.5
∂y/∂x₁ = ∂v₆/∂x₁ = 5.5
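A minimal Python sketch of this forward-mode trace: each node carries its value together with its derivative with respect to x₁ (seeding dx₁ = 1, dx₂ = 0):

```python
import math

# Forward-mode AD for y = ln(x1) + x1*x2 - sin(x2) at (2, 5), w.r.t. x1.
x1, x2 = 2.0, 5.0
v0, dv0 = x1, 1.0                              # d v0 / d x1 = 1
v1, dv1 = x2, 0.0                              # d v1 / d x1 = 0
v2, dv2 = math.log(v0), dv0 / v0               # chain rule: 0.5
v3, dv3 = v0 * v1, dv0 * v1 + v0 * dv1         # product rule: 5
v4, dv4 = math.sin(v1), dv1 * math.cos(v1)     # 0
v5, dv5 = v2 + v3, dv2 + dv3                   # 5.5
v6, dv6 = v5 - v4, dv5 - dv4                   # 5.5
print(dv6)                                     # dy/dx1 at (2, 5) = 5.5
```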

slide-99
SLIDE 99

Automatic Differentiation (AutoDiff)

99

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

[Figure: computational graph with nodes v₀…v₆]

(Forward derivative trace as on the previous slide.)

We now have:  ∂f(x₁, x₂)/∂x₁ |_(x₁=2, x₂=5) = 5.5

slide-100
SLIDE 100

Automatic Differentiation (AutoDiff)

100

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

We want ∂f(x₁, x₂)/∂x₁ at (x₁ = 2, x₂ = 5).

[Figure: computational graph with nodes v₀…v₆]

(Forward derivative trace as on Slide 98.)

We now have:  ∂f(x₁, x₂)/∂x₁ |_(x₁=2, x₂=5) = 5.5
Still need:   ∂f(x₁, x₂)/∂x₂ |_(x₁=2, x₂=5)
slide-101
SLIDE 101

AutoDiff: Forward Mode

  • Forward mode needs m forward passes to get a full Jacobian (all gradients of the output with respect to each input), where m is the number of inputs
101

y = f(x) : ℝᵐ → ℝⁿ

slide-102
SLIDE 102

AutoDiff: Forward Mode

  • Forward mode needs m forward passes to get a full Jacobian (all gradients of the output with respect to each input), where m is the number of inputs
102

y = f(x) : ℝᵐ → ℝⁿ

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

Problem: A DNN typically has a large number of inputs (an image as input, plus all the weights and biases of the layers = millions of inputs!) and very few outputs (many DNNs have n = 1).

slide-103
SLIDE 103

AutoDiff: Forward Mode

  • Forwar

ard mode mode needs m forward passes to get a full Jacobian (all gradients of

  • utput with respect to each input), where m is the number of inputs
  • Automatic differentiation in reverse

se mode mode computes all gradients in n backwards passes (so for most DNNs in a single back pass — back ck pr propa

  • paga

gation ion)

103

y = f(x) : Rm → Rn

slide adopted from T. Chen, H. Shen, A. Krishnamurthy

Pr Probl

  • blem: DNN typically has large number of inputs:

image as an input, plus all the weights and biases of layers = millions of inputs! and very few outputs (many DNNs have n = 1) image as an input, plus all the weights and biases of layers = millions of inputs!

slide-104
SLIDE 104

AutoDiff: Reverse Mode

104

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆ and their adjoint nodes v̄₀…v̄₆]

Traverse the original graph in reverse topological order and, for each node in the original graph, introduce an adjoint node, which computes the derivative of the output with respect to the local node (using the chain rule): v̄ᵢ = ∂y/∂vᵢ, accumulated from the "local" derivatives ∂vⱼ/∂vᵢ of its children vⱼ.

slide-105
SLIDE 105

AutoDiff: Reverse Mode

105

Forward evaluation trace for f(2, 5): v₀ = 2, v₁ = 5, v₂ = 0.693, v₃ = 10, v₄ = −0.959, v₅ = 10.693, v₆ = 11.652, y = 11.652

[Figure: computational graph with nodes v₀…v₆]

Backwards derivative trace (adjoints v̄ᵢ = ∂y/∂vᵢ, with the "local" derivatives shown):
v̄₆ = ∂y/∂v₆ = 1
v̄₅ = v̄₆ · ∂v₆/∂v₅ = v̄₆ · 1 = 1
v̄₄ = v̄₆ · ∂v₆/∂v₄ = v̄₆ · (−1) = −1
v̄₃ = v̄₅ · ∂v₅/∂v₃ = v̄₅ · 1 = 1
v̄₂ = v̄₅ · ∂v₅/∂v₂ = v̄₅ · 1 = 1
v̄₁ = v̄₃ · ∂v₃/∂v₁ + v̄₄ · ∂v₄/∂v₁ = v̄₃ · v₀ + v̄₄ · cos(v₁) = 1.716
v̄₀ = v̄₃ · ∂v₃/∂v₀ + v̄₂ · ∂v₂/∂v₀ = v̄₃ · v₁ + v̄₂ · (1/v₀) = 5.5
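A minimal Python sketch of this reverse-mode trace: one backward pass produces both input gradients (variable names mirror the adjoint nodes above):

```python
import math

# Reverse-mode AD for y = ln(x1) + x1*x2 - sin(x2) at (2, 5).
x1, x2 = 2.0, 5.0
v0, v1 = x1, x2
v2, v3, v4 = math.log(v0), v0 * v1, math.sin(v1)
v5 = v2 + v3
v6 = v5 - v4

v6_bar = 1.0                                     # dy/dv6
v5_bar = v6_bar * 1.0                            # dv6/dv5 = 1
v4_bar = v6_bar * (-1.0)                         # dv6/dv4 = -1
v3_bar = v5_bar * 1.0                            # dv5/dv3 = 1
v2_bar = v5_bar * 1.0                            # dv5/dv2 = 1
v1_bar = v3_bar * v0 + v4_bar * math.cos(v1)     # 1.716
v0_bar = v3_bar * v1 + v2_bar * (1.0 / v0)       # 5.5
print(v0_bar, v1_bar)                            # dy/dx1 = 5.5, dy/dx2 = 1.716
```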

slide-106
SLIDE 106

AutoDiff: Reverse Mode

106

(Build slide: repeats the forward evaluation and backwards derivative traces from Slide 105.)

slide-107
SLIDE 107

AutoDiff: Reverse Mode

107

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-108
SLIDE 108

AutoDiff: Reverse Mode

108

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-109
SLIDE 109

AutoDiff: Reverse Mode

109

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-110
SLIDE 110

AutoDiff: Reverse Mode

110

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-111
SLIDE 111

AutoDiff: Reverse Mode

111

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-112
SLIDE 112

AutoDiff: Reverse Mode

112

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-113
SLIDE 113

AutoDiff: Reverse Mode

113

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-114
SLIDE 114

AutoDiff: Reverse Mode

114

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-115
SLIDE 115

AutoDiff: Reverse Mode

115

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-116
SLIDE 116

AutoDiff: Reverse Mode

116

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-117
SLIDE 117

AutoDiff: Reverse Mode

117

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-118
SLIDE 118

AutoDiff: Reverse Mode

118

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-119
SLIDE 119

AutoDiff: Reverse Mode

119

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-120
SLIDE 120

AutoDiff: Reverse Mode

120

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-121
SLIDE 121

AutoDiff: Reverse Mode

121

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-122
SLIDE 122

AutoDiff: Reverse Mode

122

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-123
SLIDE 123

AutoDiff: Reverse Mode

123

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-124
SLIDE 124

AutoDiff: Reverse Mode

124

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-125
SLIDE 125

AutoDiff: Reverse Mode

125

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-126
SLIDE 126

AutoDiff: Reverse Mode

126

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-127
SLIDE 127

AutoDiff: Reverse Mode

127

(Build slide: repeats the traces from Slide 105, filling in the next value of the backwards derivative trace.)

slide-128
SLIDE 128

AutoDiff: Reverse Mode

128

v0 = x1    v1 = x2    v2 = ln(v0)    v3 = v0 · v1    v4 = sin(v1)    v5 = v2 + v3    v6 = v5 − v4    y = v6        evaluated at f(2, 5)

Forward Evaluation Trace and Backwards Derivative formulas as on slide 123.

Newly computed adjoint:
v̄1 = v̄3 · v0 + v̄4 · cos(v1) = 1 · 2 + (−1) · cos(5) = 1.716

Values computed so far: v̄6 = 1,  v̄5 = 1,  v̄4 = −1,  v̄3 = 1,  v̄2 = 1,  v̄1 = 1.716

slide-129
SLIDE 129

AutoDiff: Reverse Mode

129

v0 = x1    v1 = x2    v2 = ln(v0)    v3 = v0 · v1    v4 = sin(v1)    v5 = v2 + v3    v6 = v5 − v4    y = v6        evaluated at f(2, 5)

Forward Evaluation Trace and Backwards Derivative formulas as on slide 123.

Newly computed adjoint:
v̄0 = v̄3 · v1 + v̄2 · (1/v0) = 1 · 5 + 1 · 0.5 = 5.5

Values computed so far: v̄6 = 1,  v̄5 = 1,  v̄4 = −1,  v̄3 = 1,  v̄2 = 1,  v̄1 = 1.716,  v̄0 = 5.5

slide-130
SLIDE 130

AutoDiff: Reverse Mode

130

v0 = x1    v1 = x2    v2 = ln(v0)    v3 = v0 · v1    v4 = sin(v1)    v5 = v2 + v3    v6 = v5 − v4    y = v6        evaluated at f(2, 5)

Forward Evaluation Trace:
v0 = 2,  v1 = 5,  v2 = ln(2) = 0.693,  v3 = 2 × 5 = 10,  v4 = sin(5) = −0.959,  v5 = 0.693 + 10 = 10.693,  v6 = 10.693 + 0.959 = 11.652,  y = 11.652

Backwards Derivative Trace (complete):
v̄6 = 1,  v̄5 = 1,  v̄4 = −1,  v̄3 = 1,  v̄2 = 1,  v̄1 = 1.716,  v̄0 = 5.5

So ∂y/∂x1 = v̄0 = 5.5 and ∂y/∂x2 = v̄1 = 1.716.
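As a sanity check, here is a minimal Python sketch (not from the slides) that carries out the same forward evaluation trace and backwards derivative trace by hand; the values it prints should match the ones above.

import math

# f(x1, x2) = ln(x1) + x1*x2 - sin(x2), evaluated at (2, 5)

def forward(x1, x2):
    # Forward evaluation trace
    v0, v1 = x1, x2
    v2 = math.log(v0)
    v3 = v0 * v1
    v4 = math.sin(v1)
    v5 = v2 + v3
    v6 = v5 - v4
    return v6, (v0, v1)

def reverse(v0, v1):
    # Backwards derivative trace: propagate adjoints from y back to the inputs
    v6_bar = 1.0                                  # dy/dv6
    v5_bar = v6_bar * 1.0                         # v6 = v5 - v4
    v4_bar = v6_bar * (-1.0)
    v3_bar = v5_bar * 1.0                         # v5 = v2 + v3
    v2_bar = v5_bar * 1.0
    v1_bar = v3_bar * v0 + v4_bar * math.cos(v1)  # v3 = v0*v1, v4 = sin(v1)
    v0_bar = v3_bar * v1 + v2_bar * (1.0 / v0)    # v2 = ln(v0)
    return v0_bar, v1_bar

y, (v0, v1) = forward(2.0, 5.0)
print(y)                # ~11.652
print(reverse(v0, v1))  # ~(5.5, 1.716)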

slide-131
SLIDE 131

  • AutoDiff can be done at various granularities

131

Automatic Differentiation (AutoDiff)

y = f(x1, x2) = ln(x1) + x1x2 − sin(x2)

+ sin +

[Diagram: the same function at two granularities. Elementary function granularity: the full computational graph (ln, ×, sin, +, −) over v0…v6. Complex function granularity: a single node computing f(x1, x2) directly.]

slide-132
SLIDE 132

Backpropagation: Practical Issues

132

[Diagram: a feed-forward network with an Input Layer x1…x5, a 1st Hidden Layer with parameters (Wh1, bh1), a 2nd Hidden Layer with parameters (Wh2, bh2), and an Output Layer y1, y2 with parameters (Wo, bo).]

Easier to deal with in vector form
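To make the "vector form" concrete, here is a rough NumPy sketch of one forward and backward pass through such a network. The parameter names (Wh1, bh1, Wh2, bh2, Wo, bo) follow the slide; the layer sizes, sigmoid activations, and squared-error loss are illustrative assumptions.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 1))              # inputs x1..x5 as a column vector
t = rng.normal(size=(2, 1))              # targets for the outputs y1, y2

Wh1, bh1 = 0.1 * rng.normal(size=(4, 5)), np.zeros((4, 1))
Wh2, bh2 = 0.1 * rng.normal(size=(3, 4)), np.zeros((3, 1))
Wo,  bo  = 0.1 * rng.normal(size=(2, 3)), np.zeros((2, 1))

# Forward pass: one matrix-vector product per layer
h1 = sigmoid(Wh1 @ x + bh1)
h2 = sigmoid(Wh2 @ h1 + bh2)
y  = Wo @ h2 + bo
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: multiply the incoming gradient by each layer's local Jacobian
dy = y - t                               # dL/dy
dWo, dbo = dy @ h2.T, dy
dh2 = Wo.T @ dy
da2 = dh2 * h2 * (1 - h2)                # through the element-wise sigmoid
dWh2, dbh2 = da2 @ h1.T, da2
dh1 = Wh2.T @ da2
da1 = dh1 * h1 * (1 - h1)
dWh1, dbh1 = da1 @ x.T, da1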

slide-133
SLIDE 133

Backpropagation: Practical Issues

133
slide-134
SLIDE 134

Backpropagation: Practical Issues

134

"local cal" Jacobians (matrix of partial derivatives, e.g. |x| x |y| "backp ackprop" Gradient

slide-135
SLIDE 135

Jacobian of Sigmoid layer

  • Element-wise sigmoid layer:
135

x, y ∈ R^2048,  y = sigmoid(x) applied element-wise

slide-136
SLIDE 136

Jacobian of Sigmoid layer

  • Element-wise sigmoid layer:
136

x, y ∈ R^2048,  y = sigmoid(x) applied element-wise

− What is the dimension of the Jacobian?

slide-137
SLIDE 137

Jacobian of Sigmoid layer

  • Element-wise sigmoid layer:
137

x, y ∈ R^2048,  y = sigmoid(x) applied element-wise

− What is the dimension of the Jacobian?
− What does it look like?

slide-138
SLIDE 138

Jacobian of Sigmoid layer

  • Element-wise sigmoid layer:
138

x, y ∈ R^2048,  y = sigmoid(x) applied element-wise

− What is the dimension of the Jacobian?
− What does it look like?
− If we are working with a mini-batch of 100 input-output pairs, the Jacobian is a 204,800 × 204,800 matrix!
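The answer implied here is that the Jacobian of an element-wise sigmoid is diagonal (entry (i, i) is σ(xi)(1 − σ(xi)), everything else 0), so in practice backprop never materializes it. A small NumPy sketch of the corresponding vector-Jacobian product:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.random.randn(2048)
y = sigmoid(x)

dL_dy = np.random.randn(2048)      # gradient arriving from the layer above
dL_dx = dL_dy * y * (1 - y)        # vector-Jacobian product: O(n), no 2048 x 2048 matrix

# Only for illustration: the explicit (diagonal) Jacobian would be np.diag(y * (1 - y)).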

slide-139
SLIDE 139

Backpropagation: Common questions

  • Question: Does BackProp only work for certain layers?
    Answer: No, it works for any differentiable function.

  • Question: What is the computational cost of BackProp?
    Answer: On average, about twice the forward pass.

  • Question: Is BackProp a dual of forward propagation?
    Answer: Yes.

139 slide adopted from Marc’Aurelio Ranzato
slide-140
SLIDE 140

Backpropagation: Common questions

  • Question: Does BackProp only work for certain layers?
    Answer: No, it works for any differentiable function.

  • Question: What is the computational cost of BackProp?
    Answer: On average, about twice the forward pass.

  • Question: Is BackProp a dual of forward propagation?
    Answer: Yes.

140 slide adopted from Marc’Aurelio Ranzato
[Diagram: forward propagation and backpropagation are duals. A Sum node in FProp becomes a Copy in BackProp, and a Copy in FProp becomes a Sum in BackProp.]

slide-141
SLIDE 141

Demo time

http://playground.tensorflow.org

141
slide-142
SLIDE 142 142

Shallow yet very powerful: word2vec

slide-143
SLIDE 143

From symbolic to distributed word representations

  • The vast majority of rule-based or statistical NLP and IR work regarded words

as atomic symbols: hotel, conference, walk

  • In vector space terms, this is a vector with one 1 and a lot of zeroes
  • We now call this a one-hot representation.
143

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

“hotel”

slide-144
SLIDE 144

From symbolic to distributed word representations

  • The size of a word vector is equal to the number of words in the dictionary
  • Vector size therefore grows with the dictionary:

20K (speech) – 50K (Penn Treebank) – 500K (a large dictionary) – 13M (Google 1T)

  • One-hot vectors are orthogonal
  • There is no natural notion of similarity in a set of one-hot vectors
144

“hotel” = [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
“motel” = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]

hotelᵀ · motel = 0

slide-145
SLIDE 145

Distributional similarity-based representations

  • You can get a lot of value by representing a word

by means of its neighbors

  • “You shall know a word by the company it keeps”

(J. R. Firth 1957:11)

  • One of the most successful ideas of modern NLP
145

…government debt problems turning into banking crises as has happened in…
…saying that Europe needs unified banking regulation to replace the hodgepodge…

These words will represent “banking”

slide-146
SLIDE 146

Distributional hypothesis

  • The meaning of a word is (can be approximated by, derived from) the

set of contexts in which it occurs in texts

He filled the wampimuk, passed it around and we all drunk some We found a little, hairy wampimuk sleeping behind the tree

146 Slide credit: Marco Baroni

Testing the distributional hypothesis: The influence of context on judgements of semantic similarity [McDonald & Ramscar’01]

slide-147
SLIDE 147

Distributional semantics

147

he curtains open and the moon shining in on the barely ars and the cold , close moon " . And neither of the w rough the night with the moon shining so brightly , it made in the light of the moon . It all boils down , wr surely under a crescent moon , thrilled by ice-white sun , the seasons of the moon ? Home , alone , Jay pla m is dazzling snow , the moon has risen full and cold un and the temple of the moon , driving out of the hug in the dark and now the moon rises , full and amber a bird on the shape of the moon over the trees in front But I could n’t see the moon or the stars , only the rning , with a sliver of moon hanging among the stars they love the sun , the moon and the stars . None of the light of an enormous moon . The plash of flowing w man ’s first step on the moon ; various exhibits , aer the inevitable piece of moon rock . Housing The Airsh

…ud obscured part of the moon . The Allied guns behind

Slide credit: Marco Baroni
A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge [Landauer and Dumais’97]
From frequency to meaning: Vector space models of semantics [Turney and Pantel’10]
slide-148
SLIDE 148

Window-based co-occurrence matrix

  • Example corpus:
  • I like deep learning.
  • I like NLP.
  • I enjoy flying.
  • Increase in size with vocabulary
  • Very high dimensional: require a lot of storage
  • Subsequent classification models have sparsity issues
  • Models are less robust
148

Slide credit: Richard Socher

counts     I   like  enjoy  deep  learning  NLP  flying  .
I          0    2     1      0      0        0     0     0
like       2    0     0      1      0        1     0     0
enjoy      1    0     0      0      0        0     1     0
deep       0    1     0      0      1        0     0     0
learning   0    0     0      1      0        0     0     1
NLP        0    1     0      0      0        0     0     1
flying     0    0     1      0      0        0     0     1
.          0    0     0      0      1        1     1     0
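A minimal Python sketch (illustrative, window size 1) of how such a co-occurrence table can be collected from the example corpus:

from collections import defaultdict

corpus = ["I like deep learning .", "I like NLP .", "I enjoy flying ."]
window = 1                          # count neighbours within one position

counts = defaultdict(int)
for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1

print(counts[("I", "like")])        # 2
print(counts[("like", "deep")])     # 1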

slide-149
SLIDE 149

Three methods for getting short dense vectors

  • Singular Value Decomposition of cooccurrence

matrix X

  • A special case of this is called LSA – Latent Semantic

Analysis

  • Neural Language Model-inspired predictive models
  • skip-grams and CBOW
  • Brown clustering
149

[Excerpt shown on slide: Landauer & Dumais (1997), Appendix, “An Introduction to Singular Value Decomposition and an LSA Example”, including Figure A1, a schematic of the SVD of a rectangular word (w) × context (c) matrix X into W, a diagonal S, and C.]

[Diagram: skip-gram model predicting the context words wt−2, wt−1, wt+1, wt+2 from the center word wt.]

slide-151
SLIDE 151

Prediction-based models: An alternative way to get dense vectors

  • Skip-gram (Mikolov et al. 2013a), CBOW (Mikolov et al. 2013b)
  • Learn embeddings as part of the process of word prediction.
  • Train a neural network to predict neighboring words
  • Inspired by neural net language models.
  • In so doing, learn dense embeddings for the words in the training corpus.
  • Advantages:
  • Fast, easy to train (much faster than SVD)
  • Available online in the word2vec package
  • Including sets of pretrained embeddings!
151
slide-152
SLIDE 152

Basic idea of learning neural network word embeddings

  • We define some model that aims to predict a word based on other

words in its context which has a loss function, e.g.,

  • We look at many samples from a big language corpus
  • We keep adjusting the vector representations of words to minimize

this loss

152

Predicted word: argmax_w  w · ((w_{j−1} + w_{j+1}) / 2)

Loss: J(θ) = 1 − w_j · ((w_{j−1} + w_{j+1}) / 2)        (unit-norm word vectors)

slide-153
SLIDE 153

Neural Embedding Models (Mikolov et al. 2013)

153
  • Distributed representations of words and phrases and their compositionality [Mikolov et al.'13]

CBoW model Skip-gram model

Image credit: Ed Grefenstette

slide-154
SLIDE 154

Details of Word2Vec

  • Predict surrounding words in a window of length m of every word.
  • Objective function: maximize the log probability of any context word given the current center word (θ represents all the variables we optimize):

154

J(θ) = (1/T) Σ_{t=1}^{T}  Σ_{−m ≤ j ≤ m, j ≠ 0}  log p(w_{t+j} | w_t)

Distributed representations of words and phrases and their compositionality [Mikolov et al.'13]
slide-155
SLIDE 155

Details of Word2Vec

  • Predict surrounding words in a window of length m of every word.
  • The simplest first formulation is:

where o is the outside (or output) word index, c is the center word index, and u_o and v_c are the “outside” and “center” vectors of words o and c

  • Every word has two vectors!
  • This is essentially “dynamic” logistic regression
155
Distributed representations of words and phrases and their compositionality [Mikolov et al.'13]

p(o | c) = exp(u_oᵀ v_c) / Σ_{w=1}^{W} exp(u_wᵀ v_c)
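A small NumPy sketch of this softmax formulation, with an illustrative (tiny) vocabulary and random vectors standing in for the learned u and v:

import numpy as np

W, d = 10, 4                        # vocabulary size and embedding size (illustrative)
rng = np.random.default_rng(0)
U = rng.normal(size=(W, d))         # "outside" vectors u_w
V = rng.normal(size=(W, d))         # "center" vectors v_w

def p_outside_given_center(o, c):
    scores = U @ V[c]               # u_w^T v_c for every word w in the vocabulary
    scores -= scores.max()          # for numerical stability
    e = np.exp(scores)
    return e[o] / e.sum()           # exp(u_o^T v_c) / sum_w exp(u_w^T v_c)

print(p_outside_given_center(o=3, c=7))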

slide-156
SLIDE 156

Intuition: similarity as dot-product between a target vector and context vector

[Diagram: a target embedding matrix W (one d-dimensional target embedding per word) and a context embedding matrix C (one d-dimensional context embedding per word); Similarity(j, k) is computed between the target embedding for word j and the context embedding for word k.]

156
  • Similarity(j,k) = ck · vj
  • We use softmax to

turn into probabilities

p(w_k | w_j) = exp(c_k · v_j) / Σ_{i∈|V|} exp(c_i · v_j)

slide-157
SLIDE 157

Details of Word2Vec

  • Predict surrounding words in a window of length m of every word.
  • The simplest first formulation is:
  • Every word has two vectors!
  • We can either:
  • Just use vj
  • Sum them
  • Concatenate them to make a double-length embedding
157
Distributed representations of words and phrases and their compositionality [Mikolov et al.'13]

p(o | c) = exp(u_oᵀ v_c) / Σ_{w=1}^{W} exp(u_wᵀ v_c)

slide-158
SLIDE 158

Learning

  • Start with some initial embeddings

(e.g., random)

  • iteratively make the embeddings for a

word

⎯ more like the embeddings of its neighbors ⎯ less like the embeddings of other words.

158


slide-159
SLIDE 159

Visualizing W and C as a network for doing error backprop

[Diagram: Input layer → Projection layer → Output layer. A 1 × |V| one-hot input vector for wt is multiplied by W (|V| × d) to give the 1 × d embedding for wt, which is multiplied by C (d × |V|) to give the 1 × |V| probabilities of the context words wt+1.]

159
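A rough NumPy sketch of this pipeline (sizes are illustrative): the one-hot input just selects a row of W, and C maps that embedding to scores that a softmax turns into context-word probabilities.

import numpy as np

V, d = 8, 3                                      # vocabulary and embedding sizes (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(V, d))                      # |V| x d target embeddings
C = rng.normal(size=(d, V))                      # d x |V| context embeddings

x = np.zeros(V)
x[2] = 1.0                                       # 1-hot input vector for w_t
h = x @ W                                        # 1 x d embedding for w_t (row 2 of W)
scores = h @ C                                   # 1 x |V| scores
probs = np.exp(scores) / np.exp(scores).sum()    # probabilities of context words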
slide-160
SLIDE 160

Problem with the softmax

  • The denominator: have to compute over every word in vocabulary
  • Instead: just sample a few of those negative words
160

p(w_k | w_j) = exp(c_k · v_j) / Σ_{i∈|V|} exp(c_i · v_j)

slide-161
SLIDE 161

Goal in learning

  • Make the word like the context words
  • We want this to be high:
  • And not like k randomly selected “noise words”
  • We want this to be low:
161

lemon, a [tablespoon of apricot preserves or] jam c1 c2 w c3 c4

[cement metaphysical dear coaxial apricot attendant whence forever puddle] n1 n2 n3 n4 n5 n6 n7 n8

We want σ(c1·w) + σ(c2·w) + σ(c3·w) + σ(c4·w) to be high. In addition, we want the noise words n to have a low dot-product with our target word w: we want σ(n1·w) + σ(n2·w) + ... + σ(n8·w) to be low. In practice σ(x) = 1 / (1 + e^(−x)). This gives the learning objective for one word/context pair (w, c).

slide-162
SLIDE 162

Skipgram with negative sampling: Loss function

log σ(c·w) + Σ_{i=1}^{k} E_{wi∼p(w)} [log σ(−wi · w)]

162
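A minimal NumPy sketch of evaluating this objective for a single (w, c) pair with k sampled noise words; random vectors stand in here for learned embeddings:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, k = 50, 8
w = rng.normal(size=d)                   # target word vector
c = rng.normal(size=d)                   # true context word vector
noise = rng.normal(size=(k, d))          # k sampled noise-word vectors

objective = np.log(sigmoid(c @ w)) + np.sum(np.log(sigmoid(-noise @ w)))
# Training pushes this up: c.w large (positive pair) and each n_i.w small (negatives).
print(objective)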
slide-163
SLIDE 163

Stochastic gradients with word vectors!

  • But in each window, we only have at most 2m + 1 words, so the gradient ∇θ Jt(θ) is very sparse!

163

Slide credit: Richard Socher

slide-164
SLIDE 164

Stochastic gradients with word vectors!

  • We may as well only update the word vectors that actually appear!
  • Solution: either keep around a hash for word vectors, or only update certain columns of the full embedding matrices U and V

  • Important if you have millions of word vectors and do distributed computing

to not have to send gigantic updates around.

164

[Diagram: a d × |V| embedding matrix; only the columns for words that appear get updated.]
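A rough sketch of such a sparse update, assuming a hypothetical dict mapping the few word ids seen in the current window to their gradient vectors:

import numpy as np

V, d = 100_000, 300
U = np.zeros((V, d), dtype=np.float32)   # stand-in embedding matrix

def sparse_sgd_step(U, grads, lr=0.025):
    # grads: {word_id: gradient vector} for the handful of words in this window
    for word_id, g in grads.items():
        U[word_id] -= lr * g             # update only these rows, not all |V| of them
    return U

U = sparse_sgd_step(U, {17: np.ones(d, dtype=np.float32)})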

slide-165
SLIDE 165

Embeddings capture semantics!

  • Words similar to “frog”

1. frogs 2. toad 3. litoria 4. leptodactylidae 5. rana 6. lizard 7. eleutherodactylus

165

[Images of: “litoria”, “leptodactylidae”, “rana”, “eleutherodactylus”]

GloVe: Global Vectors for Word Representation [Pennington et al.'14]

slide-166
SLIDE 166

Embeddings capture relational meaning!

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’) vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) ≈ vector(‘Rome’)

166
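A small sketch of how such analogies are typically queried, assuming a hypothetical dict emb of unit-normalized word vectors (the function name and arguments are illustrative):

import numpy as np

def analogy(emb, a, b, c, topn=1):
    # Return the word(s) closest to vector(b) - vector(a) + vector(c)
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    scored = [(np.dot(v, target), w) for w, v in emb.items() if w not in (a, b, c)]
    return [w for _, w in sorted(scored, reverse=True)[:topn]]

# With real trained embeddings: analogy(emb, "man", "king", "woman") -> ["queen"]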
slide-167
SLIDE 167

Demo time

http://projector.tensorflow.org

167
slide-168
SLIDE 168 168

Next Lecture: Training Deep Neural Networks