SLIDE 1

AMMI – Introduction to Deep Learning 4.1. DAG networks

François Fleuret
https://fleuret.org/ammi-2018/
Wed Aug 29 16:57:27 CAT 2018

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

SLIDE 2

Everything we have seen for an MLP

[Diagram: x → × w(1) → + b(1) → σ → × w(2) → + b(2) → σ → f(x)]

can be generalized to an arbitrary “Directed Acyclic Graph” (DAG) of operators

[Diagram: a DAG in which x feeds φ(1); x and the output of φ(1) feed φ(2); the outputs of φ(1) and φ(2) feed φ(3), which produces f(x); w(1) parametrizes φ(1) and φ(3), w(2) parametrizes φ(2)]


SLIDE 3

Remember that we use tensorial notation. If $(a_1, \ldots, a_Q) = \phi(b_1, \ldots, b_R)$, we have

$$\frac{\partial a}{\partial b} = J_\phi = \begin{pmatrix}
\frac{\partial a_1}{\partial b_1} & \cdots & \frac{\partial a_1}{\partial b_R} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_Q}{\partial b_1} & \cdots & \frac{\partial a_Q}{\partial b_R}
\end{pmatrix}.$$

This notation does not specify at which point this is computed. It will always be for the forward-pass activations.

Also, if $(a_1, \ldots, a_Q) = \phi(b_1, \ldots, b_R, c_1, \ldots, c_S)$, we use

$$\frac{\partial a}{\partial c} = J_{\phi|c} = \begin{pmatrix}
\frac{\partial a_1}{\partial c_1} & \cdots & \frac{\partial a_1}{\partial c_S} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_Q}{\partial c_1} & \cdots & \frac{\partial a_Q}{\partial c_S}
\end{pmatrix}.$$


SLIDE 4

Forward pass

[Diagram: x(0) = x → φ(1) → x(1); (x(0), x(1)) → φ(2) → x(2); (x(1), x(2)) → φ(3) → x(3) = f(x); w(1) parametrizes φ(1) and φ(3), w(2) parametrizes φ(2)]

$$x^{(0)} = x$$
$$x^{(1)} = \phi^{(1)}\left(x^{(0)}; w^{(1)}\right)$$
$$x^{(2)} = \phi^{(2)}\left(x^{(0)}, x^{(1)}; w^{(2)}\right)$$
$$f(x) = x^{(3)} = \phi^{(3)}\left(x^{(1)}, x^{(2)}; w^{(1)}\right)$$
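As a minimal sketch (added here, not in the slides), this forward pass can be written in plain NumPy, using the concrete operators that the TensorFlow example later in the deck assigns to φ(1), φ(2) and φ(3):

import numpy as np

rng = np.random.default_rng(0)
w1 = rng.standard_normal((5, 5))
w2 = rng.standard_normal((5, 5))
x = rng.standard_normal((5, 1))

x0 = x                  # x(0) = x
x1 = w1 @ x0            # x(1) = phi(1)(x(0); w(1)) = w(1) x(0)
x2 = x0 + w2 @ x1       # x(2) = phi(2)(x(0), x(1); w(2)) = x(0) + w(2) x(1)
x3 = w1 @ (x1 + x2)     # f(x) = x(3) = phi(3)(x(1), x(2); w(1)) = w(1)(x(1) + x(2))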


SLIDE 5

Backward pass, derivatives w.r.t. activations

[Same DAG diagram as above]

$$\frac{\partial \ell}{\partial x^{(2)}} = \left(\frac{\partial x^{(3)}}{\partial x^{(2)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(3)}} = J_{\phi^{(3)}|x^{(2)}}^{\top} \frac{\partial \ell}{\partial x^{(3)}}$$

$$\frac{\partial \ell}{\partial x^{(1)}} = \left(\frac{\partial x^{(2)}}{\partial x^{(1)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(2)}} + \left(\frac{\partial x^{(3)}}{\partial x^{(1)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(3)}} = J_{\phi^{(2)}|x^{(1)}}^{\top} \frac{\partial \ell}{\partial x^{(2)}} + J_{\phi^{(3)}|x^{(1)}}^{\top} \frac{\partial \ell}{\partial x^{(3)}}$$

$$\frac{\partial \ell}{\partial x^{(0)}} = \left(\frac{\partial x^{(1)}}{\partial x^{(0)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(1)}} + \left(\frac{\partial x^{(2)}}{\partial x^{(0)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(2)}} = J_{\phi^{(1)}|x^{(0)}}^{\top} \frac{\partial \ell}{\partial x^{(1)}} + J_{\phi^{(2)}|x^{(0)}}^{\top} \frac{\partial \ell}{\partial x^{(2)}}$$
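Continuing the NumPy sketch from the forward pass (added, not from the slides), with the loss ℓ = ‖x(3)‖ used in the TensorFlow example below:

# Backward pass w.r.t. activations, for l = ||x(3)||.
g_x3 = x3 / np.linalg.norm(x3)      # dl/dx(3)
g_x2 = w1.T @ g_x3                  # through phi(3): J_{phi(3)|x(2)} = w(1)
g_x1 = w2.T @ g_x2 + w1.T @ g_x3    # through phi(2) and phi(3)
g_x0 = w1.T @ g_x1 + g_x2           # through phi(1) and phi(2) (J_{phi(2)|x(0)} = I)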

SLIDE 6

Backward pass, derivatives w.r.t. parameters

[Same DAG diagram as above]

$$\frac{\partial \ell}{\partial w^{(1)}} = \left(\frac{\partial x^{(1)}}{\partial w^{(1)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(1)}} + \left(\frac{\partial x^{(3)}}{\partial w^{(1)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(3)}} = J_{\phi^{(1)}|w^{(1)}}^{\top} \frac{\partial \ell}{\partial x^{(1)}} + J_{\phi^{(3)}|w^{(1)}}^{\top} \frac{\partial \ell}{\partial x^{(3)}}$$

$$\frac{\partial \ell}{\partial w^{(2)}} = \left(\frac{\partial x^{(2)}}{\partial w^{(2)}}\right)^{\top} \frac{\partial \ell}{\partial x^{(2)}} = J_{\phi^{(2)}|w^{(2)}}^{\top} \frac{\partial \ell}{\partial x^{(2)}}$$
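Continuing the same NumPy sketch (added, not from the slides), for the concrete operators these reduce to outer products of the activation gradients with the inputs of each use:

# Parameter gradients; w(1) accumulates contributions from both of its
# uses, in phi(1) and phi(3).
g_w1 = g_x1 @ x0.T + g_x3 @ (x1 + x2).T   # dl/dw(1)
g_w2 = g_x2 @ x1.T                        # dl/dw(2)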

SLIDE 7

So if we have a library of "tensor operators", and implementations of

$$(x_1, \ldots, x_d, w) \to \phi(x_1, \ldots, x_d; w)$$
$$\forall c, \; (x_1, \ldots, x_d, w) \to J_{\phi|x_c}(x_1, \ldots, x_d; w)$$
$$(x_1, \ldots, x_d, w) \to J_{\phi|w}(x_1, \ldots, x_d; w),$$

we can build an arbitrary directed acyclic graph with these operators at the nodes, compute the response of the resulting mapping, and compute its gradient with back-prop.
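A toy sketch of such a triple of implementations (added here as an illustration, not from the slides), for a hypothetical elementwise operator φ(x; w) = w ⊙ x, whose Jacobians are both diagonal:

import numpy as np

def phi(x, w):        # forward map
    return w * x

def J_phi_x(x, w):    # Jacobian w.r.t. the input x
    return np.diag(w)

def J_phi_w(x, w):    # Jacobian w.r.t. the parameter w
    return np.diag(x)

# Back-prop through this node multiplies the incoming gradient by the
# transposed Jacobians, evaluated at the forward-pass activations.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
g_out = np.ones(3)                 # dl/dphi coming from downstream
g_x = J_phi_x(x, w).T @ g_out      # dl/dx
g_w = J_phi_w(x, w).T @ g_out      # dl/dw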


SLIDE 8

Writing from scratch a large neural network is complex and error-prone. Multiple frameworks provide libraries of tensor operators and mechanisms to combine them into DAGs and automatically differentiate them.

Framework    Language(s)              License         Main backer
PyTorch      Python                   BSD             Facebook
Caffe2       C++, Python              Apache          Facebook
TensorFlow   Python, C++              Apache          Google
MXNet        Python, C++, R, Scala    Apache          Amazon
CNTK         Python, C++              MIT             Microsoft
Torch        Lua                      BSD             Facebook
Theano       Python                   BSD             U. of Montreal
Caffe        C++                      BSD 2 clauses   U. of CA, Berkeley

One approach is to define the nodes and edges of such a DAG statically (Torch, TensorFlow, Caffe, Theano, etc.)


SLIDE 9

In TensorFlow, to run a forward/backward pass on

[Same DAG diagram as above]

$$\phi^{(1)}\left(x^{(0)}; w^{(1)}\right) = w^{(1)} x^{(0)}$$
$$\phi^{(2)}\left(x^{(0)}, x^{(1)}; w^{(2)}\right) = x^{(0)} + w^{(2)} x^{(1)}$$
$$\phi^{(3)}\left(x^{(1)}, x^{(2)}; w^{(1)}\right) = w^{(1)} \left(x^{(1)} + x^{(2)}\right)$$

w1 = tf.Variable(tf.random_normal([5, 5]))
w2 = tf.Variable(tf.random_normal([5, 5]))
x = tf.Variable(tf.random_normal([5, 1]))

x0 = x
x1 = tf.matmul(w1, x0)         # phi(1)
x2 = x0 + tf.matmul(w2, x1)    # phi(2)
x3 = tf.matmul(w1, x1 + x2)    # phi(3)
q = tf.norm(x3)                # loss

gw1, gw2 = tf.gradients(q, [w1, w2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _gw1, _gw2 = sess.run([gw1, gw2])
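For comparison (an added sketch, not in the slides), the same forward/backward pass in PyTorch, which builds the graph dynamically as the operations execute:

import torch

w1 = torch.randn(5, 5, requires_grad=True)
w2 = torch.randn(5, 5, requires_grad=True)
x = torch.randn(5, 1)

x0 = x
x1 = w1 @ x0             # phi(1)
x2 = x0 + w2 @ x1        # phi(2)
x3 = w1 @ (x1 + x2)      # phi(3)
q = torch.norm(x3)

gw1, gw2 = torch.autograd.grad(q, [w1, w2])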


SLIDE 10

Weight sharing


SLIDE 11

In our generalized DAG formulation, we have in particular implicitly allowed the same parameters to modulate different parts of the processing. For instance w(1) in our example parametrizes both φ(1) and φ(3).

[Same DAG diagram as above]

This is called weight sharing.
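A quick way to see this numerically (an added check, not from the slides, reusing the PyTorch sketch above): detaching one of the two uses of w(1) isolates the contribution of the other, and the two contributions sum to the full gradient.

import torch

w1 = torch.randn(5, 5, requires_grad=True)
w2 = torch.randn(5, 5)
x0 = torch.randn(5, 1)

def loss(w_in_phi1, w_in_phi3):
    x1 = w_in_phi1 @ x0            # phi(1) uses w(1)
    x2 = x0 + w2 @ x1              # phi(2)
    x3 = w_in_phi3 @ (x1 + x2)     # phi(3) uses w(1) again
    return torch.norm(x3)

g_full, = torch.autograd.grad(loss(w1, w1), [w1])
g_phi1, = torch.autograd.grad(loss(w1, w1.detach()), [w1])
g_phi3, = torch.autograd.grad(loss(w1.detach(), w1), [w1])
assert torch.allclose(g_full, g_phi1 + g_phi3, atol=1e-5)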


SLIDE 12

In particular, weight sharing makes it possible to build siamese networks, where a full sub-network is replicated several times.

[Diagram: two replicated sub-networks ψu and ψv, each computing × w(1) → + b(1) → σ → × w(2) → + b(2) → σ (with activations u(1), u(2) and v(1), v(2)); both copies share the parameters w(1), b(1), w(2), b(2), and a module φ combines their outputs into x(1)]

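A minimal PyTorch sketch of such a siamese architecture (added, not from the slides; the combining module φ here is a hypothetical squared distance):

import torch
from torch import nn

# One sub-network, replicated by simply calling it twice: both calls
# use the same w(1), b(1), w(2), b(2).
shared = nn.Sequential(
    nn.Linear(10, 5), nn.Sigmoid(),
    nn.Linear(5, 5), nn.Sigmoid(),
)

def siamese(xa, xb):
    u2, v2 = shared(xa), shared(xb)
    return ((u2 - v2) ** 2).sum(1)     # phi: squared distance (assumed)

xa, xb = torch.randn(4, 10), torch.randn(4, 10)
siamese(xa, xb).sum().backward()       # gradients accumulate in 'shared'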


SLIDE 13

The end