  1. AMMI – Introduction to Deep Learning
     4.1. DAG networks
     François Fleuret
     https://fleuret.org/ammi-2018/
     Wed Aug 29 16:57:27 CAT 2018
     ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

  2. Everything we have seen for an MLP

     [diagram: a two-layer MLP drawn as a chain of operators, x ↦ σ(w(2) σ(w(1) x + b(1)) + b(2)) = f(x)]

     can be generalized to an arbitrary "Directed Acyclic Graph" (DAG) of operators:

     [diagram: a DAG in which x and w(1) feed φ(1); x, x(1), and w(2) feed φ(2); and x(1), x(2), and w(1) feed φ(3), which outputs f(x)]

  3. Remember that we use tensorial notation. If $(a_1, \dots, a_Q) = \phi(b_1, \dots, b_R)$, we have

     $$\left[\frac{\partial a}{\partial b}\right] = J_\phi = \begin{pmatrix} \frac{\partial a_1}{\partial b_1} & \cdots & \frac{\partial a_1}{\partial b_R} \\ \vdots & \ddots & \vdots \\ \frac{\partial a_Q}{\partial b_1} & \cdots & \frac{\partial a_Q}{\partial b_R} \end{pmatrix}.$$

     This notation does not specify at which point this is computed. It will always be for the forward-pass activations.

     Also, if $(a_1, \dots, a_Q) = \phi(b_1, \dots, b_R, c_1, \dots, c_S)$, we use

     $$\left[\frac{\partial a}{\partial c}\right] = J_{\phi|c} = \begin{pmatrix} \frac{\partial a_1}{\partial c_1} & \cdots & \frac{\partial a_1}{\partial c_S} \\ \vdots & \ddots & \vdots \\ \frac{\partial a_Q}{\partial c_1} & \cdots & \frac{\partial a_Q}{\partial c_S} \end{pmatrix}.$$
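A quick numerical sanity check of this convention (not in the original slides): a finite-difference estimate of $J_\phi$, with rows indexed by outputs and columns by inputs, recovers $A$ for the linear map $\phi(b) = A b$. The helper name and tolerances below are made up for illustration.

```python
import numpy as np

def jacobian_fd(phi, b, eps=1e-6):
    """Finite-difference estimate of J_phi at b: entry (q, r)
    approximates d a_q / d b_r, matching the convention above."""
    a = phi(b)
    J = np.empty((a.size, b.size))
    for r in range(b.size):
        db = np.zeros_like(b)
        db[r] = eps
        J[:, r] = (phi(b + db) - a) / eps
    return J

# For a linear map phi(b) = A b, the Jacobian is A itself.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
J = jacobian_fd(lambda b: A @ b, np.ones(2))
assert np.allclose(J, A, atol=1e-4)
```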

  4. Forward pass

     [diagram: the DAG above with activations labeled x(0) = x, x(1), x(2), and f(x) = x(3)]

     $$\begin{aligned} x^{(0)} &= x \\ x^{(1)} &= \phi^{(1)}\!\left(x^{(0)}; w^{(1)}\right) \\ x^{(2)} &= \phi^{(2)}\!\left(x^{(0)}, x^{(1)}; w^{(2)}\right) \\ f(x) = x^{(3)} &= \phi^{(3)}\!\left(x^{(1)}, x^{(2)}; w^{(1)}\right) \end{aligned}$$

     Note that the same parameter w(1) is used by both φ(1) and φ(3).
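As an illustration (not in the original slides), here is a minimal NumPy sketch of this forward pass. It borrows the concrete operators from the TensorFlow example at the end of the deck; the dimension, random values, and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Parameters; w1 is used by both phi1 and phi3, as in the slides.
w1 = rng.normal(size=(n, n))
w2 = rng.normal(size=(n, n))

# Concrete operators borrowed from the TensorFlow example below:
#   phi1(x0; w1)     = w1 x0
#   phi2(x0, x1; w2) = x0 + w2 x1
#   phi3(x1, x2; w1) = w1 (x1 + x2)
x0 = rng.normal(size=n)      # x(0) = x
x1 = w1 @ x0                 # x(1) = phi1(x(0); w(1))
x2 = x0 + w2 @ x1            # x(2) = phi2(x(0), x(1); w(2))
x3 = w1 @ (x1 + x2)          # f(x) = x(3) = phi3(x(1), x(2); w(1))
```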

  5. Backward pass, derivatives w.r.t. activations

     [diagram: the same DAG]

     $$\left[\frac{\partial \ell}{\partial x^{(2)}}\right] = \left[\frac{\partial x^{(3)}}{\partial x^{(2)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right] = J_{\phi^{(3)}|x^{(2)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]$$

     $$\left[\frac{\partial \ell}{\partial x^{(1)}}\right] = \left[\frac{\partial x^{(2)}}{\partial x^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right] + \left[\frac{\partial x^{(3)}}{\partial x^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right] = J_{\phi^{(2)}|x^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right] + J_{\phi^{(3)}|x^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]$$

     $$\left[\frac{\partial \ell}{\partial x^{(0)}}\right] = \left[\frac{\partial x^{(1)}}{\partial x^{(0)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right] + \left[\frac{\partial x^{(2)}}{\partial x^{(0)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right] = J_{\phi^{(1)}|x^{(0)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right] + J_{\phi^{(2)}|x^{(0)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]$$

     An activation that feeds several operators, such as x(1), accumulates one term per consumer.
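Continuing the NumPy sketch: for these concrete linear operators the partial Jacobians are constants ($J_{\phi^{(1)}|x^{(0)}} = w^{(1)}$, $J_{\phi^{(2)}|x^{(0)}} = I$, $J_{\phi^{(2)}|x^{(1)}} = w^{(2)}$, $J_{\phi^{(3)}|x^{(1)}} = J_{\phi^{(3)}|x^{(2)}} = w^{(1)}$), so the backward pass is a few lines. The loss $\ell = \tfrac{1}{2}\|x^{(3)}\|^2$ is an arbitrary choice made up for illustration.

```python
# Arbitrary scalar loss for illustration: l = 0.5 * ||x3||^2,
# whose gradient w.r.t. x3 is x3 itself.
dl_dx3 = x3

# Each gradient is a transposed Jacobian times the consumer's gradient;
# x1 feeds both phi2 and phi3, so its gradient sums two terms.
dl_dx2 = w1.T @ dl_dx3                  # J_{phi3|x2}^T dl/dx3
dl_dx1 = w2.T @ dl_dx2 + w1.T @ dl_dx3  # J_{phi2|x1}^T dl/dx2 + J_{phi3|x1}^T dl/dx3
dl_dx0 = w1.T @ dl_dx1 + dl_dx2         # J_{phi1|x0}^T dl/dx1 + I dl/dx2
```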

  6. Backward pass, derivatives w.r.t. parameters

     [diagram: the same DAG]

     $$\left[\frac{\partial \ell}{\partial w^{(1)}}\right] = \left[\frac{\partial x^{(1)}}{\partial w^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right] + \left[\frac{\partial x^{(3)}}{\partial w^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right] = J_{\phi^{(1)}|w^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right] + J_{\phi^{(3)}|w^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]$$

     $$\left[\frac{\partial \ell}{\partial w^{(2)}}\right] = \left[\frac{\partial x^{(2)}}{\partial w^{(2)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right] = J_{\phi^{(2)}|w^{(2)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]$$

     Since w(1) is used by both φ(1) and φ(3), its gradient likewise accumulates one term per use.
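In the NumPy sketch, with w(1) and w(2) as matrices, these transposed-Jacobian products reduce to outer products (a standard identity for linear maps, not spelled out in the slides):

```python
# dl/dw2 from x2 = x0 + w2 x1: d(w2 x1)_i / d w2_{jk} = [i == j] * x1_k,
# so J^T dl/dx2 reshapes to the outer product of dl/dx2 and x1.
dl_dw2 = np.outer(dl_dx2, x1)

# w1 is shared by phi1 and phi3, so its gradient accumulates both uses.
dl_dw1 = np.outer(dl_dx1, x0) + np.outer(dl_dx3, x1 + x2)
```

A finite-difference check of $\ell$ as a function of w(1) is an easy way to validate such hand-derived gradients.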

  7. So if we have a library of "tensor operators", and implementations of

     $$\begin{aligned} &(x_1, \dots, x_d, w) \mapsto \phi(x_1, \dots, x_d; w) \\ &\forall c,\ (x_1, \dots, x_d, w) \mapsto J_{\phi|x_c}(x_1, \dots, x_d; w) \\ &(x_1, \dots, x_d, w) \mapsto J_{\phi|w}(x_1, \dots, x_d; w), \end{aligned}$$

     we can build an arbitrary directed acyclic graph with these operators at the nodes, compute the response of the resulting mapping, and compute its gradient with back-prop.
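For instance, such a library entry for the single-input operator $\phi(x; w) = w x$ could look like the hypothetical sketch below; the function names and the row-major flattening of w are assumptions, not from the slides.

```python
import numpy as np

def phi(x, w):
    """(x, w) -> phi(x; w) = w x."""
    return w @ x

def jac_phi_x(x, w):
    """(x, w) -> J_phi|x: d(w x)_i / d x_j = w_{ij}."""
    return w

def jac_phi_w(x, w):
    """(x, w) -> J_phi|w, with w flattened row-major:
    d(w x)_i / d w_{jk} = [i == j] * x_k."""
    return np.kron(np.eye(w.shape[0]), x)
```

With such a triplet available at every node, a generic engine can evaluate the DAG in topological order and accumulate the transposed-Jacobian products in reverse order.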

  8. Writing from scratch a large neural network is complex and error-prone.

     Multiple frameworks provide libraries of tensor operators and mechanisms to combine them into DAGs and automatically differentiate them.

                    Language(s)             License         Main backer
        PyTorch     Python                  BSD             Facebook
        Caffe2      C++, Python             Apache          Facebook
        TensorFlow  Python, C++             Apache          Google
        MXNet       Python, C++, R, Scala   Apache          Amazon
        CNTK        Python, C++             MIT             Microsoft
        Torch       Lua                     BSD             Facebook
        Theano      Python                  BSD             U. of Montreal
        Caffe       C++                     BSD 2 clauses   U. of CA, Berkeley

     One approach is to define the nodes and edges of such a DAG statically (Torch, TensorFlow, Caffe, Theano, etc.).

  9. In TensorFlow, to run a forward/backward pass on

     [diagram: the same DAG]

     with

     $$\begin{aligned} \phi^{(1)}\!\left(x^{(0)}; w^{(1)}\right) &= w^{(1)} x^{(0)} \\ \phi^{(2)}\!\left(x^{(0)}, x^{(1)}; w^{(2)}\right) &= x^{(0)} + w^{(2)} x^{(1)} \\ \phi^{(3)}\!\left(x^{(1)}, x^{(2)}; w^{(1)}\right) &= w^{(1)}\!\left(x^{(1)} + x^{(2)}\right) \end{aligned}$$
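The transcript cuts off before the code itself; a sketch of how this graph could be defined statically, assuming the legacy TensorFlow 1.x API (placeholder / Session / tf.gradients) and a made-up dimension and loss, might look like this:

```python
import tensorflow as tf  # assumes the TF 1.x static-graph API

n = 3
x = tf.placeholder(tf.float32, shape=(n, 1))
w1 = tf.Variable(tf.random_normal((n, n)))  # shared by phi1 and phi3
w2 = tf.Variable(tf.random_normal((n, n)))

# Static definition of the DAG's nodes and edges.
x0 = x
x1 = tf.matmul(w1, x0)              # phi1(x0; w1) = w1 x0
x2 = x0 + tf.matmul(w2, x1)         # phi2(x0, x1; w2) = x0 + w2 x1
x3 = tf.matmul(w1, x1 + x2)         # phi3(x1, x2; w1) = w1 (x1 + x2)

loss = tf.reduce_sum(x3 ** 2)       # arbitrary scalar loss for illustration
dw1, dw2 = tf.gradients(loss, [w1, w2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g1, g2 = sess.run([dw1, dw2], feed_dict={x: [[1.0], [2.0], [3.0]]})
```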
