Abstractions and Frameworks for Deep Learning: a Discussion
  1. Abstractions and Frameworks for Deep Learning: a Discussion
     Caffe, Torch, Theano, TensorFlow, et al.
     Rémi Emonet, Saintélyon Deep Learning Workshop, 2015-11-26

  2. Disclaimer

  3. Introductory Poll
     Did you ever use?
     - Caffe
     - Theano
     - Lasagne
     - Torch
     - TensorFlow
     - Other?
     Any experience to share?

  4. Overview
     - Deep Learning?
     - Abstraction in Frameworks
     - A Tour of Existing Frameworks
     - More Discussions?

  5. Overview
     - Deep Learning?
     - Abstraction in Frameworks
     - A Tour of Existing Frameworks
     - More Discussions?

  6. Finding Parameters of a Function (supervised)
     Notations:
     - input $i$, output $o$
     - function $f$: given
     - parameters $\theta$: to be learned
     We suppose: $o = f_\theta(i)$
     How to optimize it, i.e., how to find the best $\theta$?
     - need some regularity assumptions
     - usually, at least differentiability
     Remark, a more generic view: $o = f_\theta(i) = f(\theta, i)$
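
For concreteness, here is a minimal sketch of this setup in Python/NumPy (the affine model and all names are illustrative, not from the deck):

```python
import numpy as np

def f(theta, i):
    """A simple parametric function: an affine model o = W i + b.
    theta is a (W, b) pair; any differentiable choice of f would do."""
    W, b = theta
    return W @ i + b

theta = (np.random.randn(1, 2), np.zeros(1))  # parameters to be learned
i = np.array([0.5, -1.0])                     # one input
o = f(theta, i)                               # the generic view o = f(theta, i)
```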

  7. Gradient Descent
     We want to find the best parameters:
     - we suppose $o = f_\theta(i)$
     - we have examples of inputs $i_n$ and target outputs $t_n$
     - we want to minimize the sum of errors $L(\theta) = \sum_n L(f_\theta(i_n), t_n)$
     - we suppose $f$ and $L$ are differentiable
     Gradient descent (gradient = vector of partial derivatives):
     - start with a random $\theta^0$
     - compute the gradient and update: $\theta^{t+1} = \theta^t - \gamma \nabla_\theta L(\theta^t)$
     Variations: stochastic gradient descent (SGD), conjugate gradient descent, BFGS, L-BFGS, ...
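
As a concrete illustration (a sketch, not the deck's code), plain batch gradient descent on a least-squares loss, with the gradient written out by hand; a mean is used instead of a sum so the step size does not depend on the dataset size:

```python
import numpy as np

np.random.seed(0)
inputs = np.random.randn(100)            # examples i_n
targets = 3.0 * inputs + 1.0             # target outputs t_n

theta = np.zeros(2)                      # theta^0 (here zeros instead of random)
gamma = 0.1                              # step size

for step in range(200):
    preds = theta[0] * inputs + theta[1]         # f_theta(i_n)
    errors = preds - targets
    # Gradient of L(theta) = mean_n (f_theta(i_n) - t_n)^2:
    grad = np.array([2 * np.mean(errors * inputs),   # dL/d slope
                     2 * np.mean(errors)])           # dL/d intercept
    theta = theta - gamma * grad                 # theta^{t+1} = theta^t - gamma * grad

print(theta)  # converges towards [3.0, 1.0]
```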

  8. Finding Parameters of a "Deep" Function
     Idea: $f$ is a composition of functions
     - 2 layers: $o = f_\theta(i) = f^2_{\theta_2}(f^1_{\theta_1}(i))$
     - 3 layers: $o = f_\theta(i) = f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i)))$
     - K layers: $o = f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i))) \dots)$
     - with all $f^l$ differentiable
     How can we optimize it? The chain rule! Many versions (with $F = f \circ g$):
     - $(f \circ g)' = (f' \circ g) \cdot g'$
     - $F'(x) = f'(g(x)) \, g'(x)$
     - $\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$
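
A tiny worked instance of these formulas (the numbers are illustrative): take $g(x) = 3x$ and $f(u) = u^2$, so $F = f \circ g$. Then

$$F(x) = (3x)^2 = 9x^2, \qquad F'(x) = f'(g(x)) \cdot g'(x) = 2(3x) \cdot 3 = 18x,$$

which matches differentiating $9x^2$ directly.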

  9. Finding Parameters of a "Deep" Function
     Reminders:
     - K layers: $o = f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i))) \dots)$
     - minimize the sum of errors $L(\theta) = \sum_n L(f_\theta(i_n), t_n)$
     - chain rule: $\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$
     Goal: compute $\nabla_\theta L$ for gradient descent
     - $\nabla_{\theta_K} L = \frac{dL}{d\theta_K} = \frac{dL}{df^K} \frac{df^K}{d\theta_K}$
     - $\nabla_{\theta_{K-1}} L = \frac{dL}{d\theta_{K-1}} = \frac{dL}{df^K} \frac{df^K}{df^{K-1}} \frac{df^{K-1}}{d\theta_{K-1}}$
     - $\nabla_{\theta_1} L = \frac{dL}{d\theta_1} = \frac{dL}{df^K} \frac{df^K}{df^{K-1}} \cdots \frac{df^2}{df^1} \frac{df^1}{d\theta_1}$
     Each factor is available:
     - $\frac{dL}{df^K}$: gradient of the loss with respect to its input ✔
     - $\frac{df^k}{df^{k-1}}$: gradient of a function with respect to its input ✔
     - $\frac{df^k}{d\theta_k}$: gradient of a function with respect to its parameters ✔
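
To make the telescoping products concrete, here is a hand-written backward pass for a two-layer scalar example (the specific functions are illustrative): $f^1(x) = \theta_1 x$, $f^2(x) = \tanh(\theta_2 x)$, squared-error loss.

```python
import numpy as np

i, t = 0.7, 0.5              # one input and its target
th1, th2 = 0.3, -1.2         # parameters theta_1, theta_2

# Forward pass: h = f1(i) = th1 * i, o = f2(h) = tanh(th2 * h), L = (o - t)^2
h = th1 * i
o = np.tanh(th2 * h)
L = (o - t) ** 2

# Backward pass: each factor of the chain-rule product, written out.
dL_do   = 2 * (o - t)        # dL/df^2: loss gradient w.r.t. its input
do_dh   = th2 * (1 - o**2)   # df^2/df^1: gradient of a layer w.r.t. its input
do_dth2 = h * (1 - o**2)     # df^2/dtheta_2: gradient w.r.t. its parameters
dh_dth1 = i                  # df^1/dtheta_1

grad_th2 = dL_do * do_dth2           # nabla_{theta_2} L
grad_th1 = dL_do * do_dh * dh_dth1   # nabla_{theta_1} L: the full product

# Optional sanity check by finite differences.
eps = 1e-6
L_eps = (np.tanh(th2 * (th1 + eps) * i) - t) ** 2
assert abs((L_eps - L) / eps - grad_th1) < 1e-4
```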

  10. Deep Learning and Composite Functions
      Deep Learning?
      - NN can be deep, CNN can be deep
      - "any" composition of differentiable functions can be optimized with gradient descent
      - some other models are also deep... (hierarchical models, etc.)
      Evaluating a composition $f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i))) \dots)$:
      - "forward pass": evaluate successively each function
      Computing the gradient $\nabla_\theta L$ (for gradient descent):
      - compute the gradient with respect to the output $o$ (from the output error)
      - for each $f^k$, from the last layer to the first:
        - compute the parameter gradient (from the output gradient)
        - compute the input gradient (from the output gradient)
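
The forward/backward structure described here can be sketched generically (the Layer objects and method names are hypothetical; every framework names these differently):

```python
def forward(layers, x):
    """Forward pass: evaluate each function in turn, remembering the
    input each layer saw (needed again during the backward pass)."""
    cache = []
    for layer in layers:
        cache.append(x)
        x = layer.forward(x)
    return x, cache

def backward(layers, cache, grad_output):
    """Backward pass: visit layers in reverse; each turns the gradient
    w.r.t. its output into a parameter gradient (kept on the layer) and
    an input gradient (handed on to the previous layer)."""
    for layer, x in zip(reversed(layers), reversed(cache)):
        if hasattr(layer, "param_gradient"):   # activations/losses have none
            layer.grad_params = layer.param_gradient(x, grad_output)
        grad_output = layer.input_gradient(x, grad_output)
    return grad_output
```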

  11. Back to "Seeing Parameters as Inputs"
      Parameters ($\theta_k$): just another input of $f^k$
      - can be rewritten, e.g., as $f^k(\theta_k, x)$
      More generic inputs:
      - inputs can be constants
      - inputs can be parameters
      - inputs can be produced by another function (e.g., $f(g(x), h(x))$)

  12. Overview
      - Deep Learning?
      - Abstraction in Frameworks
      - A Tour of Existing Frameworks
      - More Discussions?

  13. Function/Operator/Layer
      The functions that we can use for $f^k$. Many choices:
      - fully connected layers
      - convolution layers
      - activation functions (element-wise)
      - soft-max
      - pooling
      - ...
      Loss functions: the same, but with no parameters
      In the wild: a Torch "module", a Theano "operator"
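
For instance, an element-wise activation is an operator with no parameters: its backward pass only produces an input gradient (a sketch, matching the hypothetical interface above):

```python
import numpy as np

class ReLU:
    """Element-wise activation: no parameters, so no param_gradient;
    the backward pass just masks the incoming gradient."""
    def forward(self, x):
        return np.maximum(x, 0)

    def input_gradient(self, x, grad_output):
        return grad_output * (x > 0)
```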

  14. Data/Blob/Tensor
      The data: inputs, intermediate results, parameters, gradients, ...
      Usually a tensor (an n-dimensional array)
      In the wild: Torch tensors; Theano tensors, scalars, numpy arrays

  15. Overview
      - Deep Learning?
      - Abstraction in Frameworks
      - A Tour of Existing Frameworks
      - More Discussions?

  16. Contenders
      - Caffe
      - Torch
      - Theano
      - Lasagne
      - TensorFlow
      - Deeplearning4j
      - ...

  17. Overview
      Basics:
      - install: CUDA / cuBLAS / OpenBLAS
      - blobs/tensors, blocks/layers/losses, parameters
      - cuDNN
      - open source
      Control flow:
      - define a composite function (graph)
      - choice of an optimizer
      - forward, backward
      Extend:
      - write a new operator/module
      - "forward"
      - "backward": gradParam, gradInput
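
Extending a framework usually boils down to supplying exactly these methods. A fully connected module in the same hypothetical interface as above (the slide's gradParam/gradInput terms; no specific framework's API is implied):

```python
import numpy as np

class Linear:
    """Fully connected module o = W x + b: one forward method plus the
    two backward quantities the slide names, gradParam and gradInput."""
    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_out, n_in)
        self.b = np.zeros(n_out)

    def forward(self, x):
        return self.W @ x + self.b

    def param_gradient(self, x, grad_output):    # "gradParam"
        return np.outer(grad_output, x), grad_output.copy()

    def input_gradient(self, x, grad_output):    # "gradInput"
        return self.W.T @ grad_output
```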

  18. Caffe
      - "made with expression, speed, and modularity in mind"
      - "developed by the Berkeley Vision and Learning Center (BVLC)"
      - "released under the BSD 2-Clause license"
      - C++, layer-oriented: http://caffe.berkeleyvision.org/tutorial/layers.html
      - plaintext protocol buffer schema (prototxt) to describe models (and so to save them too)
      - 1,068 / 7,184 / 4,077
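
For example, a single layer in the prototxt schema looks roughly like this (a minimal sketch from memory of Caffe's layer format; the blob names are illustrative):

```
layer {
  name: "fc1"
  type: "InnerProduct"   # a fully connected layer
  bottom: "data"         # input blob
  top: "fc1"             # output blob
  inner_product_param {
    num_output: 10
  }
}
```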

  19. Torch7
      By:
      - Ronan Collobert (Idiap, now Facebook)
      - Clément Farabet (NYU, then Madbits, now Twitter)
      - Koray Kavukcuoglu (Google DeepMind)
      Lua (+ C):
      - needs to be learned
      - easy to embed
      Layer-oriented:
      - easy to use
      - sometimes difficult to extend (merging sources)
      418 / 3,267 / 757

  20. Theano
      - "is a Python library"
      - "allows you to define, optimize, and evaluate mathematical expressions"
      - "involving multi-dimensional arrays"
      - "efficient symbolic differentiation"
      - "transparent use of a GPU"
      - "dynamic C code generation"
      Uses symbolic expressions:
      - reasoning on the graph
      - write numpy-like code
      - no forced "layered" architecture
      - computation graph
      263 / 2,447 / 878
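
A flavor of this symbolic style (a minimal sketch using Theano's public API; the logistic model itself is illustrative):

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic graph: a logistic unit and its loss; no layers anywhere.
x = T.dvector('x')
t = T.dscalar('t')
w = theano.shared(np.zeros(3), name='w')   # parameters live in shared storage
o = T.nnet.sigmoid(T.dot(w, x))
loss = (o - t) ** 2

# Symbolic differentiation: Theano derives the gradient from the graph.
grad_w = T.grad(loss, w)

# Compile a training step that also applies an SGD update.
train = theano.function([x, t], loss, updates=[(w, w - 0.1 * grad_w)])
train(np.array([1.0, 2.0, 3.0]), 1.0)
```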

  21. Lasagne (Keras, etc.)
      - overlay on top of Theano
      - provides a layer API close to Caffe/Torch, etc.
      - layer-oriented
      133 / 1,401 / 342
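
The layer API then looks like this (a sketch using Lasagne's layer classes; the network shape is illustrative):

```python
import lasagne

# Stack layers much like Caffe/Torch, but compiled down to Theano.
l_in  = lasagne.layers.InputLayer(shape=(None, 784))
l_hid = lasagne.layers.DenseLayer(l_in, num_units=100,
                                  nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hid, num_units=10,
                                  nonlinearity=lasagne.nonlinearities.softmax)

# get_output returns the underlying Theano expression for the whole stack.
prediction = lasagne.layers.get_output(l_out)
```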

  22. TensorFlow
      By Google, Nov. 2015
      Selling points:
      - easy to move from a cluster to a mobile phone
      - easy to distribute
      Currently slow? Not fully open yet?
      1,303 / 13,232 / 3,375
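
A minimal graph in the TensorFlow of that era (a sketch using the original graph-and-session API, in its later 1.x spelling; the model is illustrative):

```python
import tensorflow as tf  # graph-and-session API (pre-2.x)

x = tf.placeholder(tf.float32, shape=[None, 3])
t = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.zeros([3, 1]))
o = tf.matmul(x, W)
loss = tf.reduce_mean(tf.square(o - t))

# The optimizer adds the gradient computation to the same graph.
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step, feed_dict={x: [[1., 2., 3.]], t: [[1.]]})
```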

  23. Deeplearning4j
      - "Deep Learning for Java, Scala & Clojure on Hadoop, Spark & GPUs"
      - Apache 2.0-licensed
      - Java
      - high level (layer-oriented)
      - typed API
      236 / 1,648 / 548

  24. Overview
      - Deep Learning?
      - Abstraction in Frameworks
      - A Tour of Existing Frameworks
      - More Discussions?

  25. Be creative!
      Anything differentiable can be tried!

  26. How to choose a framework?

  27. Any experience to share?
