Institut des algorithmes d’apprentissage de Montréal
Myia: A Differentiable Language for Deep Learning
Olivier Breuleux
Computer Analyst, MILA
Bart van Merriënboer (MILA, Google) Arnaud Bergeron (MILA)
How deep learning and language design intersect
What it is. How it works
Our proposed solution
(Elgammal et al., 2017)
[Diagram: Initial structure + Data → Gradient descent → Trained structure (automatable)]
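The paradigm sketched above (an initial structure refined against data by gradient descent) fits in a few lines of plain Python. This is an illustrative one-parameter model, not Myia code; all names are hypothetical:

```python
# Fit a single weight w so that w * x approximates y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # initial structure: one untrained weight
lr = 0.05  # learning rate
for _ in range(200):
    # d/dw of the squared error (w*x - y)^2 is 2*(w*x - y)*x
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= lr * grad  # gradient descent step

print(round(w, 3))  # prints 2.0: the trained structure
```

The only ingredient a language must supply to automate this loop is the gradient itself, which is the point of the rest of the talk.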
Goal: a language adapted to the needs of machine learning, past and future:
* General purpose: express complex compositions using control flow.
* Fast: leverage parallelism and GPUs to process millions of features.
* Portable: serializable, supporting multiple hardware targets.
* Differentiable: language support for gradient descent.
DL algorithms are increasingly complex:
* Feedforward (trivial)
* Recurrent (loops)
* Recursive (recursion)
[Diagram: from prototyping (Python/numpy) to production]
How deep learning and language design intersect
What it is. How it works
Our proposed solution
The derivative of f at x is defined as a limit:

    f'(x) = lim_{ε→0} (f(x + ε) − f(x)) / ε

For f: ℝ^m → ℝ^n, the Jacobian collects all partial derivatives into an n×m matrix:

    J_f = [ ∂f_1/∂x_1 … ∂f_1/∂x_m ]
          [     ⋮     ⋱      ⋮     ]
          [ ∂f_n/∂x_1 … ∂f_n/∂x_m ]

The derivative of a straight composition of functions is the product of their Jacobians. For f ∘ g ∘ h with Jacobians of shapes n×q, q×p and p×m, the product is n×m, and the order of multiplication matters:

* Reverse mode accumulates from the output: (J_f · J_g) · J_h, i.e. an (n×q)(q×p) = n×p product followed by (n×p)(p×m).
* Forward mode accumulates from the input: J_f · (J_g · J_h), i.e. a (q×p)(p×m) = q×m product followed by (n×q)(q×m).
Forward mode is good when there are few inputs.
Reverse mode is good when there are few outputs.
Deep learning involves computing the gradient of a scalar cost with respect to millions of parameters. We need reverse mode.
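The shape bookkeeping behind this argument can be checked directly with NumPy; both multiplication orders give the same n×m Jacobian, only the intermediates differ (the shapes below are arbitrary illustrations, not from the talk):

```python
import numpy as np

n, q, p, m = 2, 3, 4, 5
rng = np.random.default_rng(0)
Jf = rng.standard_normal((n, q))  # Jacobian of f
Jg = rng.standard_normal((q, p))  # Jacobian of g
Jh = rng.standard_normal((p, m))  # Jacobian of h

# Reverse mode: start from the output, (n×q)(q×p) = n×p, then (n×p)(p×m)
reverse = (Jf @ Jg) @ Jh
# Forward mode: start from the input, (q×p)(p×m) = q×m, then (n×q)(q×m)
forward = Jf @ (Jg @ Jh)

assert reverse.shape == (n, m) and forward.shape == (n, m)
assert np.allclose(reverse, forward)  # same product, different association
```

With n = 1 (a scalar cost), reverse mode only ever carries a single row vector backward, which is why it scales to millions of parameters.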
Modules: SN (1987), Torch (2002), Caffe (2013)
Operator overloading: Autograd (2014), Chainer (2015), PyTorch (2017)
Graphs: Theano (2008), TensorFlow (2015), MXNet (2015)
Source code transform: Tangent (2017), Myia (2018)
What if we want to connect them in more complicated ways?
What about control flow?
def f(x):
    i = 0
    while i < 3:
        i = i + 1
        x = tanh(x)
    x = x * 10
    return x

Unrolled trace:

i = 0
i = i + 1
x = tanh(x)
i = i + 1
x = tanh(x)
i = i + 1
x = tanh(x)
x = x * 10
[Diagram: Program —trace→ Tape —backprop→ gradients]
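The tape idea can be sketched with operator overloading (an illustrative sketch, not Myia's or PyTorch's actual implementation): every operation records a backward step on a global tape, and backprop replays the tape in reverse.

```python
import math

tape = []  # each entry is a closure that propagates gradient one step backward

class Var:
    # A value that records the operations applied to it (hypothetical API).
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward():
            self.grad += out.grad * other.value
            other.grad += out.grad * self.value
        tape.append(backward)
        return out

def tanh(v):
    out = Var(math.tanh(v.value))
    def backward():
        v.grad += out.grad * (1.0 - out.value ** 2)
    tape.append(backward)
    return out

# Forward pass: running the program builds the tape; the loop simply unrolls.
x = Var(0.5)
y = x
for _ in range(3):
    y = tanh(y)
y = y * Var(10.0)

# Backward pass: replay the tape in reverse to accumulate gradients.
y.grad = 1.0
for backward in reversed(tape):
    backward()

print(x.grad)  # dy/dx through the unrolled loop
```

Note that the tape only records the path actually taken: control flow is traced away, which is exactly the limitation the question above points at.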
Source code transformation generates a new program that computes the derivative.
See: Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator (Pearlmutter & Siskind, 2008)
How deep learning and language design intersect
What it is. How it works
Our proposed solution
Goal: a language adapted to the needs of machine learning, past and future:
* General purpose: express complex compositions with control flow.
* Fast: leverage parallelism and GPUs to process millions of features.
* Portable: serializable, supporting multiple hardware targets.
* Differentiable: language support for gradient descent.
                        General   Fast        Portable               Differentiable
TensorFlow (graph)      No        ✔           ✔                      Partially
PyTorch (overloading)   ✔         Partially   Partially (tracing)    ✔
Tangent (SCT)           ✔         Partially   Python-specific        ✔
Myia (SCT)              ✔         ✔           ✔                      ✔
recursion, closures
def fact(x):
    if x <= 1:
        return 1
    else:
        return x * fact(x - 1)
[Graph IR legend: Output, Operation, Input, Constant]
def pow(x, n):
    r = 1
    while n > 0:
        r = r * x
        n = n - 1
    return r
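Since pow is built from ordinary control flow, its derivative has a closed form to check against: d/dx xⁿ = n·xⁿ⁻¹. A finite-difference check (plain Python, not Myia's grad) confirms it:

```python
def pow(x, n):
    r = 1
    while n > 0:
        r = r * x
        n = n - 1
    return r

# Central finite-difference approximation of df/dx (illustrative helper).
def numeric_grad(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x, n = 1.5, 4
approx = numeric_grad(lambda v: pow(v, n), x)
exact = n * x ** (n - 1)  # d/dx x^n = n * x^(n-1), here 4 * 1.5^3 = 13.5
print(abs(approx - exact) < 1e-4)  # prints True
```

A differentiable language must produce this gradient automatically, loop and all, which is what Myia's source transform does on the IR.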
def f(x, y):
    a = x * 3
    b = y * 4
    c = a * b
    return c

The IR supports repeated transformation: grad(f), grad(grad(f)), and so on.
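Reading the example (whose operators are garbled in this transcript) as a = x * 3 and b = y * 4, the gradient can be derived by hand and checked numerically: c = (3x)(4y) = 12xy, so ∂c/∂x = 12y and ∂c/∂y = 12x.

```python
def f(x, y):
    a = x * 3
    b = y * 4
    return a * b

# Finite-difference check of both partial derivatives at (2, 5).
x, y = 2.0, 5.0
eps = 1e-6
dfdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
dfdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
print(round(dfdx, 3), round(dfdy, 3))  # 12*y = 60.0 and 12*x = 24.0
```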
Myia aims to be a language adapted to the needs of machine learning, past and future.
Follow our progress!
Want to work at the confluence of academia and industry? We seek:
* Professors
* Software engineers
* Director of software
* R&D & technology transfer
* Linux sysadmins
https://tinyurl.com/mila-jobs
Plus: free French classes!