Myia: A Differentiable Language for Deep Learning (Olivier Breuleux)



slide-1
SLIDE 1

Institut des algorithmes d’apprentissage de Montréal

Myia: A Differentiable Language for Deep Learning

Olivier Breuleux

Computer Analyst, MILA

Bart van Merriënboer (MILA, Google) Arnaud Bergeron (MILA)

slide-2
SLIDE 2

Paradigm

How deep learning and language design intersect

Autodiff

What it is. How it works

Myia

Our proposed solution

slide-4
SLIDE 4

Deep Learning

(Elgammal et al., 2017)

slide-5
SLIDE 5

Deep Learning

Features + Composition + Learning

[Diagram; surviving labels: Data, Trained structure, Gradient descent, Initial structure (automatable)]

slide-6
SLIDE 6

Needs

Goal: a language adapted to the needs of machine learning, past and future

  • General purpose: Express complex compositions using control flow.
  • Fast: Leverage parallelism and GPU to process millions of features.
  • Portable: Serializable, support multiple hardware.
  • Differentiable: Language support for gradient descent.

slide-7
SLIDE 7

General purpose

DL algorithms are increasingly complex

  • Feedforward (trivial)
  • Recurrent (loops)
  • Recursive (recursion)
  • …?

slide-8
SLIDE 8

General purpose

DL algorithms are increasingly complex

  • More and more language features needed
  • Most existing frameworks are limited
  • Awkward abstractions
  • No recursion
  • High level abstraction increases productivity
  • Focus on the algorithm over implementation details
  • Effortless abstractions encourage their use
slide-9
SLIDE 9

Fast

  • Scale to millions of parameters
  • Lots of parallel operations
  • Matrix multiplication, map, reduce
  • Can work with low precision (float32, 16, even 8 bits)
  • Leverage adapted hardware (GPU, TPU)
  • Loop fusion
  • Automatic parallelization
  • Memory management
slide-10
SLIDE 10

Portable

  • Serializable models
  • Code + parameters (data)
  • Run on mobile and embedded systems
  • Seamless transfer from research to production
  • Avoid being tied to an ecosystem (e.g. Python/numpy)

  • e.g. ONNX (but more general)
slide-11
SLIDE 11

Differentiable

slide-12
SLIDE 12

Paradigm

How deep learning and language design intersect

Autodiff

What it is. How it works

Myia

Our proposed solution

slide-13
SLIDE 13

Derivative

$$\frac{d}{dx} f(x) = \frac{df}{dx} = f'(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}$$
slide-14
SLIDE 14

Derivative

$$\frac{d}{dx} f(x) = \frac{df}{dx} = f'(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}$$

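Chain rule

$$\frac{d}{dx} g(f(x)) = \frac{dg}{df}\,\frac{df}{dx}$$

Total derivative

$$\frac{d}{dx} h(f(x), g(x)) = \frac{\partial h}{\partial f}\,\frac{df}{dx} + \frac{\partial h}{\partial g}\,\frac{dg}{dx}$$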
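As a quick sanity check of the two rules above, the following minimal sketch compares the analytic formulas with a finite-difference estimate. The functions `f`, `g`, `h` and the point `x0` are illustrative choices, not taken from the slides, and a central difference is used instead of the one-sided limit because it is more accurate numerically.

```python
import math

def numeric_derivative(fn, x, eps=1e-6):
    """Central finite-difference approximation of d fn / dx at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

# Illustrative functions (assumptions, not from the slides).
f = math.sin              # f(x) = sin(x),  f'(x) = cos(x)
g = lambda x: x * x       # g(x) = x^2,     g'(x) = 2x
h = lambda u, v: u * v    # h(u, v) = u*v,  dh/du = v, dh/dv = u

x0 = 0.7

# Chain rule: d/dx g(f(x)) = g'(f(x)) * f'(x)
chain_analytic = 2 * math.sin(x0) * math.cos(x0)
chain_numeric = numeric_derivative(lambda x: g(f(x)), x0)

# Total derivative: d/dx h(f(x), g(x)) = dh/df * f'(x) + dh/dg * g'(x)
total_analytic = g(x0) * math.cos(x0) + f(x0) * 2 * x0
total_numeric = numeric_derivative(lambda x: h(f(x), g(x)), x0)

print(chain_analytic, chain_numeric)
print(total_analytic, total_numeric)
```

Each printed pair should agree to several decimal places; the second term of the total derivative uses g'(x), which is easy to get wrong by hand.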

slide-15
SLIDE 15

In many dimensions

Suppose you have a vector of inputs and a vector of outputs.

$$f : \mathbb{R}^m \to \mathbb{R}^n$$

The Jacobian is the matrix of partial derivatives.

$$J_f(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m} \end{pmatrix}$$
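To make the shape of the Jacobian concrete, here is a small sketch that approximates it column by column with central differences. The helper `jacobian` and the example function are hypothetical and only meant to show that a function from R^m to R^n yields an n × m matrix.

```python
def jacobian(f, x, eps=1e-6):
    """Approximate the n x m Jacobian of f: R^m -> R^n at x by central differences."""
    m = len(x)
    n = len(f(x))
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        # Perturb only input j to fill column j.
        x_plus = list(x)
        x_plus[j] += eps
        x_minus = list(x)
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

# Illustrative f: R^2 -> R^3 (an assumption, not from the slides).
def f(x):
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

J = jacobian(f, [2.0, 3.0])
# The analytic Jacobian at (2, 3) is [[3, 2], [1, 1], [4, 0]].
for row in J:
    print(row)
```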

slide-16
SLIDE 16

Chain Rule

$$y_1 = f(x), \quad y_2 = g(y_1), \quad y_3 = h(y_2)$$

$$f : \mathbb{R}^m \to \mathbb{R}^p, \quad g : \mathbb{R}^p \to \mathbb{R}^q, \quad h : \mathbb{R}^q \to \mathbb{R}^n$$

$$\underbrace{J_{h \circ g \circ f}(x)}_{n \times m} = \underbrace{J_h(y_2)}_{n \times q}\,\underbrace{J_g(y_1)}_{q \times p}\,\underbrace{J_f(x)}_{p \times m}$$

The derivative of a straight composition of functions is the product of their Jacobians

In what order?

slide-17
SLIDE 17

Ordering the Jacobians

Forward (multiply from the input side first):

$$\underbrace{J_h(y_2)}_{n \times q}\,\underbrace{\Big(\underbrace{J_g(y_1)}_{q \times p}\,\underbrace{J_f(x)}_{p \times m}\Big)}_{q \times m}$$

Cost: $$qpm + nqm = m(qp + nq)$$

Reverse (multiply from the output side first):

$$\underbrace{\Big(\underbrace{J_h(y_2)}_{n \times q}\,\underbrace{J_g(y_1)}_{q \times p}\Big)}_{n \times p}\,\underbrace{J_f(x)}_{p \times m}$$

Cost: $$nqp + npm = n(qp + pm)$$
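The two cost formulas can be checked by counting scalar multiplications for each bracketing. The sketch below assumes the naive cost of rows × inner × cols per matrix product; the function names and the example sizes are hypothetical.

```python
def matmul_cost(rows, inner, cols):
    """Scalar multiplications for a (rows x inner) times (inner x cols) product."""
    return rows * inner * cols

def forward_cost(m, p, q, n):
    # J_h (J_g J_f): first (q x p)(p x m), then (n x q)(q x m)
    return matmul_cost(q, p, m) + matmul_cost(n, q, m)

def reverse_cost(m, p, q, n):
    # (J_h J_g) J_f: first (n x q)(q x p), then (n x p)(p x m)
    return matmul_cost(n, q, p) + matmul_cost(n, p, m)

# Hypothetical sizes: many inputs and a scalar output, as for a loss function.
m, p, q, n = 1_000_000, 512, 512, 1
print("forward:", forward_cost(m, p, q, n))
print("reverse:", reverse_cost(m, p, q, n))
```

With many inputs and one output, the reverse ordering is cheaper by roughly a factor of q here; swapping the roles of m and n flips the conclusion.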

slide-18
SLIDE 18

Forward vs Reverse

Forward mode is good when there are few inputs.

  • Easy to implement: dual numbers.

$$x \to \left(y_1, \frac{dy_1}{dx}\right) \to \left(y_2, \frac{dy_2}{dx}\right) \to \left(y_3, \frac{dy_3}{dx}\right)$$
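Forward mode with dual numbers might look like the sketch below: every value carries its derivative with respect to the input, and each operation updates both. The `Dual` class, the `sin` helper, and the example computation are illustrative assumptions, not Myia's implementation.

```python
import math

class Dual:
    """A value paired with its derivative with respect to the input x."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def sin(d):
    # Chain rule for an elementary function.
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

# x -> (y1, dy1/dx) -> (y2, dy2/dx) -> (y3, dy3/dx)
x = Dual(2.0, 1.0)       # dx/dx = 1
y1 = x * x               # x^2
y2 = sin(y1)             # sin(x^2)
y3 = 3 * y2 + x          # 3 sin(x^2) + x
print(y3.value, y3.deriv)
```

The derivative arrives together with the value in a single pass, which is why this mode is easy to implement; one pass is needed per input, which is why it suits functions with few inputs.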

Reverse mode is good when there are few outputs.

  • Hard to implement: execution is reversed.

$$x \to y_1 \to y_2 \to y_3 \to \frac{dy_3}{dy_2} \to \frac{dy_3}{dy_1} \to \frac{dy_3}{dx}$$
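The reversed execution can be sketched by recording, during the forward pass, how each value depends on its inputs, then walking that record backwards. The `Var` class and `backward` function below are a hypothetical minimal design and do not reproduce any particular framework.

```python
import math

class Var:
    """A node holding a value and links to its parents with local derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent Var, d self / d parent)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

def sin(v):
    return Var(math.sin(v.value), [(v, math.cos(v.value))])

def backward(output):
    """Accumulate d(output)/d(node) for every node, from the output to the inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)   # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):          # children before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local

# Forward pass: x -> y1 -> y2 -> y3; gradients then flow back from y3 to x.
x = Var(2.0)
y1 = x * x          # x^2
y2 = sin(y1)        # sin(x^2)
y3 = y2 * x         # x sin(x^2)
backward(y3)
print(x.grad)       # d y3 / d x = 2 x^2 cos(x^2) + sin(x^2)
```

The extra bookkeeping (storing the graph, ordering the nodes) is what makes reverse mode harder to implement than the dual-number approach.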

slide-19
SLIDE 19

Deep Learning

Deep learning involves computing the gradient of a scalar cost with respect to millions of parameters. We need reverse mode.

$$\theta \leftarrow \theta - \epsilon \frac{\partial L}{\partial \theta} \quad \text{where} \quad \theta = (W_1, W_2, \ldots, b_1, b_2, \ldots)$$
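The update rule can be illustrated on a toy problem with two parameters. In this sketch the gradient is written out by hand for a linear model with a mean squared error loss; in deep learning it would come from reverse-mode autodiff instead. The model, data, step size `epsilon`, and step count are all illustrative assumptions.

```python
def loss_and_grad(theta, data):
    """Mean squared error of y = w*x + b and its gradient w.r.t. theta = (w, b)."""
    w, b = theta
    n = len(data)
    loss = sum((w * x + b - y) ** 2 for x, y in data) / n
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2 * (w * x + b - y) for x, y in data) / n
    return loss, (dw, db)

def gradient_descent(theta, data, epsilon=0.05, steps=2000):
    for _ in range(steps):
        _, grad = loss_and_grad(theta, data)
        # theta <- theta - epsilon * dL/dtheta
        theta = tuple(t - epsilon * g for t, g in zip(theta, grad))
    return theta

# Toy data generated from y = 2x + 1 (illustrative only).
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = gradient_descent((0.0, 0.0), data)
print(w, b)
```

The loss is a single scalar while the parameter vector can be arbitrarily large, which is the "few outputs, many inputs" case where reverse mode pays off.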
slide-20
SLIDE 20

Timeline

  • Modules: SN (1987), Torch (2002), Caffe (2013)
  • Operator overloading: Autograd (2014), Chainer (2015), PyTorch (2017)
  • Graphs: Theano (2008), TensorFlow (2015), MXNet (2015)
  • Source code transform: Tangent (2017), Myia (2018)

slide-22
SLIDE 22

Modules

  • No automatic differentiation
  • Object-oriented approach
  • Each operation a stateful object with a forward and a backward method
  • Transparent, but tedious
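
A minimal sketch of the module style described above, assuming two hypothetical modules (`Tanh` and `Scale`) that are not taken from SN, Torch, or Caffe. Each object stores what its backward method needs, and the user chains the forward and backward calls by hand.

```python
import math


class Tanh:
    """Stateful module: forward caches the value that backward needs."""

    def forward(self, x):
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_out):
        # d tanh(x)/dx = 1 - tanh(x)^2
        return grad_out * (1.0 - self.y ** 2)


class Scale:
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def backward(self, grad_out):
        return grad_out * self.factor


# The forward pass is wired by hand...
layers = [Tanh(), Scale(10.0)]
out = 0.5
for layer in layers:
    out = layer.forward(out)

# ...and so is the backward pass, in reverse order.
grad = 1.0
for layer in reversed(layers):
    grad = layer.backward(grad)
```

A simple chain like this is easy to follow, but every new connection pattern needs its own hand-written backward wiring.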

What if we want to connect them in more complicated ways?

slide-24
SLIDE 24

Graphs

  • Inspired by computer algebra systems (CAS)
  • Program is a directed acyclic graph (DAG)
  • A graph transformation automatically creates the graph for the derivative
  • Easy to optimize, high performance, but awkward and not very expressive

f(g(x), h(x))
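
The graph approach can be sketched with a toy expression DAG. The `Node` class and the helpers `var`, `const`, `add`, `mul`, `grad`, and `evaluate` are hypothetical names for this illustration, not the API of Theano, TensorFlow, or MXNet. The example builds f(g(x), h(x)) with g(x) = x², h(x) = x + 1, and f = g · h.

```python
class Node:
    """One vertex of the expression graph: an operation and its inputs."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op, self.inputs, self.value, self.name = op, inputs, value, name


def var(name):
    return Node("var", name=name)


def const(v):
    return Node("const", value=v)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def grad(node, wrt):
    """Graph transformation: build a new graph for d(node)/d(wrt)."""
    if node.op == "var":
        return const(1.0 if node is wrt else 0.0)
    if node.op == "const":
        return const(0.0)
    a, b = node.inputs
    if node.op == "add":
        return add(grad(a, wrt), grad(b, wrt))
    if node.op == "mul":  # product rule
        return add(mul(grad(a, wrt), b), mul(a, grad(b, wrt)))
    raise ValueError(node.op)


def evaluate(node, env):
    if node.op == "var":
        return env[node.name]
    if node.op == "const":
        return node.value
    a, b = (evaluate(i, env) for i in node.inputs)
    return a + b if node.op == "add" else a * b


x = var("x")
g = mul(x, x)           # g(x) = x^2
h = add(x, const(1.0))  # h(x) = x + 1
f = mul(g, h)           # f(g(x), h(x)) = x^2 * (x + 1)
df = grad(f, x)         # a new graph, built without running f
```

The derivative is itself a graph, so it can be optimized or differentiated again. This toy graph has no node type for loops or branches.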


What about control flow?

slide-26
SLIDE 26

Operator overloading

Program:

    def f(x):
        i = 0
        while i < 3:
            i = i + 1
            x = tanh(x)
        x = x * 10
        return x

Tape (recorded by tracing the program; backprop walks it in reverse):

    i = 0
    i = i + 1
    x = tanh(x)
    i = i + 1
    x = tanh(x)
    i = i + 1
    x = tanh(x)
    x = x * 10

  • Overload every operation to log itself on a tape.
  • At the end, we walk the tape backward.
  • “Define-by-run”, “Dynamic graph”
  • Easy to implement, but lots of overhead
  • Discourages composing small & cheap operations
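
A minimal sketch of operator overloading with a tape, assuming a hypothetical `Var` wrapper and a global `TAPE` list. It is not the implementation used by Autograd, Chainer, or PyTorch. In this sketch only operations on `Var` objects are logged, so the loop counter never reaches the tape.

```python
import math

TAPE = []  # log of (output, [(input, local_derivative), ...])


class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __mul__(self, other):
        # Overloaded operation: compute the value and log itself on the tape.
        other = other if isinstance(other, Var) else Var(other)
        out = Var(self.value * other.value)
        TAPE.append((out, [(self, other.value), (other, self.value)]))
        return out


def tanh(x):
    out = Var(math.tanh(x.value))
    TAPE.append((out, [(x, 1.0 - out.value ** 2)]))
    return out


def f(x):
    i = 0
    while i < 3:  # ordinary Python control flow; only Var operations are traced
        i = i + 1
        x = tanh(x)
    x = x * 10
    return x


def backprop(output):
    output.grad = 1.0
    for out, parents in reversed(TAPE):  # walk the tape backward
        for parent, local in parents:
            parent.grad += out.grad * local


x = Var(0.5)
y = f(x)
backprop(y)  # x.grad now holds dy/dx
```

Every scalar operation allocates an object and appends to a list, which is the overhead that makes small, cheap operations expensive in this style.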

slide-28
SLIDE 28

Source code transformation

  • Transform a function that computes a value into a new function that computes the derivative.

  • Operate on source code or intermediate representation
  • Standard language optimizations apply.
  • Two approaches:
    • Tape-based
    • Functional AD
slide-29
SLIDE 29

SCT: Tape-based

  • Static version of operator overloading
  • Forward pass computes value and pushes data on tape
  • Backward pass pops data from tape and computes gradient
  • Push and pop instructions are inserted in the code
  • Main approach used by SCT AD frameworks
  • Tapenade (C/Fortran), Tangent (Python)
  • Issues
    • The tape is global state
    • Hard to support higher-order derivatives
    • Has to deal with push&pop
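
To make the push and pop instructions concrete, here is a hand-written sketch of what a tape-based transform might produce for a small function. It is an illustration only, not actual output from Tapenade or Tangent; the names `f_forward`, `f_backward`, and `tape` are assumptions.

```python
import math

tape = []  # global state shared by the forward and backward passes


def f(x):
    i = 0
    while i < 3:
        i = i + 1
        x = math.tanh(x)
    return x * 10


def f_forward(x):
    # Same statements as f, with push instructions inserted.
    i = 0
    while i < 3:
        i = i + 1
        x = math.tanh(x)
        tape.append(x)  # push: value needed by the backward pass
    tape.append(i)      # push: number of iterations to undo
    return x * 10


def f_backward(grad_out):
    # Statements of f in reverse order, with pop instructions inserted.
    grad_x = grad_out * 10
    n = tape.pop()
    for _ in range(n):
        y = tape.pop()  # y = tanh(x) saved by the forward pass
        grad_x = grad_x * (1.0 - y ** 2)
    return grad_x


value = f_forward(0.5)
grad = f_backward(1.0)
```

Because `f_backward` consumes the shared tape, differentiating `f_backward` itself would need a second tape, which hints at why higher-order derivatives are awkward here.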
slide-30
SLIDE 30

SCT: Functional AD

  • Works in pure functional languages
  • Loops = tail recursion (no stack growth)
  • Transform a function by making it return (output, backpropagator)
  • Bprop takes gradient wrt output, returns gradients wrt inputs
  • Bprop has access to original variables as free variables
  • Closures are chained together in a sort of “tape”
  • Higher-order derivatives
  • Stalingrad (Scheme), DiffSharp (F#)

See: Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator (Pearlmutter & Siskind, 2008)
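
The (output, backpropagator) idea can be sketched with plain Python closures. This is a hand-written illustration for f(x) = tanh(x) · x, not the output of Stalingrad, DiffSharp, or Myia; the names `tanh_fwd`, `mul_fwd`, and `f_fwd` are assumptions.

```python
import math


def tanh_fwd(x):
    y = math.tanh(x)

    def bprop(dy):
        # y is a free variable captured by the closure.
        return dy * (1.0 - y ** 2)

    return y, bprop


def mul_fwd(a, b):
    def bprop(dout):
        # a and b are free variables; returns gradients wrt both inputs.
        return dout * b, dout * a

    return a * b, bprop


def f_fwd(x):
    """Transformed version of f(x) = tanh(x) * x."""
    t, bprop_tanh = tanh_fwd(x)
    out, bprop_mul = mul_fwd(t, x)

    def bprop(dout):
        # The captured backpropagators play the role of the tape.
        dt, dx_direct = bprop_mul(dout)
        dx_via_tanh = bprop_tanh(dt)
        return dx_direct + dx_via_tanh

    return out, bprop


out, bprop = f_fwd(0.5)
dx = bprop(1.0)
```

Since `f_fwd` and its `bprop` are ordinary functions with no global state, the same transformation could in principle be applied to them again.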

slide-31
SLIDE 31

Paradigm

How deep learning and language design intersect

Autodiff

What it is. How it works

Myia

Our proposed solution

slide-32
SLIDE 32

Reminder: Needs

Goal: a language adapted to the needs of machine learning, past and future

  • General purpose: Express complex compositions with control flow.
  • Fast: Leverage parallelism and GPU to process millions of features.
  • Portable: Serializable, support multiple hardware.
  • Differentiable: Language support for gradient descent.

slide-33
SLIDE 33

Comparison

Framework               General   Fast        Portable              Differentiable
TensorFlow (graph)      No        ✔           ✔                     Partially
PyTorch (overloading)   ✔         Partially   Partially (Tracing)   ✔
Tangent (SCT)           ✔         Partially   Python-specific       ✔
Myia (SCT)              ✔         ✔           ✔                     ✔

slide-34
SLIDE 34

Pipeline

slide-35
SLIDE 35

Representation

  • Functional
  • Function abstraction, recursion, closures

  • No side-effects
  • Graph representation
  • Represent data flow directly
  • Flexible scheduling (unlike SSA or CPS)
  • Independent operations can run in parallel
  • See: Sea of Nodes (Click 1995)
  • Direct pointers to free variables
  • See: Thorin (Leißa, 2015)

    def fact(x):
        if x <= 1:
            return 1
        else:
            return x * fact(x - 1)

[Figure: graph representation of fact. Legend: output, operation, input, constant.]

slide-36
SLIDE 36

Parsing Python

    def pow(x, n):
        r = 1
        while n > 0:
            r = r * x
            n = n - 1
        return r

  • Parser based on SSA converter, adapted for functional representation
  • Conditional branches become thunks (functions with no arguments)
  • Loops converted to tail recursions
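
A rough Python rendering of these two conversions, applied to the power function. The helper `if_` and the names `pow_loop`, `pow_functional`, and `loop` are hypothetical; this shows the shape of the functional form, not Myia's actual intermediate representation.

```python
def pow_loop(x, n):
    r = 1
    while n > 0:
        r = r * x
        n = n - 1
    return r


def if_(cond, true_thunk, false_thunk):
    # Hypothetical helper: select one thunk and call it, so only the
    # chosen branch is evaluated.
    return true_thunk() if cond else false_thunk()


def pow_functional(x, n):
    def loop(r, n):
        # Loop variables become parameters of a recursive function;
        # x stays a free variable of the closure.
        return if_(n > 0,
                   lambda: loop(r * x, n - 1),  # loop body, then tail call
                   lambda: r)                   # loop exit
    return loop(1, n)
```

Python itself does not eliminate tail calls, so this version is limited by the recursion depth; a compiler for a functional representation can run such tail calls without stack growth.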
slide-37
SLIDE 37

Type System

  • Inference: unification-based
  • Python interface: infer from types of arguments on Python side
  • Recompile when types change
  • Fundamental types:
  • Scalars: Int/UInt/Float<8/16/32/64>, Bool, Char
  • Tuple<T1, T2, ...>
  • Struct<field1=T1, field2=T2, ...>
  • List<T>
  • Linked list, efficient append
  • NDArray<T, Shape<D1, D2, ...>>
  • D1, D2, ... need not be known statically
  • Function<Args<TIn1, TIn2, ...>, TOut> (Function types)
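
The "recompile when types change" behaviour can be pictured as a cache keyed by the types of the arguments. The decorator `specialize` and its `compile_for` stand-in below are purely hypothetical and do not reflect Myia's actual interface; real inference would also track element types and array shapes.

```python
def specialize(fn):
    """Hypothetical decorator: one compiled version per argument-type signature."""
    cache = {}

    def compile_for(signature):
        # Stand-in for a real type-inference and compilation step.
        return fn

    def wrapper(*args):
        signature = tuple(type(a) for a in args)  # inferred on the Python side
        if signature not in cache:                # new types: compile again
            cache[signature] = compile_for(signature)
        return cache[signature](*args)

    wrapper.cache = cache
    return wrapper


@specialize
def add(x, y):
    return x + y


add(1, 2)      # compiles for (int, int)
add(3, 4)      # reuses the cached version
add(1.0, 2.0)  # types changed: compiles for (float, float)
```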
slide-38
SLIDE 38

Primitives

  • Scalar primitives
  • Usual arithmetic: add, mul, div, ...
  • Commonly used: log, exp, ...
  • Numeric stability: log1p, expm1, ...
  • Array primitives
  • broadcast, gemm, ...
  • “Higher-order” primitives
  • Apply a scalar-typed function on arrays following a pattern
  • map, reduce
  • Implicitly parallel: can be run efficiently on GPU
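
The following sketch uses Python's standard `math` and `functools` modules as stand-ins for the primitives listed above; they are not Myia's primitives. It shows why `log1p` and `expm1` matter for small inputs, and how a scalar function is applied to a collection through `map` and `reduce`.

```python
import math
from functools import reduce

x = 1e-20
naive_log = math.log(1.0 + x)  # 1.0 + x rounds to 1.0, so the result is 0.0
stable_log = math.log1p(x)     # keeps the small result
naive_exp = math.exp(x) - 1.0  # exp(x) rounds to 1.0, so the result is 0.0
stable_exp = math.expm1(x)     # keeps the small result

# Higher-order pattern: apply a scalar function elementwise, then combine.
values = [0.0, 1.0, 2.0]
exps = list(map(math.exp, values))
total = reduce(lambda a, b: a + b, exps)
log_sum_exp = math.log(total)
```

The function passed to `map` touches each element independently, which is what lets such a pattern run in parallel.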
slide-39
SLIDE 39

Automatic differentiation

    def f(x, y):
        a = x ** 3
        b = y ** 4
        c = a * b
        return c

[Figure panels: IR, grad(f), grad(grad(f))]

slide-40
SLIDE 40

Optimization

  • AD transform creates a lot of “junk”
  • Tuple packing and unpacking
  • Unnecessary gradients
  • Aggressive inlining of primitive adjoints and pattern-matching optimizations (peephole) can go a long way
  • Stability optimizations
  • Stabilize simple patterns
  • Can be applied prior to AD

    def f(x, y):
        a = x ** 3
        b = y ** 4
        c = a * b
        return c

slide-41
SLIDE 41

Backend(s)

  • Use existing backends to generate GPU kernels
  • NNVM, XLA, Tensor Comprehensions, TensorRT, nGraph, …
  • Offload many optimizations to them (e.g. loop fusion)
  • Leverage man-centuries of effort
  • But: not fully general (no recursion)
  • Custom VM for general control flow
  • Higher-level
  • Python-based VM (for debugging)
  • LLVM backend (for production)
slide-42
SLIDE 42

Conclusion

  • Roadmap to beta
  • Support gradient operator in type system
  • Improve optimization of the gradient
  • Finalize support for array primitives and broadcasting
  • Finalize support for backend

Myia aims to be a language adapted to the needs of machine learning, past and future:

  • General purpose through support for recursion
  • Fast and portable through NNVM backend and general purpose VM
  • Differentiable using functional source code transformation
slide-43
SLIDE 43

Thanks!

https://github.com/mila-udem/myia

Follow our progress!

MILA is hiring!

Want to work at the confluence of academia and industry? We seek:

  • Professors
  • Software engineers
  • Director of software
  • R&D & technology transfer
  • Linux sysadmins

https://tinyurl.com/mila-jobs

Plus: free French classes!