Institut des algorithmes d’apprentissage de Montréal
Myia: A Differentiable Language for Deep Learning
Olivier Breuleux
Computer Analyst, MILA
Bart van Merriënboer (MILA, Google) Arnaud Bergeron (MILA)
How deep learning and language design intersect
What it is. How it works
Our proposed solution
(Elgammal et al., 2017)
[Diagram: Initial structure + Data → Gradient descent → Trained structure (automatable)]
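The paradigm sketched above (an initial structure refined against data by gradient descent) fits in a few lines of plain Python. This is an illustrative one-parameter model, not Myia code; all names are hypothetical:

```python
# Fit a single weight w so that w * x approximates y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # initial structure: one untrained weight
lr = 0.05  # learning rate
for _ in range(200):
    # d/dw of the squared error (w*x - y)^2 is 2*(w*x - y)*x
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= lr * grad  # gradient descent step

print(round(w, 3))  # prints 2.0: the trained structure
```

The only ingredient a language must supply to automate this loop is the gradient itself, which is the point of the rest of the talk.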
Goal: a language adapted to the needs of machine learning, past and future:
* General purpose: express complex compositions using control flow.
* Fast: leverage parallelism and GPUs to process millions of features.
* Portable: serializable, supporting multiple hardware targets.
* Differentiable: language support for gradient descent.
DL algorithms are increasingly complex:
* Feedforward (trivial)
* Recurrent (loops)
* Recursive (recursion)
[Diagram: from prototyping (Python/numpy) to production]
How deep learning and language design intersect
What it is. How it works
Our proposed solution
The derivative of f at x is defined as a limit:

    f'(x) = lim_{ε→0} (f(x + ε) − f(x)) / ε

For f: ℝ^m → ℝ^n, the Jacobian collects all partial derivatives into an n×m matrix:

    J_f = [ ∂f_1/∂x_1 … ∂f_1/∂x_m ]
          [     ⋮     ⋱      ⋮     ]
          [ ∂f_n/∂x_1 … ∂f_n/∂x_m ]

The derivative of a straight composition of functions is the product of their Jacobians. For f ∘ g ∘ h with Jacobians of shapes n×q, q×p and p×m, the product is n×m, and the order of multiplication matters:

* Reverse mode accumulates from the output: (J_f · J_g) · J_h, i.e. an (n×q)(q×p) = n×p product followed by (n×p)(p×m).
* Forward mode accumulates from the input: J_f · (J_g · J_h), i.e. a (q×p)(p×m) = q×m product followed by (n×q)(q×m).
Forward mode is good when there are few inputs.
Reverse mode is good when there are few outputs.
Deep learning involves computing the gradient of a scalar cost with respect to millions of parameters. We need reverse mode.
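The shape bookkeeping behind this argument can be checked directly with NumPy; both multiplication orders give the same n×m Jacobian, only the intermediates differ (the shapes below are arbitrary illustrations, not from the talk):

```python
import numpy as np

n, q, p, m = 2, 3, 4, 5
rng = np.random.default_rng(0)
Jf = rng.standard_normal((n, q))  # Jacobian of f
Jg = rng.standard_normal((q, p))  # Jacobian of g
Jh = rng.standard_normal((p, m))  # Jacobian of h

# Reverse mode: start from the output, (n×q)(q×p) = n×p, then (n×p)(p×m)
reverse = (Jf @ Jg) @ Jh
# Forward mode: start from the input, (q×p)(p×m) = q×m, then (n×q)(q×m)
forward = Jf @ (Jg @ Jh)

assert reverse.shape == (n, m) and forward.shape == (n, m)
assert np.allclose(reverse, forward)  # same product, different association
```

With n = 1 (a scalar cost), reverse mode only ever carries a single row vector backward, which is why it scales to millions of parameters.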
Modules: SN (1987), Torch (2002), Caffe (2013)
Operator overloading: Autograd (2014), Chainer (2015), PyTorch (2017)
Graphs: Theano (2008), TensorFlow (2015), MXNet (2015)
Source code transform: Tangent (2017), Myia (2018)
What if we want to connect them in more complicated ways?
What about control flow?
def f(x):
    i = 0
    while i < 3:
        i = i + 1
        x = tanh(x)
    x = x * 10
    return x

Unrolled trace:

i = 0
i = i + 1
x = tanh(x)
i = i + 1
x = tanh(x)
i = i + 1
x = tanh(x)
x = x * 10
[Diagram: Program —trace→ Tape —backprop→ gradients]
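The tape idea can be sketched with operator overloading (an illustrative sketch, not Myia's or PyTorch's actual implementation): every operation records a backward step on a global tape, and backprop replays the tape in reverse.

```python
import math

tape = []  # each entry is a closure that propagates gradient one step backward

class Var:
    # A value that records the operations applied to it (hypothetical API).
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward():
            self.grad += out.grad * other.value
            other.grad += out.grad * self.value
        tape.append(backward)
        return out

def tanh(v):
    out = Var(math.tanh(v.value))
    def backward():
        v.grad += out.grad * (1.0 - out.value ** 2)
    tape.append(backward)
    return out

# Forward pass: running the program builds the tape; the loop simply unrolls.
x = Var(0.5)
y = x
for _ in range(3):
    y = tanh(y)
y = y * Var(10.0)

# Backward pass: replay the tape in reverse to accumulate gradients.
y.grad = 1.0
for backward in reversed(tape):
    backward()

print(x.grad)  # dy/dx through the unrolled loop
```

Note that the tape only records the path actually taken: control flow is traced away, which is exactly the limitation the question above points at.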
Source code transformation generates a new program that computes the derivative.
See: Reverse-Mode AD in a Functional Framework: Lambda the Ultimate Backpropagator (Pearlmutter & Siskind, 2008)
How deep learning and language design intersect
What it is. How it works
Our proposed solution
Goal: a language adapted to the needs of machine learning, past and future:
* General purpose: express complex compositions with control flow.
* Fast: leverage parallelism and GPUs to process millions of features.
* Portable: serializable, supporting multiple hardware targets.
* Differentiable: language support for gradient descent.
                        General   Fast        Portable               Differentiable
TensorFlow (graph)      No        ✔           ✔                      Partially
PyTorch (overloading)   ✔         Partially   Partially (tracing)    ✔
Tangent (SCT)           ✔         Partially   Python-specific        ✔
Myia (SCT)              ✔         ✔           ✔                      ✔
recursion, closures
def fact(x):
    if x <= 1:
        return 1
    else:
        return x * fact(x - 1)
[Graph IR legend: Output, Operation, Input, Constant]
def pow(x, n):
    r = 1
    while n > 0:
        r = r * x
        n = n - 1
    return r
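Since pow is built from ordinary control flow, its derivative has a closed form to check against: d/dx xⁿ = n·xⁿ⁻¹. A finite-difference check (plain Python, not Myia's grad) confirms it:

```python
def pow(x, n):
    r = 1
    while n > 0:
        r = r * x
        n = n - 1
    return r

# Central finite-difference approximation of df/dx (illustrative helper).
def numeric_grad(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x, n = 1.5, 4
approx = numeric_grad(lambda v: pow(v, n), x)
exact = n * x ** (n - 1)  # d/dx x^n = n * x^(n-1), here 4 * 1.5^3 = 13.5
print(abs(approx - exact) < 1e-4)  # prints True
```

A differentiable language must produce this gradient automatically, loop and all, which is what Myia's source transform does on the IR.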
def f(x, y):
    a = x * 3
    b = y * 4
    c = a * b
    return c

The IR supports repeated transformation: grad(f), grad(grad(f)), and so on.
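Reading the example (whose operators are garbled in this transcript) as a = x * 3 and b = y * 4, the gradient can be derived by hand and checked numerically: c = (3x)(4y) = 12xy, so ∂c/∂x = 12y and ∂c/∂y = 12x.

```python
def f(x, y):
    a = x * 3
    b = y * 4
    return a * b

# Finite-difference check of both partial derivatives at (2, 5).
x, y = 2.0, 5.0
eps = 1e-6
dfdx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
dfdy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
print(round(dfdx, 3), round(dfdy, 3))  # 12*y = 60.0 and 12*x = 24.0
```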
Myia aims to be a language adapted to the needs of machine learning, past and future.
Follow our progress!
Want to work at the confluence of academia and industry? We seek:
* Professors
* Software engineers
* Director of software
* R&D & technology transfer
* Linux sysadmins
https://tinyurl.com/mila-jobs
Plus: free French classes!