SLIDE 1
frostig@, dougalm@, mattjj@, phawkins@, skyewm@, jekbradbury@, necula@, leary@ (...@google.com)
Accelerated machine-learning research via composable function transformations in Python
SLIDE 2 What is JAX?

import jax.numpy as np
from jax import jit, grad, vmap

def predict(params, inputs):
    for W, b in params:
        outputs = np.dot(inputs, W) + b
        inputs = np.tanh(outputs)
    return outputs

def loss(params, batch):
    inputs, targets = batch
    preds = predict(params, inputs)
    return np.sum((preds - targets) ** 2)

gradient_fun = jit(grad(loss))
perexample_grads = jit(vmap(grad(loss), (None, 0)))

JAX is an extensible system for composable function transformations.
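To get a feel for what the two transformed functions return, here is a minimal usage sketch; the layer sizes and data below are invented for illustration:

import numpy.random as npr

# one dense layer: W is 3x2, b has length 2
params = [(npr.randn(3, 2), npr.randn(2))]
# a batch of four examples: inputs are 4x3, targets are 4x2
batch = (npr.randn(4, 3), npr.randn(4, 2))

grads = gradient_fun(params, batch)       # gradient of the summed loss, same structure as params
pgrads = perexample_grads(params, batch)  # one gradient per example: each leaf gains a leading axis of size 4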
SLIDE 3 You can use JAX for free on Cloud TPUs in Colab! bit.ly/jax-tpu
Try it today!
:D
(github.com/google/jax/tree/master/cloud_tpu_colabs)
Wave simulation from the “Wave Equation” notebook
SLIDE 4
Demo!
SLIDE 5
How JAX works
SLIDE 6
Step 1: Python function → JAX IR

def f(x):
    return x + 2

# Python is duck-typed: x could be anything that implements __add__...
class EspressoDelegator(object):
    def __add__(self, num_espressos):
        subprocess.Popen(["ssh", ...])
SLIDE 7
Step 1: Python function → JAX IR

def f(x :: f32):
    return x + 2
SLIDE 8
Step 1: Python function → JAX IR

def f(x):
    return x + 2

How does f behave on... ShapedArray(f32, (3,))?

Abstract values:
    ShapedArray(f32, (2, 2))
    ConcreteArray(f32, [[1., 2.], [3., 4.]])
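One consequence, sketched below: because jit traces at the ShapedArray level, a single trace serves every concrete array of that shape and dtype, so the second call hits the compilation cache. A small sketch:

from jax import jit
import jax.numpy as np

f_jit = jit(lambda x: x + 2)
a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 6.], [7., 8.]])
f_jit(a)   # traced and compiled for ShapedArray(f32, (2, 2))
f_jit(b)   # same abstract signature: reuses the compiled computation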
SLIDE 10
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2
SLIDE 11
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

Calls to JAX primitive operations: the elementary operations we know how to transform.
SLIDE 12
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)
SLIDE 13
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

Replace argument x with a special tracer object.
SLIDE 14
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

{ lambda ; ; a.
  let b = log a
SLIDE 15
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)   # ln_2 = 0.693147
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

{ lambda ; ; a.
  let b = log a

Trace doesn't include log(2) because it has no data dependence on the tracer object.
SLIDE 16
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
SLIDE 17
Step 1: Python function → JAX IR

from jax import lax

def log2(x):
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }
SLIDE 19 Step 1: Python function → JAX IR
from jax import lax

def log2(x):
    global_list.append(x)   # side effect!
    ln_x = lax.log(x)
    ln_2 = lax.log(2.)
    return ln_x / ln_2

x = np.array(...)
y = jit(log2)(x)

{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }

Traced functions must be pure (no side effects visible outside the function; output fully determined by input). The append's behavior is not captured by the jaxpr!
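A sketch of how this bites in practice (the function name below is invented to avoid clashing with the slide's log2): the append runs only while tracing, and it records a tracer rather than a value:

from jax import jit, lax
import jax.numpy as np

global_list = []

def log2_impure(x):
    global_list.append(x)          # side effect: not recorded in the jaxpr
    return lax.log(x) / lax.log(2.)

f = jit(log2_impure)
f(np.array(3.))    # first call traces: append runs once, storing a tracer
f(np.array(4.))    # same shape/dtype: cached, append does not run at all
print(global_list) # a single abstract tracer; neither 3.0 nor 4.0 appears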
SLIDE 21
Step 1: Python function → JAX IR

def f(x):
    if x.ndim == 0:
        return 2 * x ** 3.
    else:
        return 3 * x

jit(f)(0.) traces:
{ lambda ; ; a.
  let b = pow a 3.0
      c = mul b 2.0
  in [c] }

jit(f)(np.ones(4)) traces:
{ lambda ; ; a.
  let b = mul a 3.0
  in [b] }
SLIDE 22 Step 1: Python function → JAX IR
def f(x):
    if x > 0:   # ERROR!
        return 2 * x ** 3.
    else:
        return 3 * x

jit(f)(0.)

TypeError: Abstract value passed to `bool`, which requires a concrete value.
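To branch on a traced value, the branch must be part of the computation itself; a sketch using np.where, which traces both branches and selects elementwise at run time:

from jax import jit
import jax.numpy as np

def f(x):
    return np.where(x > 0, 2 * x ** 3., 3 * x)

jit(f)(0.)   # fine: the select happens inside the traced computation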
SLIDE 23
Step 1: Python function → JAX IR

def f(x):
    if x > 0:
        return 2 * x ** 3.
    else:
        return 3 * x

grad traces on concrete values, so branching on a value works:

grad(f)(1.) traces:
{ lambda ; ; a.
  let b = pow a 3.0
      c = mul b 2.0
  in [c] }

grad(f)(-1.) traces:
{ lambda ; ; a.
  let b = mul a 3.0
  in [b] }
SLIDE 24 Step 1: Python function → JAX IR

Abstraction lattice (most abstract at the top); each transformation traces at a different level:

    ⊤
    ↑
    Unshaped(f32)                                 # no control flow allowed:
                                                  #   z = cos(x + y)
    ↑
    Shaped(f32, (2,2))             ← jit, vmap    # can branch on shape:
                                                  #   if x.shape[0] > 2: ...
                                                  #   for subarray in array: ...
    ↑
    EpsilonBall(f32, [[1.,2.],[3.,4.]])  ← grad   # can branch on value if x.val != 0:
                                                  #   if x > 0: ...
    ↑
    Concrete(f32, [[1.,2.],[3.,4.]])     ← eval   # can always branch on value:
                                                  #   if x > 0: ...
    ↑
    ⊥
SLIDE 25
Step 2: transform jaxpr
{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }
SLIDE 26 Step 2: transform jaxpr
{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }

def log_jvp(x, t):
    return lax.div(t, x)

def div_jvp(x, y, tx, ty):
    # d(x / y) = tx / y - x * ty / y**2
    return tx / y - x * ty / y ** 2
Every transform has a rule for every primitive
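These hand-written rules agree with what JAX's own jvp transform computes; a quick check (a sketch):

from jax import jvp
from jax import lax

primal, tangent = jvp(lax.log, (2.0,), (1.0,))
# primal == log(2.0); tangent == 1.0 / 2.0, matching log_jvp above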
SLIDE 27 Step 2: transform jaxpr
{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }

def jvp_transform(jaxpr, x, t):
    env = {jaxpr.invar: (x, t)}
    for eqn in jaxpr.eqns:
        rule = jvp_rules[eqn.prim]
        xs, ts = zip(*[env[v] for v in eqn.ins])
        env[eqn.out] = rule(*xs, *ts)
    return env[jaxpr.outvar]
Transform itself is a simple jaxpr interpreter
SLIDE 28 Step 2: transform jaxpr
{ lambda ; ; a.
  let b = log a
      c = div b 0.693147
  in [c] }

def jvp_transform(jaxpr, x, t):
    env = {jaxpr.invar: (x, t)}
    for eqn in jaxpr.eqns:
        rule = jvp_rules[eqn.prim]
        xs, ts = zip(*[env[v] for v in eqn.ins])
        env[eqn.out] = rule(*xs, *ts)
    return env[jaxpr.outvar]

Replace arguments with tracer objects:

{ lambda ; ; a b.
  let c = log a
      d = div c 0.693147
      e = div b a
      f = div e 0.693147
  in [d, f] }
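End to end, this is what jax.jvp computes for log2; a sketch of calling it, with log2 as defined earlier:

from jax import jvp

y, y_dot = jvp(log2, (3.0,), (1.0,))
# y == log2(3.0); y_dot == 1.0 / (3.0 * 0.693147...),
# since d/dx log2(x) = 1 / (x ln 2)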
SLIDE 29
Python function --trace--> jaxpr
jaxpr --transform--> jaxpr
jaxpr --eval / compile--> outputs

(trace + transform also compose: tracing and transforming happen together, in one pass)
SLIDE 30 Why researchers like JAX
1. JAX is easy to use
   ○ Minimal + expressive API (NumPy + function transformations)
   ○ Can understand "what it's doing"
   ○ Same API for CPU/GPU/TPU
2. JAX is fast
   ○ Good performance out-of-the-box
   ○ Simple parallelization model (pmap; see the sketch after this list)
3. Robust and powerful transformations
4. Functional programming model
   ○ Aligns well with math
   ○ Reproducible results
   ○ Easier to debug
   ○ The key to JAX's superpowers
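A minimal pmap sketch for point 2; it only does something interesting with more than one device attached:

import jax
import jax.numpy as np
from jax import pmap

n = jax.device_count()
# one shard per device, squared in parallel
y = pmap(lambda x: x ** 2)(np.arange(n))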
SLIDE 31 Current limitations
1. Limited higher-level libraries for layers/models
   ○ Stay tuned!
2. Per-op dispatch overhead not fully optimized
   ○ Solution 1: keep optimizing
   ○ Solution 2: more jit
3. Transforms only work on pure functions
   ○ Purity is promised by the user, not checked by JAX
SLIDE 32 “Eager-mode” performance with jit
def adam(step_size, b1=0.9, b2=0.999, eps=1e-8):
    ...
    @jit
    def update(i, g, state):
        x, m, v = state
        m = (1 - b1) * g + b1 * m
        v = (1 - b2) * (g ** 2) + b2 * v
        mhat = m / (1 - b1 ** (i + 1))
        vhat = v / (1 - b2 ** (i + 1))
        x = x - step_size(i) * mhat / (np.sqrt(vhat) + eps)
        return x, m, v
Composable jit means we can write readable and efficient library code. All computations are JIT-compiled with XLA. JAX has almost no handwritten kernels.
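The same point in miniature; a hypothetical end-to-end step (plain gradient descent rather than Adam, names invented) showing grad composed under a single jit:

from jax import jit, grad
import jax.numpy as np

def loss(x):
    return np.sum(x ** 2)

@jit
def step(x):
    # grad under jit: the whole update compiles to one XLA computation
    return x - 0.1 * grad(loss)(x)

x = np.ones(3)
for _ in range(100):
    x = step(x)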
SLIDE 34 Many projects are already using JAX!
1. Studying neural net training with advanced autodiff
   ○ neural-tangents: experiments with the Neural Tangent Kernel
   ○ spectral-density: estimating loss function Hessian spectra
2. Algorithms for robotics and control
   ○ asynchronous model-predictive control
3. Bayesian models and inference
   ○ NumPyro: probabilistic programming and NUTS
4. Simulation and science
   ○ jax-md: differentiable, hardware-accelerated molecular dynamics for physics
   ○ Time Machine: molecular dynamics for biology with meta-optimization
   ○ comp-thru-dynamics: dynamics in artificial and biological neural systems
5. Large-scale neural network training
   ○ trax: Tensor2Tensor in JAX
SLIDE 35 Thank you!
:D
github.com/google/jax Demo: bit.ly/jax-tpu
Stickers!