  1. ∂ifferentiate everything: A lesson from deep learning. Lei Wang (王磊), https://wangleiphy.github.io, Institute of Physics, CAS

  2. [Diagram: Quantum Many-Body Computation, Deep Learning, and Quantum Computing, with Differentiable Programming at their overlap]

  3. Differentiable Programming. Andrej Karpathy (Director of AI at Tesla, previously Research Scientist at OpenAI and PhD student at Stanford: "I like to train deep neural nets on large datasets."), https://medium.com/@karpathy/software-2-0-a64152b37c35
     Traditional programming: Input + Program → Computer → Output
     Machine learning: Input + Output → Computer → Program
     Writing Software 2.0 by gradient search in the program space

  4. Differentiable Programming. Benefits of Software 2.0 (Andrej Karpathy, https://medium.com/@karpathy/software-2-0-a64152b37c35):
     • Computationally homogeneous
     • Simple to bake into silicon
     • Constant running time
     • Constant memory usage
     • Highly portable and agile
     • Modules can meld into an optimal whole
     • Better than humans
     Writing Software 2.0 by gradient search in the program space

  5. Demo: Inverse Schrödinger Problem. Given the ground-state density, how to design the potential? [−(1/2) ∂²/∂x² + V(x)] Ψ(x) = E Ψ(x)
     https://math.mit.edu/~stevenj/18.336/adjoint.pdf
     https://github.com/QuantumBFS/SSSS/blob/master/1_deep_learning/schrodinger.py
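
     A minimal sketch of the idea in Python/JAX (not the schrodinger.py script itself): the grid, the target potential, and the learning rate below are ad hoc illustrative choices.

        # Sketch: recover V(x) from a target ground-state density by gradient descent.
        # Uses a three-point finite-difference Laplacian and plain gradient descent;
        # the actual script in the linked repository may differ in detail.
        import jax
        import jax.numpy as jnp

        n, dx = 200, 0.05
        x = jnp.arange(n) * dx

        def ground_state_density(V):
            # H = -1/2 d^2/dx^2 + V(x), discretized on the grid
            lap = (jnp.diag(-2.0 * jnp.ones(n))
                   + jnp.diag(jnp.ones(n - 1), 1)
                   + jnp.diag(jnp.ones(n - 1), -1)) / dx**2
            H = -0.5 * lap + jnp.diag(V)
            _, psi = jnp.linalg.eigh(H)          # eigh is differentiable in JAX
            rho = psi[:, 0] ** 2
            return rho / rho.sum()

        # Pretend the target density comes from a harmonic trap we do not know
        target_rho = ground_state_density(0.5 * (x - 0.5 * n * dx) ** 2)

        def loss(V):
            return jnp.sum((ground_state_density(V) - target_rho) ** 2)

        V = jnp.zeros(n)                          # initial guess: a flat potential
        for step in range(2000):
            V = V - 30.0 * jax.grad(loss)(V)      # gradient flows through the eigensolver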

  6. What is under the hood?

  7. What is deep learning? Compose differentiable components into a program, e.g. a neural network, then optimize it with gradients.

  8. Automatic differentiation on a computation graph. "Comb graph": data x₁ → x₂ → x₃ → ℒ (the loss), with weights θ₁, θ₂ feeding x₂ and x₃. The adjoint variable is x̄ = ∂ℒ/∂x. Starting from the seed ℒ̄ = 1, pull the adjoint back through the graph: x̄₃ = ℒ̄ ∂ℒ/∂x₃, x̄₂ = x̄₃ ∂x₃/∂x₂, θ̄₂ = x̄₃ ∂x₃/∂θ₂, θ̄₁ = x̄₂ ∂x₂/∂θ₁.

  9. Automatic differentiation on a computation graph: for a general directed acyclic graph, x̄ᵢ = Σ_{j: child of i} x̄ⱼ ∂xⱼ/∂xᵢ, with ℒ̄ = 1 (e.g. x̄₁ = x̄₂ ∂x₂/∂x₁ + x̄₃ ∂x₃/∂x₁). Message passing for the adjoint at each node.
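
     To make the adjoint message passing concrete, here is a hand-written reverse pass on a tiny comb graph, checked against jax.grad; the elementary functions and variable names are purely illustrative.

        # Sketch: pulling the adjoint x̄ = ∂L/∂x back through a tiny comb graph by hand,
        # then checking against automatic differentiation.
        import jax
        import jax.numpy as jnp

        def forward(theta1, theta2, x1):
            x2 = jnp.tanh(theta1 * x1)
            x3 = jnp.sin(theta2 * x2)
            return x3 ** 2                        # scalar loss L

        theta1, theta2, x1 = 0.7, 1.3, 0.5

        # Forward pass, keeping the intermediates
        x2 = jnp.tanh(theta1 * x1)
        x3 = jnp.sin(theta2 * x2)

        # Reverse pass: seed the adjoint of the loss with 1 and pull it back node by node
        L_bar = 1.0
        x3_bar = L_bar * 2.0 * x3                            # ∂L/∂x3
        x2_bar = x3_bar * theta2 * jnp.cos(theta2 * x2)      # x̄2 = x̄3 ∂x3/∂x2
        theta2_bar = x3_bar * x2 * jnp.cos(theta2 * x2)      # θ̄2 = x̄3 ∂x3/∂θ2
        theta1_bar = x2_bar * x1 * (1.0 - x2 ** 2)           # θ̄1 = x̄2 ∂x2/∂θ1

        print(theta1_bar, theta2_bar)
        print(jax.grad(forward, argnums=(0, 1))(theta1, theta2, x1))   # same numbers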

  10. Advantages of automatic differentiation:
      • Accurate to machine precision
      • Same computational complexity as the function evaluation (Baur-Strassen theorem, 1983)
      • Supports higher-order gradients
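
      The first and last points are easy to verify in JAX; a small check with an arbitrary test function (float64 enabled so the comparison is at full machine precision):

        # Sketch: machine-precision gradients and higher-order derivatives by nesting grad.
        import jax
        import jax.numpy as jnp

        jax.config.update("jax_enable_x64", True)

        f = lambda x: jnp.sin(x) * jnp.exp(-x ** 2)

        x = 0.3
        analytic = jnp.cos(x) * jnp.exp(-x ** 2) - 2 * x * jnp.sin(x) * jnp.exp(-x ** 2)
        print(jax.grad(f)(x) - analytic)           # ~machine epsilon, unlike finite differences

        print(jax.grad(jax.grad(jax.grad(f)))(x))  # third derivative, by nesting grad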

  11. Applications of AD: computing forces (Sorella and Capriotti, J. Chem. Phys. '10); quantum optimal control, with a forward (evolution) pass and a backward (gradient) pass through the control pulse sequence (Leung et al, PRA '17); variational Hartree-Fock (Tamayo-Mendoza et al, ACS Cent. Sci. '18).

  12. More applications: protein folding, imputing the structure X from the sequence s by differentiating through Langevin dynamics (Ingraham et al, ICLR '19); structural optimization with neural reparameterization, where a CNN parameterizes the design and the physics objective (compliance) with design constraints (displacement) is evaluated in a differentiable forward pass (Hoyer et al, 1909.04240).

  13. Coil design in fusion reactors (stellarator) McGreivy et al 2009.00196

  14. Computation graph: coil parameters → total cost (McGreivy et al, 2009.00196). Differentiable programming is more than training neural networks.

  15. Black magic box? Chain rule? Functional differential geometry? https://colab.research.google.com/github/google/jax/blob/master/notebooks/autodiff_cookbook.ipynb Differentiating a general computer program (rather than a neural network) calls for a deeper understanding of the technique.

  16. Reverse versus forward mode: ∂ℒ/∂θ = (∂ℒ/∂xₙ)(∂xₙ/∂xₙ₋₁)⋯(∂x₂/∂x₁)(∂x₁/∂θ)
      Reverse-mode AD: vector-Jacobian products of primitives, accumulated from the left.
      • Backtraces the computation graph
      • Needs to store intermediate results
      • Efficient for graphs with large fan-in
      Backpropagation = reverse-mode AD applied to neural networks

  17. Reverse versus forward mode: ∂ℒ/∂θ = (∂ℒ/∂xₙ)(∂xₙ/∂xₙ₋₁)⋯(∂x₂/∂x₁)(∂x₁/∂θ)
      Forward-mode AD: Jacobian-vector products of primitives, accumulated from the right.
      • Proceeds in the same order as the function evaluation
      • No storage overhead
      • Efficient for graphs with large fan-out
      Less efficient for scalar outputs, but useful for higher-order derivatives (see the sketch below)
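
      In JAX the two modes are exposed directly as jax.vjp and jax.jvp; a hedged sketch of the fan-in argument, using an arbitrary 1000-parameter scalar loss:

        # Sketch: reverse mode (vector-Jacobian product) vs forward mode
        # (Jacobian-vector product) for a many-inputs -> one-output function.
        import jax
        import jax.numpy as jnp

        def loss(theta):                      # R^1000 -> R: large fan-in
            return jnp.sum(jnp.sin(theta) ** 2)

        theta = jnp.linspace(0.0, 1.0, 1000)

        # Reverse mode: a single VJP seeded with the scalar adjoint gives the full gradient.
        y, vjp_fn = jax.vjp(loss, theta)
        grad_rev, = vjp_fn(jnp.ones_like(y))

        # Forward mode: one JVP gives one directional derivative; the full gradient
        # would need 1000 passes, one per input direction.
        v = jnp.zeros(1000).at[0].set(1.0)
        _, dir_deriv = jax.jvp(loss, (theta,), (v,))

        print(grad_rev[0], dir_deriv)         # agree on the first component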

  18. How to think about AD?
      • AD is modular, and one can control its granularity
      • Benefits of writing customized primitives (see the sketch below):
        • reducing memory usage
        • increasing numerical stability
        • calling external libraries written agnostically to AD (or even a quantum processor, https://github.com/PennyLaneAI/pennylane)
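
      A sketch of a customized primitive using jax.custom_vjp: the linear solve below stands in for a call to some external, AD-agnostic solver, and the hand-written backward rule is just another solve, so nothing beyond A and x needs to be stored.

        # Sketch: registering a primitive with a hand-written backward rule.
        # For x = solve(A, b):  b̄ = A^{-T} x̄  and  Ā = -b̄ xᵀ.
        import jax
        import jax.numpy as jnp

        @jax.custom_vjp
        def solve(A, b):
            return jnp.linalg.solve(A, b)       # imagine an external library call here

        def solve_fwd(A, b):
            x = jnp.linalg.solve(A, b)
            return x, (A, x)                    # residuals kept for the backward rule

        def solve_bwd(res, x_bar):
            A, x = res
            b_bar = jnp.linalg.solve(A.T, x_bar)    # adjoint solve
            A_bar = -jnp.outer(b_bar, x)
            return A_bar, b_bar

        solve.defvjp(solve_fwd, solve_bwd)

        A = jnp.array([[3.0, 1.0], [1.0, 2.0]])
        b = jnp.array([1.0, 0.0])
        print(jax.grad(lambda b: jnp.sum(solve(A, b) ** 2))(b))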

  19. Example of primitives: ~200 functions cover most of numpy in HIPS/autograd, https://github.com/HIPS/autograd/blob/master/autograd/numpy/numpy_vjps.py. Loops, conditionals, sorting, and permutations are also differentiable.

  20. Differentiable programming tools: HIPS/autograd, SciML.

  21. Differentiable scientific computing (see the sketch below):
      • Many scientific computations (FFT, eigensolvers, SVD!) are differentiable
      • ODE integrators are differentiable with O(1) memory
      • Differentiable ray tracers and differentiable fluid simulations
      • Differentiable Monte Carlo / tensor networks / functional RG / dynamical mean-field theory / density functional theory / Hartree-Fock / coupled cluster / Gutzwiller / molecular dynamics …
      Differentiate through domain-specific computational processes to solve learning, control, optimization and inverse problems.
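
      A quick hedged check that gradients indeed flow through two such primitives in JAX (the matrix and signal below are arbitrary):

        # Sketch: gradients through linear-algebra and FFT primitives.
        import jax
        import jax.numpy as jnp

        def nuclear_norm(M):
            return jnp.sum(jnp.linalg.svd(M, compute_uv=False))   # SVD is differentiable

        def spectral_energy(signal):
            return jnp.sum(jnp.abs(jnp.fft.fft(signal)) ** 2)     # so is the FFT

        M = jnp.array([[3.0, 1.0], [1.0, 2.0], [0.5, 0.2]])
        signal = jnp.sin(jnp.linspace(0.0, 6.0, 64))

        print(jax.grad(nuclear_norm)(M))
        print(jax.grad(spectral_energy)(signal))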

  22. Differentiable eigensolver for the inverse Schrödinger problem: computation graph V → H → Ψ → ℒ, with the matrix diagonalization as a differentiable node. Useful for the inverse Kohn-Sham problem, Jensen & Wasserman '17.

  23. Differentiable eigensolver: H Ψ = Ψ E. What happens if H → H + dH? Forward mode: perturbation theory. Reverse mode: how should I change H, given ∂ℒ/∂Ψ and ∂ℒ/∂E? Inverse perturbation theory! Hamiltonian engineering via differentiable programming. https://github.com/wangleiphy/DL4CSRC/tree/master/2-ising See also Fujita et al, PRB '18.
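
      The forward-mode statement can be checked directly: a JVP through the eigensolver reproduces first-order perturbation theory, dEₙ = ⟨ψₙ|dH|ψₙ⟩. A sketch with an arbitrary 3×3 Hamiltonian and perturbation (not the repository code):

        # Sketch: forward-mode AD through eigh = first-order perturbation theory.
        import jax
        import jax.numpy as jnp

        H = jnp.array([[0.0, 1.0, 0.0],
                       [1.0, 0.5, 0.3],
                       [0.0, 0.3, -1.0]])
        dH = jnp.array([[0.1, 0.0, 0.0],
                        [0.0, -0.2, 0.0],
                        [0.0, 0.0, 0.3]])

        eigenvalues = lambda H: jnp.linalg.eigh(H)[0]

        # Forward mode: push the perturbation dH through the eigensolver
        _, dE = jax.jvp(eigenvalues, (H,), (dH,))

        # First-order perturbation theory <psi_n| dH |psi_n> for comparison
        E, psi = jnp.linalg.eigh(H)
        dE_pt = jnp.einsum('in,ij,jn->n', psi, dH, psi)

        print(dE)
        print(dE_pt)     # same numbers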

  24. Differentiable ODE integrators ("Neural ODE", Chen et al, 1806.07366). Dynamical systems: dx/dt = f_θ(x, t). Principle of least action: S = ∫ ℒ(q_θ, q̇_θ, t) dt. Classical and quantum control; optics, (quantum) mechanics, field theory…
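
      A minimal sketch of differentiating through an ODE integrator by simply unrolling a fixed-step RK4 scheme under reverse-mode AD; the damped-oscillator dynamics, step size, and loss are illustrative. (The O(1)-memory adjoint of Chen et al instead integrates an adjoint ODE backwards in time rather than storing the trajectory.)

        # Sketch: gradient of an endpoint loss with respect to an ODE parameter,
        # obtained by unrolling an RK4 integrator under reverse-mode AD.
        import jax
        import jax.numpy as jnp

        def f(x, t, theta):                       # dx/dt = f_theta(x, t): damped oscillator
            return jnp.array([x[1], -theta * x[0] - 0.1 * x[1]])

        def rk4_integrate(theta, x0, dt=0.01, steps=500):
            def step(x, t):
                k1 = f(x, t, theta)
                k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt, theta)
                k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt, theta)
                k4 = f(x + dt * k3, t + dt, theta)
                return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4), None
            xT, _ = jax.lax.scan(step, x0, jnp.arange(steps) * dt)
            return xT

        def loss(theta):                          # distance of x(T) from the origin
            return jnp.sum(rk4_integrate(theta, jnp.array([1.0, 0.0])) ** 2)

        print(jax.grad(loss)(2.0))                # sensitivity of the endpoint to the stiffness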

  25. Quantum optimal control: i dU/dt = H U (see https://qucontrol.github.io/krotov/v1.0.0/11_other_methods.html). Gradient-free methods: slow. Forward mode: not scalable. Reverse mode with discretized steps: piecewise-constant pulse assumption. Differentiable programming (Neural ODE) for unified, flexible, and efficient quantum control.
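
      A hedged sketch of gradient-based pulse shaping under the piecewise-constant assumption, H(t) = H₀ + u_k H₁ on each time slice; the two-level system, slice count, and plain gradient ascent are illustrative choices, not the interface of the Krotov package linked above.

        # Sketch: maximize a state-transfer fidelity over piecewise-constant controls.
        # The matrix exponential of each Hermitian slice is built from eigh.
        import jax
        import jax.numpy as jnp

        sz = jnp.array([[1.0, 0.0], [0.0, -1.0]], dtype=jnp.complex64)
        sx = jnp.array([[0.0, 1.0], [1.0, 0.0]], dtype=jnp.complex64)
        H0, H1, dt = sz, sx, 0.1

        def expm_herm(H):                         # exp(-i H dt) for Hermitian H
            w, v = jnp.linalg.eigh(H)
            return v @ jnp.diag(jnp.exp(-1j * w * dt)) @ v.conj().T

        def fidelity(u):                          # |<1| U(T) |0>|^2
            psi = jnp.array([1.0, 0.0], dtype=jnp.complex64)
            for k in range(u.shape[0]):           # one propagator per control slice
                psi = expm_herm(H0 + u[k] * H1) @ psi
            return jnp.abs(psi[1]) ** 2

        u = 0.1 * jnp.ones(20)                    # initial pulse
        for _ in range(200):
            u = u + 0.5 * jax.grad(fidelity)(u)   # gradient ascent on the fidelity

        print(fidelity(u))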

  26. Differentiable ODE integrators ("Neural ODE", Chen et al, 1806.07366). Dynamical systems: dx/dt = f_θ(x, t). Principle of least action: S = ∫ ℒ(q_θ, q̇_θ, t) dt. Classical and quantum control; optics, (quantum) mechanics, field theory…

  27. Differentiable functional optimization: the brachistochrone problem (Johann Bernoulli, 1696). T = ∫_{x₀}^{x₁} √[(1 + (dy/dx)²) / (2g(y₀ − y))] dx. https://github.com/QuantumBFS/SSSS/tree/master/1_deep_learning/brachistochrone
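
      A sketch of the functional-optimization viewpoint (not the notebook in the linked repository): measure y downward from the start, fix both endpoints, treat the interior grid values as parameters, and descend the gradient of a quadrature estimate of T; the grid size, regularizing offset, step size, and iteration count are all ad hoc.

        # Sketch: brachistochrone by gradient descent on a discretized travel time
        # T = sum over segments of ds / v, with v = sqrt(2 g y) and y measured downward.
        import jax
        import jax.numpy as jnp

        g, n = 9.8, 100
        x = jnp.linspace(0.0, 1.0, n)
        dx = x[1] - x[0]

        def travel_time(y_interior):
            y = jnp.concatenate([jnp.zeros(1), y_interior, jnp.ones(1)])   # fixed endpoints
            dy = jnp.diff(y)
            y_mid = 0.5 * (y[1:] + y[:-1]) + 1e-6   # avoid the v = 0 singularity at the start
            return jnp.sum(jnp.sqrt((dx ** 2 + dy ** 2) / (2.0 * g * y_mid)))

        y = jnp.linspace(0.0, 1.0, n)[1:-1]         # initial guess: a straight ramp
        for _ in range(2000):
            y = y - 1e-3 * jax.grad(travel_time)(y) # relaxes toward the cycloid

        print(travel_time(y))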

  28. Differentiable Programming Tensor Networks Liao, Liu, LW, Xiang, 1903.09650, PRX ‘19 https://github.com/wangleiphy/tensorgrad

  29. "Tensor network is the 21st century's matrix" (Mario Szegedy). The wavefunction Ψ as a tensor network connects quantum circuit architecture and simulation, neural network parametrization, and probabilistic graphical models.

  30. Differentiate through the tensor renormalization group (Levin, Nave, PRL '07): the computation graph repeats the contraction with a truncated, inverse-free SVD over the RG depth and outputs ln Z as a function of the inverse temperature β.
      [Plots: free energy −(1/β) ln Z, energy density −∂ln Z/∂β, and specific heat β² ∂²ln Z/∂β² versus β ∈ [0.40, 0.50], all matching the exact results.]
      Compute physical observables as gradients of the tensor network contraction.
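
      The "observables as derivatives of ln Z" idea in a toy setting: here ln Z of a 1D Ising ring is a differentiable transfer-matrix contraction, and the energy density and specific heat come out of jax.grad (the talk differentiates through a 2D TRG contraction, which is more involved but follows the same logic):

        # Sketch: energy and specific heat as derivatives of a differentiable ln Z.
        import jax
        import jax.numpy as jnp

        N = 64                                     # spins on a ring

        def lnZ(beta):
            T = jnp.exp(beta * jnp.array([[1.0, -1.0], [-1.0, 1.0]]))   # transfer matrix
            return jnp.log(jnp.trace(jnp.linalg.matrix_power(T, N))) / N

        beta = 0.44
        energy = -jax.grad(lnZ)(beta)                              # -∂lnZ/∂β
        specific_heat = beta ** 2 * jax.grad(jax.grad(lnZ))(beta)  # β² ∂²lnZ/∂β²

        print(energy, specific_heat)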

  31. Differentiable spin glass solver: couplings and fields → tensor network contraction → optimal energy, and the optimal configuration is read off as ∂(optimal energy)/∂(fields). Liu, LW, Zhang, 2008.06888. https://github.com/TensorBFS/TropicalTensors.jl

  32. Differentiable iPEPS optimization. Before: the gradient was assembled by hand as a long sum of tensor-network diagrams; now, with differentiable programming, it comes out automatically (Liao, Liu, LW, Xiang, PRX '19). [Plot: energy relative error versus bond dimension D = 2…7 for simple update, full update, Corboz [34], Vanderstraeten [35], and the present work.] Best variational energy to date (cf. Vanderstraeten et al, PRB '16), obtained in about one week on a single GPU (Nvidia P100). https://github.com/wangleiphy/tensorgrad

  33. Differentiable iPEPS optimization: infinite-size tensor networks (Liao, Liu, LW, Xiang, PRX '19) versus finite-size neural networks on a 10x10 cluster (Carleo & Troyer, Science '17). [Plot: energy relative error versus bond dimension D = 2…7 for simple update, full update, Corboz [34], Vanderstraeten [35], and the present work.] Further progress on challenging physical problems (frustrated magnets, fermions, thermodynamics, …): Chen et al '19, Xie et al '20, Tang et al '20, …
