Machine learning for lattice theories¹
Michael S. Albergo, Gurtej Kanwar, Phiala E. Shanahan
1 Albergo, GK, Shanahan [PRD 100 (2019) 034515]
Deep Learning and Physics Kyoto, Japan (November 1, 2019) Center for Theoretical Physics, MIT
Lattices in the real world
[Images: Ella Maru Studio; Mazurenko et al. 1612.08436]
["Ising Model and Metropolis Algorithm", MathWorks Physics Team]
The Ising model has a spin s ∈ {↑,↓} per site, with an energy penalty when neighboring spins differ. Typical microstates have patches of the same spin at some characteristic scale.
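The setup above can be sketched with a minimal single-site Metropolis simulation (illustrative only; the lattice size, coupling J = 1, and β are arbitrary choices, not values from the talk):

```python
import numpy as np

def ising_energy(s, J=1.0):
    """Nearest-neighbor Ising energy on a periodic lattice of spins +/-1."""
    return -J * np.sum(s * (np.roll(s, 1, axis=0) + np.roll(s, 1, axis=1)))

def metropolis_sweep(s, beta, rng):
    """One sweep of single-site Metropolis updates (J = 1)."""
    L = s.shape[0]
    for _ in range(s.size):
        x, y = rng.integers(L, size=2)
        # Energy change from flipping spin (x, y), from its 4 neighbors
        nn = s[(x + 1) % L, y] + s[(x - 1) % L, y] + s[x, (y + 1) % L] + s[x, (y - 1) % L]
        dE = 2.0 * s[x, y] * nn
        if rng.random() < np.exp(-beta * dE):
            s[x, y] *= -1
    return s

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(8, 8))
for _ in range(30):
    metropolis_sweep(spins, beta=0.6, rng=rng)
# At large beta, aligned patches dominate typical microstates
```

The flip probability min(1, e^{-βΔE}) enforces the Boltzmann distribution as the chain's stationary distribution.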
Statistical quantities (with total energy E[s]):
○ Boltzmann distribution: p(s) = e^{-βE[s]} / Z
○ Partition function: Z = Σ_{s} e^{-βE[s]}
○ Helmholtz free energy: F = -(1/β) log Z
○ Correlation function: ⟨s_x s_y⟩ = Σ_{s} s_x s_y p(s)
Lattices for quantum field theories
Observable values are computed via the path integral: ⟨𝒪⟩ = (1/Z) ∫ 𝒟𝜚 𝒪(𝜚) e^{-S(𝜚)}, where Z = ∫ 𝒟𝜚 e^{-S(𝜚)} is similar to a partition function.
Lattice Quantum Chromodynamics
Needed for high-energy experiments:
○ The Electron-Ion Collider will investigate detailed nuclear structure
○ The Deep Underground Neutrino Experiment requires nuclear cross sections with neutrinos
[bnl.gov/eic] [dunescience.org] [D. Leinweber, Visual QCD Archive]
(See Hong-Ye's talk for holography ideas.)
Computational approach to lattice theories
Sample configurations 𝜚₁, 𝜚₂, …, 𝜚_N approximately distributed ~ p(𝜚), then estimate observables by the sample average ⟨𝒪⟩ ≈ (1/N) Σᵢ 𝒪(𝜚ᵢ).
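A minimal sketch of this estimator (the i.i.d. Gaussian "configurations" and the observable mean(𝜚²) are placeholders for checking, not the talk's theory):

```python
import numpy as np

def mc_estimate(samples, observable):
    """Estimate <O> and its naive standard error from configurations
    (assumes the samples are approximately independent draws from p)."""
    values = np.array([observable(phi) for phi in samples])
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))

# Toy check: i.i.d. unit-Gaussian site values, observable = mean phi^2
rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 8, 8))   # 500 configurations on an 8x8 lattice
est, err = mc_estimate(samples, lambda phi: np.mean(phi ** 2))
# est should be close to 1 (the variance of the unit Gaussian)
```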
Machine learning for lattice theories
Lattice theories describe real-world lattices and quantum field theories, and are studied with numerical methods:
– Thermodynamics
– Collective phenomena
– Spectrum
– ...
In some theories it is hard to reach the continuum limit / critical point, motivating the addition of ML.
Outline:
1. Critical slowing down
2. Sampling using ML
3. Toy model results
Difficulties with Markov Chain Monte Carlo
A Markov chain must take many steps before drawing independent samples: early configurations are discarded as burn-in, and subsequent samples ~ p(𝜚) remain correlated. This is typically quantified with the integrated autocorrelation time; a smaller autocorrelation time means less computational cost.
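The integrated autocorrelation time can be estimated as follows (a standard FFT-based estimator with a self-consistent window; the AR(1) chain is a synthetic example with known τ ≈ 19, not data from the talk):

```python
import numpy as np

def integrated_autocorr_time(x, c=5.0):
    """Integrated autocorrelation time of a scalar Markov-chain time series,
    summing lags with a self-consistent window (up to c * tau)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocorrelation function via zero-padded FFT
    f = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]
    tau = 1.0
    for w in range(1, n):
        tau = 1.0 + 2.0 * acf[1:w + 1].sum()
        if w >= c * tau:
            break
    return tau

# AR(1) chain with coefficient a has tau = (1 + a) / (1 - a)
rng = np.random.default_rng(2)
a, n = 0.9, 200_000
noise = rng.normal(size=n)
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = a * chain[t - 1] + noise[t]
# expected tau = (1 + 0.9) / (1 - 0.9) = 19
```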
Critical slowing down
Near criticality (the continuum limit), the autocorrelation time for Markov chains using local updates diverges, with dynamical critical exponents characterizing the divergence. Reducing these exponents means a cheaper, closer approach to criticality.
Critical slowing down also affects more realistic, complex models:
○ CP^{N-1}
○ O(N)
○ QCD
○ ...
[ALPHA collaboration 1009.5228] [Frick et al., PRL 63, 2613] [Flynn et al. 1504.06292]
Critical slowing down also appears in the scalar theory used in this work.
Sampling lattice configs
[Figure: example lattice configurations, three labeled likely (log prob = 22, 5, 25) and one labeled unlikely (log prob = -6107)]
Sampling lattice configs ≅ generating images
[Figure: lattice configurations and generated images, each labeled likely or unlikely; Karras, Laine, Aila / NVIDIA 1812.04948]
Unique features of the lattice sampling problem
✓ Probability density computable (up to normalization)
✓ Many symmetries in physics
  ○ Lattice symmetries like translation, rotation, and reflection
  ○ Per-site symmetries like negation
✘ High-dimensional (10⁹ to 10¹²) samples
✘ Few (~1000) samples available ahead of time (fewer than the number of variables!)
  ○ Hard to use training paradigms that rely on existing samples from the distribution
Image generation via ML
1. Likelihood-free methods:
   E.g. Generative Adversarial Networks (GANs)
   ✘ Need many real samples
   ✘ No associated likelihood for each produced sample
2. Autoencoding:
   E.g. Variational Auto-Encoders (VAEs)
   ✔ Good for human interpretability
   ✘ Same issues as GANs
3. Normalizing flows:
   Flow-based models learn a change of variables that transforms a known distribution into the desired distribution
   ✔ Exactly known likelihood for each sample
   ✔ Can be trained with samples from themselves
[Goodfellow et al. 1406.2661] [Kingma & Welling 1312.6114] [Rezende & Mohamed 1505.05770] [Shen & Liu 1612.05363]
Many related approaches
[Noé, Olsson, Köhler, Wu, Science 365 (2019)] [Liu, Qi, Meng, Fu 1610.03137] [Zhang, E, Wang 1809.10188] [Li, Dong, Zhang, Wang 1910.00024]
See talks by Junwei Liu, Lei Wang, and Hong-Ye Hu.
Flow-based generative models
Using a change of variables, produce a distribution approximating the one you want: a map f transforms an easily sampled prior distribution into an approximation of the desired distribution. The map must be invertible with a tractable Jacobian.
[Rezende & Mohamed 1505.05770]
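A minimal sketch of the change-of-variables density, using a one-dimensional affine flow (the map and its parameters are illustrative assumptions, chosen so the result can be checked against a closed form):

```python
import numpy as np

def prior_logdensity(z):
    """Log density of the easily sampled prior, here a unit Gaussian."""
    return -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)

def flow(z, mu=1.0, sigma=2.0):
    """Affine change of variables f(z) = mu + sigma * z."""
    phi = mu + sigma * z
    log_det_jac = np.log(sigma)          # d phi / d z = sigma
    return phi, log_det_jac

def model_sample_with_logq(z):
    """Change of variables: log q(phi) = log r(z) - log|det J|."""
    phi, log_det = flow(z)
    return phi, prior_logdensity(z) - log_det

# For this affine flow, q is exactly N(mu=1, sigma=2); check at one point
z = 0.5
phi, logq = model_sample_with_logq(z)
expected = -0.5 * ((phi - 1.0) / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
```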
We chose real non-volume-preserving (real NVP) flows for our work: many simple layers, each invertible with a tractable Jacobian, are composed to produce f.
[Dinh et al. 1605.08803]
Real NVP coupling layer
1. Freeze half of the inputs, z_a
2. Feed the frozen variables into neural networks s and t
3. Apply the scale exp(-s) and offset -t to the unfrozen variables, z_b
[Figure: application of the coupling layer g_i]
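The steps above can be sketched as follows (the toy s and t functions stand in for neural networks, and the forward/inverse sign conventions are an assumption that may differ from the paper's):

```python
import numpy as np

def coupling_layer(z, frozen_mask, s_net, t_net, inverse=False):
    """One real NVP coupling layer.

    The frozen half z_a (where frozen_mask == 1) passes through unchanged
    and feeds the networks s and t; the unfrozen half z_b is scaled and
    offset. Returns the transformed variables and log|det Jacobian|.
    """
    za = z * frozen_mask
    s, t = s_net(za), t_net(za)
    if not inverse:
        out = za + (1 - frozen_mask) * (z * np.exp(s) + t)
        logdet = np.sum((1 - frozen_mask) * s)
    else:
        out = za + (1 - frozen_mask) * (z - t) * np.exp(-s)
        logdet = -np.sum((1 - frozen_mask) * s)
    return out, logdet

# Toy stand-ins for the neural nets: any functions of the frozen half work
s_net = lambda za: 0.1 * np.roll(za, 1, axis=-1)
t_net = lambda za: 0.2 * za.sum() * np.ones_like(za)

rng = np.random.default_rng(5)
z = rng.normal(size=8)
mask = (np.arange(8) % 2).astype(float)   # alternating freeze pattern in 1D
fwd, ld_f = coupling_layer(z, mask, s_net, t_net)
back, ld_b = coupling_layer(fwd, mask, s_net, t_net, inverse=True)
# back recovers z exactly, and the two log-determinants cancel
```

Because s and t see only the frozen half, the layer is trivially invertible and its Jacobian is triangular, so the log-determinant is just a sum of the s outputs.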
Loss function
A shifted Kullback-Leibler divergence between the model density q(𝜚) and the desired density p(𝜚) = e^{-S(𝜚)}/Z:
L = ∫ d𝜚 q(𝜚) [log q(𝜚) + S(𝜚)] = D_KL(q ‖ p) - log Z
The shift removes the unknown normalization Z.
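A sketch of estimating this loss from model samples, for a one-dimensional Gaussian model and the toy action S(𝜚) = 𝜚²/2 (both illustrative assumptions; for the matched model the loss reduces exactly to -log Z = -½ log 2π):

```python
import numpy as np

def shifted_kl_loss(mu, sigma, action, n_samples=100_000, rng=None):
    """Monte-Carlo estimate of L = E_q[log q(phi) + S(phi)] for a
    Gaussian model q = N(mu, sigma^2) built from an affine flow."""
    rng = rng or np.random.default_rng(0)
    z = rng.normal(size=n_samples)
    phi = mu + sigma * z                                    # flow sample
    logq = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi) - np.log(sigma)
    return np.mean(logq + action(phi))

action = lambda phi: 0.5 * phi ** 2        # target p(phi) ∝ exp(-phi^2 / 2)
loss_perfect = shifted_kl_loss(0.0, 1.0, action)   # q == p: L = -log Z
loss_off = shifted_kl_loss(1.0, 2.0, action)       # mismatched model: larger L
```

No samples from p are needed: the expectation is taken over the model's own samples, which is what allows self-training.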
Correcting for model error
Use samples from the ML model as proposals in a Markov chain (Metropolis-Hastings MC updates). Each model proposal is independent of the current configuration; rejected proposals (✘) repeat the previous configuration, which corrects exactly for model error. The acceptance probability for a proposal 𝜚' from current 𝜚 is min(1, p(𝜚') q(𝜚) / (p(𝜚) q(𝜚'))).
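An independence Metropolis-Hastings sketch of this correction step (the one-dimensional target and model densities are illustrative assumptions):

```python
import numpy as np

def independence_mh(log_p, propose, n_steps, rng):
    """Metropolis-Hastings chain whose proposals come from a model,
    independent of the current state. propose() returns (phi, log q(phi)).
    Rejected proposals repeat the previous configuration."""
    phi, logq = propose()
    logp = log_p(phi)
    chain, n_accept = [phi], 0
    for _ in range(n_steps):
        phi_new, logq_new = propose()
        logp_new = log_p(phi_new)
        # accept with probability min(1, p(phi') q(phi) / (p(phi) q(phi')))
        if np.log(rng.random()) < (logp_new - logp) + (logq - logq_new):
            phi, logq, logp = phi_new, logq_new, logp_new
            n_accept += 1
        chain.append(phi)
    return np.array(chain), n_accept / n_steps

# Toy check: target N(0, 1), imperfect model proposals from N(0.5, 1.5^2)
rng = np.random.default_rng(3)
log_p = lambda x: -0.5 * x ** 2          # unnormalized, as in practice

def propose():
    x = rng.normal(0.5, 1.5)
    return x, -0.5 * ((x - 0.5) / 1.5) ** 2 - np.log(1.5)

chain, acc = independence_mh(log_p, propose, 50_000, rng)
# chain mean ≈ 0 and variance ≈ 1 despite the mismatched model
```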
Overview of algorithm
Parameterize the flow using real NVP coupling layers; each layer contains arbitrary neural nets s and t.
Training step (repeated until the desired accuracy is reached):
1. Draw samples from the model
2. Compute the loss function
3. Gradient descent
Save the trained model, then build a Markov chain using samples from the model. Generating samples is "embarrassingly parallel".
Toy model: scalar 𝜚⁴ lattice field theory
Parameters are chosen along a line of constant physics in the symmetric phase.
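A standard discretization of the scalar 𝜚⁴ action can be sketched as follows (the normalization conventions here are an assumption and may differ from those used in the talk):

```python
import numpy as np

def phi4_action(phi, m2, lam):
    """Euclidean lattice phi^4 action on a periodic lattice:
    S = sum_x [ (1/2) sum_mu (phi(x+mu) - phi(x))^2
                + (1/2) m^2 phi(x)^2 + lambda phi(x)^4 ]."""
    kinetic = sum(np.sum((np.roll(phi, -1, axis=mu) - phi) ** 2)
                  for mu in range(phi.ndim))
    return 0.5 * kinetic + 0.5 * m2 * np.sum(phi ** 2) + lam * np.sum(phi ** 4)

# Constant field: kinetic term vanishes, leaving only the potential terms
S_const = phi4_action(np.ones((4, 4)), m2=1.0, lam=0.5)   # 0.5*16 + 0.5*16 = 16
```

The desired density is p(𝜚) ∝ e^{-S(𝜚)}, so this action plays the role of the log-probability in the loss and the Metropolis accept/reject step.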
Samples from the ML model vs standard algorithms: by eye, the ML model produces varied samples with correlations at the right scale.
Observables compared:
○ Correlation functions, relating to masses in the (discretized) quantum field theory
○ Response of the vacuum to an impulse (two-point susceptibility)
○ An energy measurement relating to the Ising model microstate energy in a particular limit of 𝜚⁴ theory
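The first two observables can be sketched for an ensemble of configurations as follows (the i.i.d. Gaussian ensemble is a placeholder used only to check the estimators):

```python
import numpy as np

def two_point(samples):
    """Volume- and ensemble-averaged two-point function
    G(x) = <phi(y) phi(y + x)>, using lattice periodicity via FFT."""
    axes = tuple(range(1, samples.ndim))
    f = np.fft.fftn(samples, axes=axes)
    corr = np.fft.ifftn(f * np.conj(f), axes=axes).real
    return corr.mean(axis=0) / samples[0].size

def susceptibility(G):
    """Two-point susceptibility: chi_2 = sum_x G(x)."""
    return G.sum()

# Toy check with i.i.d. unit-Gaussian configurations: G(0) ≈ 1, G(x != 0) ≈ 0
rng = np.random.default_rng(4)
samples = rng.normal(size=(2000, 8, 8))
G = two_point(samples)
```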
Comparing observables (1)
Ising energy and two-point susceptibility agree.
Comparing observables (2)
The correlation function falls off with separation in both directions on the periodic lattice. Correlation functions and pole masses agree.
No critical slowing down in the ML approach
Unlike HMC and local updates, the ML models spend time on up-front training, so autocorrelations stay fixed during sampling.
Moving towards QCD
Scaling in volume and number of dimensions, and extending to gauge theories.
With David Murphy, Dan Hackett, and Denis Boyda.
Convolutional architectures
Exploit the locality of physical distributions. A net trained on a 14 × 14 lattice in 3.5 days can be transferred to a 20 × 20 lattice with about 10 minutes of retraining.
Hierarchical models
[Figure: composed flows and a hierarchical flow for 32 × 32 𝜚⁴]
[Dinh, Sohl-Dickstein, Bengio 1605.08803] [Li & Wang PRL 121 (2018) 260601]
Towards 3D and 4D lattices
An 8 × 8 × 8 model is easily trained to 30% acceptance.
Towards gauge theories
In collaboration with a DeepMind team.
○ Some recent ideas emerging [Ziegler & Rush 1901.10548] [Gemici, Rezende, Mohamed 1611.02304]
ML method for scalar lattice field theory
○ 8-12 real NVP coupling layers
○ Alternating checkerboard pattern for the variable split
○ 2-6 fully connected layers with 100-1024 hidden units
○ Models trained until 50% and 70% acceptance rates in the ML MCMC
Autocorrelation time
The number of updates required to burn in / decorrelate samples:
○ Hard to compute directly except for very special chains
○ Dominated by the slowest mixing mode
In practice, quantified by the two-point correlation of observables separated by τ Markov chain steps.