Machine learning for lattice theories

SLIDE 1

Machine learning for lattice theories¹

Michael S. Albergo, Gurtej Kanwar, Phiala E. Shanahan
Center for Theoretical Physics, MIT

Deep Learning and Physics, Kyoto, Japan (November 1, 2019)

¹ Albergo, GK, Shanahan [PRD 100 (2019) 034515]

SLIDE 2

Real-world lattices

SLIDE 3

Real-world lattices, quantum field theories

slide-4
SLIDE 4

Lattices in the real world

  • Many materials have degrees of freedom pinned to a lattice structure


[Ella Maru Studio] [Mazurenko et al. 1612.08436]


SLIDE 6

Lattices in the real world

  • Thermodynamics describes the collective behavior of many degrees of freedom
  • At temperature T, microstates s follow the Boltzmann distribution

      p(s) = e^{-E(s)/kT} / Z,   with   Z = Σ_s e^{-E(s)/kT}

[ "Ising Model and Metropolis Algorithm", MathWorks Physics Team ]

The Ising model has a spin s_x ∊ {↑,↓} on each site x, with an energy penalty when neighboring spins differ. Typical microstates have patches of aligned spins at some characteristic scale.
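To make the microstate picture concrete, here is a minimal sketch (not from the talk) of local Metropolis sampling for the 2D Ising model; the lattice size, temperature, and sweep count are illustrative:

```python
import numpy as np

def metropolis_ising(L=32, beta=0.6, n_sweeps=500, seed=0):
    """Sample 2D Ising microstates s ~ exp(-beta * E(s)) / Z, with
    E(s) = -sum_<xy> s_x s_y, using local Metropolis updates."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps):
        for _ in range(L * L):  # one sweep = L^2 single-site proposals
            i, j = rng.integers(L, size=2)
            # energy change from flipping s[i, j] (periodic boundaries)
            nn = s[(i + 1) % L, j] + s[(i - 1) % L, j] \
               + s[i, (j + 1) % L] + s[i, (j - 1) % L]
            dE = 2 * s[i, j] * nn
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i, j] = -s[i, j]
    return s
```

Near criticality the aligned patches grow, and exactly there these local updates decorrelate slowly; that is the critical slowing down quantified later in the talk.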

SLIDE 7

Lattices in the real world

  • Derive thermodynamic observables by averaging over microstates:

      Partition function:       Z = Σ_s e^{-E(s)/kT}
      Boltzmann distribution:   p(s) = e^{-E(s)/kT} / Z
      Total energy:             ⟨E⟩ = Σ_s E(s) p(s)
      Helmholtz free energy:    F = -kT log Z
      Correlation function:     ⟨s_x s_y⟩ = Σ_s s_x s_y p(s)
      ...

SLIDE 8

Lattices for quantum field theories

  • Quantum-mechanical properties are also computed as statistical expectation values via the path integral:

      ⟨O⟩ = (1/Z) ∫ D𝜚 O(𝜚) e^{-S(𝜚)},   with   Z = ∫ D𝜚 e^{-S(𝜚)}   (similar to a partition function)

SLIDE 9

Lattice Quantum Chromodynamics

  • Predictions are needed to interpret upcoming high-energy experiments
    ○ The Electron-Ion Collider will investigate detailed nuclear structure
    ○ The Deep Underground Neutrino Experiment requires nuclear cross sections with neutrinos
  • Pen-and-paper methods fail; numerical evaluation of the path integral is required (so far! see Hong-Ye's talk for holography ideas)

bnl.gov/eic · dunescience.org · [D. Leinweber, Visual QCD Archive]

SLIDE 10

Computational approach to lattice theories

  • Partition functions and path integrals are typically intractable analytically
  • Numerical approximation by Monte Carlo sampling: draw configurations 𝜚₁, 𝜚₂, ..., 𝜚_N approximately distributed ~ p(𝜚), then estimate observables as sample averages,

      ⟨O⟩ ≈ (1/N) Σ_i O(𝜚_i)

  • Markov chain Monte Carlo converges to samples from p(𝜚)
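A sketch of the resulting estimator (the `observable` callable and the configurations, e.g. from the Ising sampler above, are assumptions for illustration):

```python
import numpy as np

def estimate_observable(samples, observable):
    """Monte Carlo estimate of <O> and a naive standard error from
    configurations approximately distributed ~ p. The naive error bar
    assumes independent samples; correlated Markov chains need the
    integrated autocorrelation time introduced on the next slides."""
    vals = np.array([observable(s) for s in samples])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(len(vals))

# e.g.: mean absolute magnetization per site over Ising microstates
# mag, err = estimate_observable(configs, lambda s: abs(s.mean()))
```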

SLIDES 11-15 (outline, built up step by step)

Lattice theories (real-world lattices, quantum field theories)
  → studied with numerical methods (thermodynamics, collective phenomena, spectrum, ...)
  → but it is hard to reach the continuum limit / critical point in some theories
  → + ML

Rest of the talk:
  1. Critical slowing down
  2. Sampling using ML
  3. Toy model results

SLIDE 16

Difficulties with Markov chain Monte Carlo

  • Need to wait for a "burn-in" period
  • Configurations close to each other on the chain are correlated, so many steps must be taken before drawing independent samples
  • Burn-in and correlations are both related to the Markov chain "autocorrelation time"

→ a smaller autocorrelation time means less computational cost!

Typically quantified with the integrated autocorrelation time,

    τ_int = 1/2 + Σ_{τ=1}^∞ ρ(τ),

where ρ(τ) is the normalized autocorrelation of an observable at separation τ along the chain.
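A sketch of estimating τ_int from a chain of scalar measurements (the simple truncate-at-first-negative window below is one common heuristic, an assumption rather than the talk's exact prescription):

```python
import numpy as np

def integrated_autocorr_time(x, max_lag=None):
    """tau_int = 1/2 + sum_tau rho(tau) for measurements x along a
    Markov chain, truncating the sum once rho drops below zero."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    tau = 0.5
    for t in range(1, max_lag or n // 10):
        rho = np.dot(x[:-t], x[t:]) / ((n - t) * var)
        if rho <= 0:  # noise-dominated tail; stop summing
            break
        tau += rho
    return tau
```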


SLIDE 18

Critical slowing down

  • As the parameters defining the distribution approach criticality (the continuum limit), the autocorrelation time diverges for Markov chains using local updates
  • Fitting τ_int to power-law behavior, τ_int ∝ ξ^z, gives the dynamical critical exponent z
  • A smaller dynamical critical exponent means a cheaper, closer approach to criticality

CSD also affects more realistic, complex models: CP^{N-1}, O(N), QCD, ...
[ALPHA collaboration 1009.5228] [Frick et al. PRL 63, 2613] [Flynn et al. 1504.06292]

(Figure: CSD in the scalar theory used in this work.)

SLIDE 19

(Outline recap; moving to part 2: sampling using ML.)

SLIDE 20

Sampling lattice configs

(Figure: four example configurations labeled likely (log prob = 22), likely (log prob = 5), unlikely (log prob = -6107), likely (log prob = 25).)

SLIDE 21

Sampling lattice configs ≅ generating images

(Figure: lattice configurations alongside GAN-generated face images, each labeled likely/unlikely.)

[Karras, Laine, Aila / NVIDIA 1812.04948]

SLIDE 22

Unique features of the lattice sampling problem

  • Probability density computable (up to normalization)
  • Many symmetries in physics
    ○ Lattice symmetries like translation, rotation, and reflection
    ○ Per-site symmetries like negation
  • High-dimensional (10⁹ to 10¹² degrees of freedom) samples
  • Few (~1000) samples available ahead of time (fewer than the number of variables!)
    ○ Hard to use training paradigms that rely on existing samples from the distribution

SLIDE 23

Image generation via ML

1. Likelihood-free methods, e.g. Generative Adversarial Networks (GANs)
   ✘ Need many real samples
   ✘ No associated likelihood for each produced sample

2. Autoencoding, e.g. Variational Auto-Encoders (VAEs)
   ✔ Good for human interpretability
   ✘ Same issues as GANs

3. Normalizing flows: flow-based models learn a change-of-variables that transforms a known distribution to the desired distribution
   ✔ Exactly known likelihood for each sample
   ✔ Can be trained with samples from itself

[Goodfellow et al. 1406.2661] [Kingma & Welling 1312.6114] [Rezende & Mohamed 1505.05770] [Shen & Liu 1612.05363]


SLIDE 25

Many related approaches

  • Normalizing flows for many-body systems [Noé, Olsson, Köhler, Wu, Science 365 (2019), Iss. 6457, 982]
  • Continuous flows [Zhang, E, Wang 1809.10188]
  • Self-Learning Monte Carlo [Liu, Qi, Meng, Fu 1610.03137] (see talks by Junwei Liu, Lei Wang, and Hong-Ye Hu)
  • Hamiltonian transforms [Li, Dong, Zhang, Wang 1910.00024]

SLIDES 26-27

Flow-based generative models

Using a change-of-variables 𝜚 = f(z), produce a distribution approximating the one you want:

    prior r(z) (easily sampled) → f (invertible, tractable Jacobian) → p̃_f(𝜚) (approximates the desired dist.),

with p̃_f(𝜚) = r(z) |det ∂f/∂z|⁻¹.

[Rezende & Mohamed 1505.05770]

SLIDES 28-29

Flow-based generative models

We chose real non-volume-preserving (real NVP) flows for our work: many simple layers are composed to produce f, which stays invertible with a tractable Jacobian, is easily sampled, and approximates the desired distribution.

[Dinh et al. 1605.08803]

SLIDE 30

Real NVP coupling layer

Application of g_i:
  1. Freeze half of the inputs, z_a
  2. Feed the frozen variables into neural networks s and t
  3. Apply the scale exp(-s) and offset -t to the unfrozen half, z_b

  • Simple inverse and Jacobian (see the sketch below)
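A minimal PyTorch sketch of one such coupling layer (the class name, net shapes, and fully connected s and t are illustrative assumptions, not the paper's exact code; the ordering of the exp(-s) scale and -t offset is one self-consistent choice):

```python
import torch
import torch.nn as nn

class CouplingLayer(nn.Module):
    """Real NVP coupling layer on flattened configurations.
    mask is 1 on the frozen half z_a, 0 on the updated half z_b."""
    def __init__(self, n_sites, mask, hidden=100):
        super().__init__()
        self.register_buffer("mask", mask)
        def make_net():
            return nn.Sequential(
                nn.Linear(n_sites, hidden), nn.LeakyReLU(),
                nn.Linear(hidden, n_sites))
        self.s, self.t = make_net(), make_net()

    def forward(self, z):
        za = self.mask * z                      # frozen variables
        s, t = self.s(za), self.t(za)
        zb = (1 - self.mask) * (z - t) * torch.exp(-s)
        logJ = ((1 - self.mask) * (-s)).sum(dim=-1)  # log det Jacobian
        return za + zb, logJ

    def inverse(self, x):
        xa = self.mask * x
        s, t = self.s(xa), self.t(xa)
        xb = (1 - self.mask) * (x * torch.exp(s) + t)
        return xa + xb, ((1 - self.mask) * s).sum(dim=-1)
```

Because s and t only ever see the frozen half, the Jacobian is triangular and its log-determinant is just the sum of -s over the updated sites, which is what keeps the likelihood tractable.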
SLIDE 31

Loss function

  • Use the known target probability density: p(𝜚) = e^{-S(𝜚)} / Z
  • For our application, train to minimize a shifted KL divergence,

      L(p̃_f) = D_KL(p̃_f ‖ p) - log Z = E_{𝜚~p̃_f}[ log p̃_f(𝜚) + S(𝜚) ]

    (the shift removes the unknown normalization Z)
  • Can apply self-training: sample the model distribution p̃_f(𝜚) to estimate the loss
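A sketch of the self-training loss (`sample_with_logprob` is an assumed model interface, sketched in the backup slides below; `action` evaluates S(𝜚) per configuration):

```python
def shifted_kl_loss(model, action, batch_size=1024):
    """E_{phi ~ q_f}[ log q_f(phi) + S(phi) ] = KL(q_f || p) - log Z,
    estimated entirely from the model's own samples (self-training)."""
    phi, logq = model.sample_with_logprob(batch_size)
    return (logq + action(phi)).mean()
```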

SLIDE 32

Correcting for model error

  • With known model and target densities, there are many options to correct for model error
  • We use MCMC with proposals from the ML model (interoperable with standard MC updates)
  • Metropolis-Hastings step: the model proposal 𝜚' is drawn independently of the previous sample 𝜚 and accepted with probability

      A(𝜚 → 𝜚') = min( 1, [ p(𝜚') p̃_f(𝜚) ] / [ p(𝜚) p̃_f(𝜚') ] )
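A sketch of the resulting independence Metropolis-Hastings chain (same assumed interfaces as above):

```python
import torch

def flow_mh_chain(model, action, n_steps):
    """Markov chain with independent flow proposals, accepted with
    probability min(1, p(phi') q(phi) / (p(phi) q(phi')))."""
    phi, logq = model.sample_with_logprob(1)
    chain = []
    for _ in range(n_steps):
        phi_new, logq_new = model.sample_with_logprob(1)
        # log acceptance ratio: [-S(phi') - log q(phi')] - [-S(phi) - log q(phi)]
        log_acc = (action(phi) - action(phi_new)) + (logq - logq_new)
        if torch.log(torch.rand(())) < log_acc:
            phi, logq = phi_new, logq_new
        chain.append(phi)
    return chain
```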
SLIDES 33-35

Overview of algorithm

Parameterize the flow using real NVP coupling layers; each layer contains arbitrary neural nets s and t.

Training step (repeat until the desired accuracy, then save the trained model; see the sketch below):
  1. Draw samples from the model
  2. Compute the loss function
  3. Gradient descent

Then build a Markov chain using samples from the model; generating samples is "embarrassingly parallel".
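Putting the pieces together, a training-loop sketch under the same assumed interfaces (hyperparameters illustrative):

```python
import torch

def train(model, action, n_iters=1000, batch_size=1024, lr=1e-3):
    """Draw samples from the model, compute the shifted KL loss, take a
    gradient step, repeat. Afterwards the trained model feeds the Markov
    chain, and proposal generation parallelizes trivially."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_iters):
        loss = shifted_kl_loss(model, action, batch_size)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```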

SLIDE 36

(Outline recap; moving to part 3: toy model results.)

SLIDES 37-38

Toy model: scalar 𝜚⁴ lattice field theory

  • One real number 𝜚(x) ∊ (-∞, ∞) per lattice site x (2D lattice)
  • Action: relativistic scalar with quartic coupling
  • 5 lattice sizes L² ∊ {6², 8², 10², 12², 14²} with bare parameters tuned to follow a line of constant physics (symmetric phase)
  • HMC and local Metropolis compared against our ML method

SLIDE 39

Samples from ML model vs standard algorithms

By eye, the ML model produces varied samples with correlations at the right scale.
SLIDE 40

Toy model: scalar 𝜚⁴ lattice field theory

  • Measured observables (estimator sketch below):
    ○ Correlation functions, relating to masses in the (discretized) quantum field theory
    ○ Response of the vacuum to an impulse (two-point susceptibility)
    ○ Energy measurement relating to the Ising model microstate energy in a particular limit of 𝜚⁴ theory
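As one concrete sketch, the connected two-point function and susceptibility can be estimated from an ensemble of configurations (the array layout and FFT-based translation average are illustrative choices, not the paper's exact estimators):

```python
import numpy as np

def connected_two_point(samples):
    """G_c(x) = <phi(y) phi(y+x)> - <phi>^2, averaged over y and the
    ensemble, for configurations of shape [N, L, L]; the translation
    average uses a circular correlation computed with FFTs."""
    phi = np.asarray(samples, dtype=float)
    fk = np.fft.fft2(phi)                  # FFT over the two lattice axes
    corr = np.fft.ifft2(fk * np.conj(fk)).real.mean(axis=0) / phi[0].size
    return corr - phi.mean() ** 2

def susceptibility(samples):
    """Two-point susceptibility: chi_2 = sum_x G_c(x)."""
    return connected_two_point(samples).sum()
```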

SLIDE 41

Comparing observables (1)

Ising energy and two-point susceptibility agree.

SLIDE 42

Comparing observables (2)

Correlation functions and pole masses agree. (Figure: correlation falls off with separation in both directions on the periodic lattice.)

SLIDE 43

No critical slowing down in ML approach

(Figure: autocorrelation times for HMC, local Metropolis, and ML models.)

With ML models, by spending time training up-front, autocorrelations are fixed during sampling.

SLIDE 44

Moving towards QCD

  • Developing the method to apply to lattice QCD, tackling difficulties with scaling the volume and number of dimensions, and with extending to gauge theories

With: David Murphy, Dan Hackett, Denis Boyda

SLIDE 45

Convolutional architectures

  • Work in progress using convolutional networks for s and t, natural due to the locality of physical distributions (sketch below)
  • Convolutions + checkerboard mask = translational invariance (up to parity)
  • Convolutions also make scaling the physical volume easy

(Figure: a net trained for 3.5 days on a 14 × 14 lattice transfers to 20 × 20 with ~10 minutes of retraining.)
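A sketch of convolutional s/t networks with periodic (circular) padding, so that together with the checkerboard mask the coupling layer commutes with lattice translations (depths and widths illustrative):

```python
import torch
import torch.nn as nn

def conv_net(hidden=8, kernel=3):
    """Small CNN usable for s or t on [batch, 1, L, L] inputs; circular
    padding matches the periodic lattice, preserving translation
    equivariance and allowing the same weights at any volume."""
    pad = kernel // 2
    return nn.Sequential(
        nn.Conv2d(1, hidden, kernel, padding=pad, padding_mode="circular"),
        nn.LeakyReLU(),
        nn.Conv2d(hidden, 1, kernel, padding=pad, padding_mode="circular"))

def checkerboard_mask(L, parity=0):
    """0/1 checkerboard on an L x L lattice; alternate parity per layer."""
    xs = torch.arange(L)
    return ((xs[:, None] + xs[None, :] + parity) % 2).float()
```

Because the weights are independent of L, a net trained at one volume can be reused at a larger one, which is the transfer trick on this slide.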

SLIDE 46

Hierarchical models

  • Multiple flows, refining the lattice between each
  • Intuition: build correlations at multiple scales efficiently
  • Computational gains allow scaling the ML method

(Figure: flow → refine → flow → ... ; hierarchical flow for 32 × 32 𝜚⁴.)

[Dinh, Sohl-Dickstein, Bengio 1605.08803] [Li & Wang PRL 121 (2018) 260601]

SLIDE 47

Towards 3D and 4D lattices

  • No theoretical obstacle to moving from two dimensions to 3D and 4D
  • Work in progress to implement 4D convolutions efficiently in PyTorch (see the sketch below)

(Figure: an 8 × 8 × 8 model is easily trained to 30% acceptance.)
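PyTorch provides convolutions only up to Conv3d, so one common workaround (a sketch, not necessarily the group's implementation) builds a 4D convolution from a stack of Conv3d's applied to shifted slices along the fourth axis:

```python
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    """Naive 4D convolution: the i-th Conv3d handles the i-th slice of
    the kernel along the T axis, implemented by rolling the input along
    T (circular boundaries) and summing. Input: [B, C, T, X, Y, Z]."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.convs = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, k, padding=k // 2,
                      padding_mode="circular", bias=(i == 0))
            for i in range(k))  # bias on one slice only, to add it once

    def forward(self, x):
        B, C, T, X, Y, Z = x.shape
        out = 0
        for i, conv in enumerate(self.convs):
            xi = torch.roll(x, shifts=self.k // 2 - i, dims=2)  # shift along T
            xi = xi.permute(0, 2, 1, 3, 4, 5).reshape(B * T, C, X, Y, Z)
            yi = conv(xi).reshape(B, T, -1, X, Y, Z).permute(0, 2, 1, 3, 4, 5)
            out = out + yi
        return out
```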

SLIDE 48

Towards gauge theories

  • Real NVP only directly works on fields taking real values 𝜚(x) ∊ (-∞, ∞)
  • Gauge theories (and O(N), CP^{N-1}, ...) have compact domains
  • Early success applying stereographic projection + real NVP in collaboration with a DeepMind team [Gemici, Rezende, Mohamed 1611.02304] (sketch below)
  • Applications to discrete models (Ising, Potts, etc.)?
    ○ Some recent ideas emerging [Ziegler & Rush 1901.10548]
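The idea in one dimension (a toy sketch for a single circle-valued variable; the construction actually needed for gauge groups is more involved): map the compact angle to the real line and track the Jacobian so that flow densities stay exact.

```python
import torch

def circle_to_line(theta):
    """Stereographic-style map from theta in (-pi, pi) to the real
    line, x = tan(theta / 2), with its log-Jacobian."""
    x = torch.tan(theta / 2)
    log_jac = -torch.log(2 * torch.cos(theta / 2) ** 2)  # log |dx/dtheta|
    return x, log_jac

def line_to_circle(x):
    """Inverse map back to the compact domain."""
    theta = 2 * torch.atan(x)
    log_jac = torch.log(2 / (1 + x ** 2))                # log |dtheta/dx|
    return theta, log_jac
```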

SLIDE 49

Thanks! Questions?

SLIDE 50

Backup slides

SLIDE 51

ML method for scalar lattice field theory

  • Prior distribution chosen to be an uncorrelated Gaussian, i.e. for each site x, z(x) ~ 𝒩(0, 1)
  • Real NVP model:
    ○ 8-12 real NVP coupling layers
    ○ Alternating checkerboard pattern for the variable split
    ○ 2-6 fully connected layers with 100-1024 hidden units
  • Trained using the shifted KL loss with the Adam optimizer
    ○ Models trained until 50% and 70% acceptance rates in the ML MCMC
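Tying the earlier sketches together (reusing the hypothetical CouplingLayer and checkerboard_mask from above; the layer count follows this slide, everything else is illustrative):

```python
import math
import torch
import torch.nn as nn

class RealNVPFlow(nn.Module):
    """Stack of coupling layers with alternating checkerboard masks,
    mapping the per-site unit-Gaussian prior to the model density."""
    def __init__(self, L, n_layers=8):
        super().__init__()
        self.n_sites = L * L
        self.layers = nn.ModuleList(
            CouplingLayer(self.n_sites,
                          checkerboard_mask(L, parity=i % 2).flatten())
            for i in range(n_layers))

    def sample_with_logprob(self, batch_size):
        z = torch.randn(batch_size, self.n_sites)       # prior N(0, 1)
        logq = (-0.5 * (z ** 2).sum(dim=-1)
                - 0.5 * self.n_sites * math.log(2 * math.pi))
        for layer in self.layers:
            z, logJ = layer(z)
            logq = logq - logJ       # change of variables, layer by layer
        return z, logq
```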

SLIDE 52

Autocorrelation time

  • Well-behaved Markov chains have a "mixing time" determining how many updates are required to burn in / decorrelate samples
    ○ Hard to compute directly except for very special chains
    ○ Dominated by the slowest mixing mode
  • Practically useful alternative: the integrated autocorrelation time for an observable O,

      τ_int = 1/2 + Σ_{τ=1}^∞ ρ_O(τ),

    where ρ_O(τ) is the normalized two-point correlation of O at a separation of τ Markov chain steps