

SLIDE 1

Prospects of Lattice Field Theory Simulations powered by Deep Neural Networks

Julian Urban ITP Heidelberg 2019/11/06

" this will never work " " this is revolutionary "


SLIDE 4

Overview

  • Stochastic estimation of Euclidean path integrals
  • Overrelaxation with Generative Adversarial Networks (GAN)*
  • Ergodic sampling with Invertible Neural Networks (INN)†
  • Some results for real, scalar φ⁴-theory in d = 2

* Urban, Pawlowski (2018), "Reducing Autocorrelation Times in Lattice Simulations with Generative Adversarial Networks", arXiv:1811.03533

† Albergo, Kanwar, Shanahan (2019), "Flow-based generative models for Markov chain Monte Carlo in lattice field theory", arXiv:1904.12072

SLIDE 5

Markov Chain Monte Carlo

$$\langle O(\phi) \rangle_{\phi \sim e^{-S(\phi)}} = \frac{\int \mathcal{D}\phi \, e^{-S(\phi)} \, O(\phi)}{\int \mathcal{D}\phi \, e^{-S(\phi)}} \approx \frac{1}{N} \sum_{i=1}^{N} O(\phi_i)$$

[Diagram: Markov chain update φ → φ′]

  • accept φ′ with probability (see the sketch below):

$$T_A(\phi'|\phi) = \min\left(1, \, e^{-\Delta S}\right)$$

  • autocorrelation function:

$$C_O(t) = \langle O_i O_{i+t} \rangle - \langle O_i \rangle \langle O_{i+t} \rangle$$
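A minimal Python sketch of the update and estimator above; `propose` and `S` are stand-ins for a symmetric proposal and the lattice action, not names from the talk:

```python
import numpy as np

def metropolis_step(phi, propose, S, rng):
    """One Metropolis update: accept phi' with probability min(1, e^{-dS})."""
    phi_prime = propose(phi, rng)       # symmetric proposal phi -> phi'
    dS = S(phi_prime) - S(phi)          # Delta S
    if rng.random() < np.exp(-dS):      # T_A(phi'|phi) = min(1, e^{-Delta S})
        return phi_prime
    return phi

def autocorr(O, t):
    """C_O(t) = <O_i O_{i+t}> - <O_i><O_{i+t}> over a measurement series O."""
    O = np.asarray(O, dtype=float)
    if t == 0:
        return O.var()
    return (O[:-t] * O[t:]).mean() - O[:-t].mean() * O[t:].mean()
```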


SLIDE 6

Real, Scalar φ⁴-Theory on the Lattice

  • φ(x) ∈ ℝ discretized on a d-cubic Euclidean lattice with volume V = Lᵈ and periodic boundary conditions:

$$S = \sum_x \left[ -2\kappa \sum_{\mu=1}^{d} \phi(x)\,\phi(x+\hat\mu) + (1 - 2\lambda)\,\phi(x)^2 + \lambda\,\phi(x)^4 \right]$$

  • magnetization: $M = \frac{1}{V} \sum_x \phi(x)$
  • connected susceptibility: $\chi_2 = V \left( \langle M^2 \rangle - \langle M \rangle^2 \right)$
  • connected two-point correlation function (all three observables are sketched in code below):

$$G(x, y) = \langle \phi(x)\,\phi(y) \rangle - \langle \phi(x) \rangle \langle \phi(y) \rangle$$
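A NumPy sketch of the action and observables just defined, assuming the field is stored as a d-dimensional array and periodic boundaries are realized with `np.roll`:

```python
import numpy as np

def action(phi, kappa, lam):
    """S = sum_x [ -2*kappa * sum_mu phi(x)*phi(x + mu_hat)
                   + (1 - 2*lam)*phi(x)**2 + lam*phi(x)**4 ]."""
    hop = sum(phi * np.roll(phi, -1, axis=mu) for mu in range(phi.ndim))
    return np.sum(-2.0 * kappa * hop
                  + (1.0 - 2.0 * lam) * phi**2
                  + lam * phi**4)

def magnetization(phi):
    """M = (1/V) * sum_x phi(x)."""
    return phi.mean()

def susceptibility(M_series, V):
    """chi_2 = V * (<M^2> - <M>^2), from per-configuration magnetizations."""
    M = np.asarray(M_series, dtype=float)
    return V * ((M**2).mean() - M.mean()**2)
```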


SLIDE 7

Real, Scalar φ⁴-Theory on the Lattice

d = 2

[Plot: phase diagram of ⟨|M|⟩ in the (λ, κ) plane]


SLIDE 8

Real, Scalar φ⁴-Theory on the Lattice

d = 2, V = 8², λ = 0.02

[Plots: ⟨|M|⟩ and χ₂ as functions of κ]


SLIDE 9

Independent (Black-Box) Sampling

Replace p(φ) by an approximate distribution q(φ) generated from a function g : ℝ^V → ℝ^V, χ → φ, where the components of χ are i.i.d. random variables (commonly N(0, 1)); see the sketch after this list. Theoretical / computational requirements:

  • ergodic in p(φ): p(φ) ≠ 0 ⇒ q(φ) ≠ 0
  • sufficient overlap between q and p for practical use on human timescales
  • balanced and asymptotically exact: a statistical selection or weighting procedure for asymptotically unbiased estimation, similar to an accept/reject correction
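The sampling pattern itself is short; this sketch assumes a generic trained generator `g` acting on a batch of standard-normal latent vectors (names illustrative). The correction step that makes such samples asymptotically exact is sketched on a later slide.

```python
import numpy as np

def draw_configs(g, V, n, rng):
    """Draw n independent configurations phi = g(chi) with chi_i ~ N(0, 1)."""
    chi = rng.standard_normal((n, V))   # i.i.d. latent variables
    return g(chi)                       # push through the black-box generator
```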


SLIDE 10

Overrelaxation

[Diagram: overrelaxation update φ → φ′ with S(φ′) = S(φ), interleaved with n_MC ordinary MC steps]

T_A(φ′|φ) = 1 for ΔS = 0

  • sampling on hypersurfaces of constant S
  • ergodicity through normal MC steps
  • requirements:
    • ability to reproduce all possible S
    • symmetric a priori selection probability


SLIDE 11

Generative Adversarial Networks

[Diagram: GAN training loop, random numbers → Generator → fake samples; fake and real samples enter the Discriminator, which yields the loss]

  • overrelaxation step: find χ s.t. S[g(χ)] = S[φ]
  • iterative gradient descent solution (sketch below) of

$$\chi' = \arg\min_{\chi} \left| S[g(\chi)] - S[\phi] \right|$$
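A sketch of the latent space search in PyTorch, assuming a differentiable generator `g` and action `S`; the optimizer choice and tolerances are illustrative, not taken from the talk:

```python
import torch

def overrelax_latent_search(g, S, phi, chi0, lr=1e-2, steps=1000, tol=1e-8):
    """Gradient descent on (S[g(chi)] - S[phi])^2 to find an equal-action
    configuration in latent space."""
    target = S(phi).detach()
    chi = chi0.clone().requires_grad_(True)
    opt = torch.optim.Adam([chi], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (S(g(chi)) - target) ** 2
        if loss.item() < tol:           # action matched to tolerance
            break
        loss.backward()
        opt.step()
    return chi.detach()
```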


SLIDE 12

Sample Examples

d = 2, V = 322, κ = 0.21, λ = 0.022

"real_sample.txt" matrix

  • 2
  • 1

1 2

"good_sample.txt" matrix

  • 2
  • 1

1 2

"good_sample2.txt" matrix

  • 2
  • 1

1 2

"good_sample3.txt" matrix

  • 2
  • 1

1 2


SLIDE 13

Magnetization & Action Distributions

[Histograms: magnetization M and action S distributions, comparing HMC vs. GAN and HMC vs. HMC + GAN]


SLIDE 14

Reduced Autocorrelations

[Plot: magnetization autocorrelation function C_M(t) vs. t; curves for local updates, HMC, and n_H = 1, 2, 3]


SLIDE 15

Problems with this Approach

  • GAN
    • relies on the existence of an exhaustive dataset
    • no direct access to the sample probability
    • adversarial learning complicates quantitative error assessment
    • convergence/stability issues such as mode collapse
  • Overrelaxation
    • still relies on traditional MC algorithms
    • symmetry of the selection probability
    • little effect on autocorrelations of observables coupled to S
    • latent space search is computationally rather demanding


SLIDE 16

Proper Reweighting to Model Distribution

$$\langle O \rangle_{\phi \sim p(\phi)} = \int \mathcal{D}\phi \, p(\phi) \, O(\phi) = \int \mathcal{D}\phi \, q(\phi) \, \frac{p(\phi)}{q(\phi)} \, O(\phi) = \left\langle \frac{p(\phi)}{q(\phi)} \, O(\phi) \right\rangle_{\phi \sim q(\phi)}$$

Generate q(φ) through a parametrizable, invertible function g(χ|ω) with tractable Jacobian determinant:

$$q(\phi) = r(\chi(\phi)) \left| \det \frac{\partial g^{-1}(\phi)}{\partial \phi} \right|$$

Optimal choice for q(φ) ⟷ minimal relative entropy / Kullback-Leibler divergence:

$$D_{\mathrm{KL}}(q \,\|\, p) = -\int \mathcal{D}\phi \, q(\phi) \log \frac{p(\phi)}{q(\phi)} = -\left\langle \log \frac{p(\phi)}{q(\phi)} \right\rangle_{\phi \sim q(\phi)}$$
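A sketch of the self-normalized reweighting estimator; log p(φ) may be supplied as −S(φ), since the unknown log partition function cancels in the normalized weights:

```python
import numpy as np

def reweighted_mean(O_vals, log_p, log_q):
    """Estimate <O>_p from samples phi ~ q via weights w ∝ p(phi)/q(phi)."""
    log_w = np.asarray(log_p, float) - np.asarray(log_q, float)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()                      # self-normalized importance weights
    return float(np.sum(w * np.asarray(O_vals, float)))
```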


SLIDE 17

INN / Real NVP Flow

Ardizzone, Klessen, Köthe, Kruse, Maier-Hein, Pellegrini, Rahner, Rother, Wirkert (2018), "Analyzing Inverse Problems with Invertible Neural Networks", arXiv:1808.04730

Ardizzone, Köthe, Kruse, Lüth, Rother, Wirkert (2019), "Guided Image Generation with Conditional Invertible Neural Networks", arXiv:1907.02392
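A minimal sketch of one Real NVP affine coupling block in PyTorch; the binary mask, subnet size, and tanh-bounded log-scale are common choices, not specifics from the talk. Stacking several such blocks with alternating masks and training on the D_KL estimate from the previous slide yields the flow-based sampler.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One Real NVP coupling layer: the masked ("frozen") half of the input
    parametrizes an affine transform of the other half."""
    def __init__(self, dim, hidden, mask):
        super().__init__()
        self.register_buffer("mask", mask)        # 1 = frozen, 0 = transformed
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim),           # produces scale and shift
        )

    def forward(self, x):
        """x -> y; also returns log|det J| for the change of variables."""
        x_frozen = x * self.mask
        s, t = self.net(x_frozen).chunk(2, dim=-1)
        s = torch.tanh(s) * (1 - self.mask)       # bounded log-scale
        t = t * (1 - self.mask)
        y = x_frozen + (1 - self.mask) * (x * torch.exp(s) + t)
        return y, s.sum(dim=-1)                   # log-det = sum of log-scales

    def inverse(self, y):
        """y -> x, exact inverse of forward."""
        y_frozen = y * self.mask
        s, t = self.net(y_frozen).chunk(2, dim=-1)
        s = torch.tanh(s) * (1 - self.mask)
        t = t * (1 - self.mask)
        return y_frozen + (1 - self.mask) * ((y - t) * torch.exp(-s))
```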

SLIDE 18

Advantages of this Approach

  • learning is completely data-independent
  • improved error metrics:
    • Metropolis-Hastings acceptance rate (sketched below)
    • convergence properties of D_KL
  • ergodicity & balance + asymptotic exactness satisfied a priori
  • no latent space deformation required

Objective: maximization of the overlap between q(φ) and p(φ).
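A sketch of the independence Metropolis-Hastings correction over flow proposals, in the spirit of the Albergo et al. approach cited in the overview; inputs are per-proposal log-probabilities (names illustrative):

```python
import numpy as np

def flow_mh_indices(log_p, log_q, rng):
    """Independence MH over proposals phi ~ q: accept proposal i over the
    current state c with probability min(1, [p_i * q_c] / [p_c * q_i])."""
    log_w = np.asarray(log_p, float) - np.asarray(log_q, float)
    idx = [0]                               # chain starts at the first proposal
    for i in range(1, len(log_w)):
        if np.log(rng.random()) < log_w[i] - log_w[idx[-1]]:
            idx.append(i)                   # accept the new proposal
        else:
            idx.append(idx[-1])             # repeat the current state
    return np.array(idx)
```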


SLIDE 19

Comparison with HMC Results

d = 2, V = 8², λ = 0.02; INN: 8 layers, 4 hidden layers, 512 neurons / layer

[Plots: ⟨|M|⟩ and χ₂ vs. κ, comparing HMC with bare, weighted, and Metropolis estimates]


SLIDE 20

Comparison with HMC Results

κ = 0.2

[Plot: two-point correlation function G(s) vs. separation s, comparing HMC with bare, weighted, and Metropolis estimates]


SLIDE 21

Potential Applications & Future Work

  • accelerated simulations of physically interesting theories (QCD, Yukawa, gauge-Higgs, condensed matter)
  • additional conditioning (cINN) to encode arbitrary couplings κ, λ
  • tackling sign problems with generalized thimble / path optimization approaches via latent space disentanglement
  • efficient minimization of D_KL in terms of the ground state energy of an interacting hybrid classical-quantum system


SLIDE 22

Challenges & Problems

  • scalability to higher dimensions / larger volumes / more d.o.f. (e.g. QCD: ∼10⁹ floats per configuration)
    • multi-GPU parallelization
    • progressive growing to successively larger volumes
  • architectures that intrinsically respect symmetries and topological properties of the theory
    • gauge symmetry / equivariance
  • critical slowing down


SLIDE 23

Thank you!
