Prospects of Lattice Field Theory Simulations powered by Deep Neural Networks
Julian Urban ITP Heidelberg 2019/11/06
" this will never work " " this is revolutionary "
Overview
- Stochastic estimation of Euclidean path integrals
- Overrelaxation with Generative Adversarial Networks (GAN)*
- Ergodic sampling with Invertible Neural Networks (INN)†
- Some results for real, scalar φ⁴-theory in d = 2
* Urban, Pawlowski (2018) — “Reducing Autocorrelation Times in Lattice Simulations with Generative Adversarial Networks” — arXiv:1811.03533
† Albergo, Kanwar, Shanahan (2019) — “Flow-based generative models for Markov chain Monte Carlo in lattice field theory” — arXiv:1904.12072

1 / 20
Markov Chain Monte Carlo
⟨O(φ)⟩_{φ ∼ e^(−S(φ))} = ∫ Dφ e^(−S(φ)) O(φ) / ∫ Dφ e^(−S(φ)) ≈ (1/N) Σ_{i=1}^{N} O(φ_i)

[diagram: Markov chain update φ → φ′]
- accept φ′ with probability: T_A(φ′|φ) = min(1, e^(−∆S))
- autocorrelation function: C_O(t) = ⟨O_i O_{i+t}⟩ − ⟨O_i⟩⟨O_{i+t}⟩
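The accept/reject step and the autocorrelation estimator above can be sketched in a few lines of NumPy (a minimal illustration, not tied to any particular theory):

```python
import numpy as np

def metropolis_accept(delta_S, rng):
    """Accept/reject step: accept with probability min(1, exp(-delta_S))."""
    return delta_S <= 0.0 or rng.random() < np.exp(-delta_S)

def autocorrelation(O, t):
    """Estimate C_O(t) = <O_i O_{i+t}> - <O_i><O_{i+t}> from a chain of measurements."""
    O = np.asarray(O, dtype=float)
    a, b = (O, O) if t == 0 else (O[:-t], O[t:])
    return (a * b).mean() - a.mean() * b.mean()
```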
2 / 20
Real, Scalar φ⁴-Theory on the Lattice
- φ(x) ∈ ℝ discretized on a d-cubic Euclidean lattice with volume V = L^d and periodic boundary conditions

  S = Σ_x [ −2κ Σ_{µ=1}^{d} φ(x) φ(x + µ̂) + (1 − 2λ) φ(x)² + λ φ(x)⁴ ]

- magnetization: M = (1/V) Σ_x φ(x)
- connected susceptibility: χ₂ = V (⟨M²⟩ − ⟨M⟩²)
- connected two-point correlation function: G(x, y) = ⟨φ(x)φ(y)⟩ − ⟨φ(x)⟩⟨φ(y)⟩
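The action and observables above translate directly into NumPy; this is a sketch for arbitrary dimension, implementing the periodic nearest-neighbour coupling via `np.roll`:

```python
import numpy as np

def action(phi, kappa, lam):
    """S = sum_x [ -2*kappa * sum_mu phi(x)*phi(x+mu_hat)
                   + (1 - 2*lam)*phi(x)**2 + lam*phi(x)**4 ],
    with periodic boundary conditions via np.roll."""
    hop = sum(phi * np.roll(phi, -1, axis=mu) for mu in range(phi.ndim))
    return float(np.sum(-2.0 * kappa * hop + (1.0 - 2.0 * lam) * phi**2 + lam * phi**4))

def magnetization(phi):
    """M = (1/V) sum_x phi(x)."""
    return float(phi.mean())

def susceptibility(M_samples, V):
    """chi_2 = V * (<M^2> - <M>^2), estimated from a list of magnetizations."""
    M = np.asarray(M_samples, dtype=float)
    return V * (np.mean(M**2) - np.mean(M)**2)
```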
3 / 20
Real, Scalar φ4-Theory on the Lattice
d = 2

[figure: phase diagram of ⟨|M|⟩ in the (κ, λ)-plane]
4 / 20
Real, Scalar φ4-Theory on the Lattice
d = 2, V = 8², λ = 0.02

[figure: ⟨|M|⟩ and χ₂ as functions of κ]
5 / 20
Independent (Black-Box) Sampling
Replace p(φ) by an approximate distribution q(φ) generated from a function g : ℝ^V → ℝ^V, χ ↦ φ, where the components of χ are i.i.d. random variables (commonly N(0, 1)). Theoretical / computational requirements:
- ergodicity in p(φ): p(φ) ≠ 0 ⇒ q(φ) ≠ 0
- sufficient overlap between q and p for practical use on human timescales
- balance and asymptotic exactness: a statistical selection or weighting procedure for asymptotically unbiased estimation, similar to an accept/reject correction
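The accept/reject correction for independent proposals is the independence Metropolis-Hastings algorithm. A minimal sketch on a one-dimensional toy (standard-normal target, shifted Gaussian model q, both hypothetical stand-ins for e^(−S) and the learned distribution):

```python
import numpy as np

def independence_metropolis(log_p, log_q, draw_q, n_steps, rng):
    """Proposals are drawn i.i.d. from q and accepted with probability
    min(1, p(x')q(x) / (p(x)q(x'))), restoring asymptotically exact
    sampling from p even though q is only approximate."""
    x = draw_q(rng)
    chain, n_acc = [x], 0
    for _ in range(n_steps):
        x_new = draw_q(rng)
        log_a = (log_p(x_new) - log_p(x)) - (log_q(x_new) - log_q(x))
        if np.log(rng.random()) < log_a:
            x, n_acc = x_new, n_acc + 1
        chain.append(x)
    return np.array(chain), n_acc / n_steps
```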
6 / 20
Overrelaxation
[diagram: overrelaxation step φ → φ′ with S(φ′) = S(φ), interleaved with n_MC ordinary MC steps]

T_A(φ′|φ) = 1 for ∆S = 0
- sampling on hypersurfaces of constant S
- ergodicity through interleaved normal MC steps
- requirements:
  - ability to reproduce all possible values of S
  - symmetric a priori selection probability
7 / 20
Generative Adversarial Networks
[diagram: GAN setup — the Generator maps random numbers to fake samples; the Discriminator distinguishes fake from real samples and provides the loss]
- overrelaxation step: find χ s.t. S[g(χ)] = S[φ]
- iterative gradient descent solution of χ′ = arg min_χ |S[g(χ)] − S[φ]|
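The latent-space search can be illustrated with a hand-differentiated toy example; a fixed linear map `W` stands in for the trained generator g and S[φ] = Σ φ² for the lattice action (both hypothetical stand-ins, chosen so the gradient is available in closed form):

```python
import numpy as np

def latent_search(W, S_target, chi0, lr=0.01, n_iter=2000):
    """Gradient descent on L(chi) = (S[g(chi)] - S_target)**2 for the toy model
    g(chi) = W @ chi and S[phi] = sum(phi**2); the gradient follows from the
    chain rule: dL/dchi = 2*(S - S_target) * W.T @ (2*phi)."""
    chi = chi0.astype(float).copy()
    for _ in range(n_iter):
        phi = W @ chi
        S = np.sum(phi**2)
        grad = 2.0 * (S - S_target) * (W.T @ (2.0 * phi))
        chi -= lr * grad
    return chi

W = np.eye(4)
chi = latent_search(W, S_target=2.0, chi0=np.ones(4))
```

In a real setup, g is the neural generator and the gradient is obtained by automatic differentiation instead of a hand-derived chain rule.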
8 / 20
Sample Examples
d = 2, V = 32², κ = 0.21, λ = 0.022

[figures: lattice configurations as heatmaps, field values roughly in [−2, 2] — one real (HMC) sample and three GAN samples]
9 / 20
Magnetization & Action Distributions
[figures: Monte Carlo histories of the magnetization M and the action S, comparing HMC vs. GAN and HMC vs. HMC + GAN]
10 / 20
Reduced Autocorrelations
[figure: magnetization autocorrelation function C_M(t) for local updates, HMC, and GAN overrelaxation with n_H = 1, 2, 3]
11 / 20
Problems with this Approach
- GAN
- relies on the existence of an exhaustive dataset
- no direct access to sample probability
- adversarial learning complicates quantitative error assessment
- convergence/stability issues such as mode collapse
- Overrelaxation
- still relies on traditional MC algorithms
- symmetry of the selection probability is difficult to guarantee
- little effect on autocorrelations of observables coupled to S
- latent space search is computationally rather demanding
12 / 20
Proper Reweighting to Model Distribution
⟨O⟩_{φ ∼ p(φ)} = ∫ Dφ p(φ) O(φ) = ∫ Dφ q(φ) [p(φ)/q(φ)] O(φ) = ⟨ [p(φ)/q(φ)] O(φ) ⟩_{φ ∼ q(φ)}

Generate q(φ) through a parametrizable, invertible function g(χ|ω) with tractable Jacobian determinant:

q(φ) = r(χ(φ)) |det ∂g⁻¹(φ)/∂φ|

Optimal choice for q(φ) ⟷ minimal relative entropy / Kullback-Leibler divergence:

D_KL(q‖p) = −∫ Dφ q(φ) log[p(φ)/q(φ)] = −⟨ log[p(φ)/q(φ)] ⟩_{φ ∼ q(φ)}
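The reweighting identity can be checked on a one-dimensional toy: p is a standard normal known only up to normalization (as for e^(−S)), and a shifted, broadened Gaussian stands in for the model q (all distributions here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.5, 1.2                       # toy model distribution q = N(mu, sigma^2)

def log_p(x):                              # unnormalized target: log p = -S
    return -0.5 * x**2

def log_q(x):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

x = mu + sigma * rng.standard_normal(200_000)   # samples x ~ q
log_w = log_p(x) - log_q(x)
w = np.exp(log_w - log_w.max())                 # weights p/q up to a constant

def reweighted(O):
    """<O>_p = <(p/q) O>_q / <p/q>_q  (self-normalized importance sampling)."""
    return np.sum(w * O) / np.sum(w)
```

Self-normalization cancels the unknown partition function, which is why only −S and log q are needed in practice.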
13 / 20
INN / Real NVP Flow
Ardizzone, Klessen, Köthe, Kruse, Maier-Hein, Pellegrini, Rahner, Rother, Wirkert (2018) — “Analyzing Inverse Problems with Invertible Neural Networks” — arXiv:1808.04730
Ardizzone, Köthe, Kruse, Lüth, Rother, Wirkert (2019) — “Guided Image Generation with Conditional Invertible Neural Networks” — arXiv:1907.02392

14 / 20
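A single Real NVP affine coupling block can be sketched in NumPy; the scale and translation subnetworks are replaced here by fixed random linear maps (a hypothetical stand-in for the small neural networks used in practice):

```python
import numpy as np

class AffineCoupling:
    """One Real NVP coupling layer: x = (x_a, x_b) -> (x_a, x_b * exp(s(x_a)) + t(x_a)).
    Invertible by construction, with tractable log|det J| = sum(s(x_a))."""

    def __init__(self, dim, rng):
        self.half = dim // 2
        self.Ws = 0.1 * rng.standard_normal((dim - self.half, self.half))
        self.Wt = 0.1 * rng.standard_normal((dim - self.half, self.half))

    def forward(self, x):
        xa, xb = x[: self.half], x[self.half :]
        s, t = self.Ws @ xa, self.Wt @ xa
        return np.concatenate([xa, xb * np.exp(s) + t]), float(np.sum(s))

    def inverse(self, y):
        ya, yb = y[: self.half], y[self.half :]
        s, t = self.Ws @ ya, self.Wt @ ya
        return np.concatenate([ya, (yb - t) * np.exp(-s)])
```

Stacking several such layers with alternating splits yields the full flow g(χ|ω); the total log-Jacobian entering q(φ) is the sum of the per-layer terms.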
Advantages of this Approach
- learning is completely data-independent
- improved error metrics
- Metropolis-Hastings acceptance rate
- convergence properties of DKL
- ergodicity & balance + asymptotic exactness satisfied a priori
- no latent space deformation required
Objective: maximization of overlap between q(φ) and p(φ).
15 / 20
Comparison with HMC Results
d = 2, V = 8², λ = 0.02; INN: 8 layers, 4 hidden layers, 512 neurons / layer
[figure: ⟨|M|⟩ and χ₂ as functions of κ for HMC and INN samples (bare, weighted, Metropolis)]
16 / 20
Comparison with HMC Results
κ = 0.2
[figure: connected correlator G(s) vs. separation s for HMC and INN samples (bare, weighted, Metropolis)]
17 / 20
Potential Applications & Future Work
- accelerated simulations of physically interesting theories
(QCD, Yukawa, Gauge-Higgs, Condensed Matter)
- additional conditioning (cINN) to encode arbitrary
couplings κ, λ
- tackling sign problems with generalized thimble / path optimization approaches via latent space disentanglement
- efficient minimization of D_KL in terms of the ground state energy of an interacting hybrid classical-quantum system
18 / 20
Challenges & Problems
- scalability to higher dimensions / larger volumes / more d.o.f.
(e.g. QCD: ∼ 10⁹ floats per configuration)
- multi-GPU parallelization
- progressive growing to successively larger volumes
- architectures that intrinsically respect symmetries and
topological properties of the theory
- gauge symmetry / equivariance
- critical slowing down
19 / 20
Thank you!
20 / 20