Self-learning Monte Carlo method and all optical neural network


SLIDE 1

Self-learning Monte Carlo method and all optical neural network

Junwei Liu (劉軍偉)
Department of Physics, Hong Kong University of Science and Technology

Deep learning and Physics, YITP, Kyoto, Japan, 2019

SLIDE 2

The general motivation

  • As is well known, machine learning techniques such as principal component analysis, deep neural networks, convolutional neural networks, generative neural networks, and reinforcement learning have developed rapidly.

  • These methods have enabled many great achievements, such as image recognition, self-driving cars, and AlphaGo.

  • Can we use these methods in physics to solve some problems? If so, what kinds of problems can we solve, and how?

SLIDE 3

Part 1: Self-learning Monte Carlo (ML → Physics)

Monte Carlo simulation

SLIDE 4

Collaborators and References

1. Self-Learning Monte Carlo Method, PRB 95, 041101(R) (2017)
2. Self-Learning Monte Carlo Method in Fermion Systems, PRB 95, 241104(R) (2017)
3. Self-Learning Quantum Monte Carlo Method in Interacting Fermion Systems, PRB 96, 041119(R) (2017)
4. Self-learning Monte Carlo method: Continuous-time algorithm, PRB 96, 161102 (2017)
5. Self-learning Monte Carlo with Deep Neural Networks, PRB 97, 205140 (2018)

Related work from Prof. Lei Wang's group at IOP:

1. Accelerated Monte Carlo simulations with restricted Boltzmann machines, PRB 95, 035105 (2017)
2. Recommender engine for continuous-time quantum Monte Carlo methods, PRE 95, 031301(R) (2017)

Collaborators: Liang Fu (MIT), Yang Qi (Fudan), Ziyang Meng (IOP), Huitao Shen (MIT), Xiaoyan Xu (UCSD), Yuki Nagai (JAEA)

SLIDE 5

Monte Carlo methods

  • Consider a statistical mechanics problem:

$Z = \sum_C e^{-\beta H[C]} = \sum_C W(C)$

$\langle O \rangle = \frac{1}{Z}\sum_C O(C)\, e^{-\beta H[C]} = \frac{1}{Z}\sum_C O(C)\, W(C)$

  • Pick N configurations (samples) $C_i$ in the configuration space according to the importance $W(C_i)/Z$. Then we can estimate observables as

$\langle O \rangle \approx \frac{1}{N}\sum_{i=1}^{N} O(C_i)$

  • The statistical error is proportional to $1/\sqrt{N}$. In high dimensions (d > 8), Monte Carlo is the most important, and sometimes the only available, method to perform the integral/summation.
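The estimator above can be sketched in a few lines of Python. This is a toy example, not from the talk: a single spin in a field $h$, with $H(s) = -hs$, sampled with probability $W(s)/Z$; the sample average of $s$ converges to the exact $\tanh(\beta h)$ at rate $1/\sqrt{N}$.

```python
import math
import random

random.seed(0)

beta, h = 1.0, 0.5          # inverse temperature and field (toy values)
states = [+1, -1]           # configuration space of a single spin, H(s) = -h*s

# Boltzmann weights W(C) = exp(-beta*H[C]) and partition function Z
weights = [math.exp(beta * h * s) for s in states]
Z = sum(weights)

# Importance sampling: draw N configurations with probability W(C)/Z,
# then estimate <O> as a plain average of O over the samples.
N = 200_000
samples = random.choices(states, weights=weights, k=N)
estimate = sum(samples) / N            # <s> estimated from N samples

exact = math.tanh(beta * h)            # closed form for this toy model
print(f"MC estimate {estimate:.3f} vs exact {exact:.3f}")
```

With $N = 2 \times 10^5$ samples the statistical error is of order $1/\sqrt{N} \approx 0.002$, so the estimate agrees with $\tanh(\beta h)$ to about two decimal places.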

SLIDE 6

Quantum Monte Carlo methods

  • Consider a quantum system characterized by a Hamiltonian $H$:

$Z = \mathrm{Tr}\, e^{-\beta H} = \sum_\psi \langle \psi | e^{-\beta H} | \psi \rangle$

Method 1: Trotter decomposition

$Z = \sum_{\psi_1 \psi_2 \ldots \psi_N} \langle \psi_1 | e^{-\beta H/N} | \psi_2 \rangle \langle \psi_2 | e^{-\beta H/N} | \psi_3 \rangle \cdots \langle \psi_N | e^{-\beta H/N} | \psi_1 \rangle$

Method 2: series expansion

$Z = \sum_\psi \langle \psi | \sum_n \frac{(-\beta H)^n}{n!} | \psi \rangle$

  • Map the d-dimensional quantum model to a (d+1)-dimensional "classical" model, and then use the Monte Carlo method to simulate this "classical" model.

SLIDE 7

Markov chain Monte Carlo (MCMC)

  • MCMC is a way to perform importance sampling from the distribution $W(C)$. Configurations are generated one by one by the following approach:

  • 1. Propose the next trial configuration $C_t$.
  • 2. If $\mathrm{Rand}() \le T(C_i \to C_t)$, then the next configuration is $C_{i+1} = C_t$; otherwise, $C_{i+1} = C_i$.
  • 3. Repeat steps 1 and 2:

$\cdots \to C_{i-1} \to C_i \to C_{i+1} \to \cdots$

  • Clearly, the next step depends only on the last step and the transition matrix $T$, and one can show that the detailed balance condition

$\frac{T(A \to B)}{T(B \to A)} = \frac{W(B)}{W(A)}$

guarantees that the Markov process converges to the desired distribution.

  • N. Metropolis, J. Chem. Phys. 21, 1087 (1953)
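The propose/accept loop above can be sketched for a toy model. The 1D Ising ring, single-spin-flip proposal, and parameters below are illustrative choices, not from the talk:

```python
import math
import random

random.seed(1)

L, beta, J = 16, 0.3, 1.0                      # small 1D Ising ring (toy choice)
spins = [random.choice([-1, 1]) for _ in range(L)]

def energy_change(s, i):
    """dE for flipping spin i on a periodic chain, H = -J * sum_i s_i s_{i+1}."""
    left, right = s[(i - 1) % L], s[(i + 1) % L]
    return 2.0 * J * s[i] * (left + right)

mags = []
for step in range(20_000):
    i = random.randrange(L)                      # 1. propose: flip one random spin
    dE = energy_change(spins, i)
    if random.random() <= math.exp(-beta * dE):  # 2. accept with min(1, e^{-beta dE})
        spins[i] = -spins[i]                     #    C_{i+1} = trial configuration
    # otherwise C_{i+1} = C_i (the configuration is repeated in the chain)
    mags.append(sum(spins) / L)

avg_m = sum(mags[5_000:]) / len(mags[5_000:])    # discard burn-in, then average
print(f"<m> ~ {avg_m:.3f} (near 0 at this high temperature)")
```

Note that when $dE < 0$ the factor $e^{-\beta\, dE}$ exceeds 1, so the move is always accepted; this reproduces $\min(1, e^{-\beta\, dE})$ without a separate branch.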

SLIDE 8

Metropolis-Hastings algorithm

  • The transition probability can be factorized as

$T(A \to B) = S(A \to B)\, \alpha(A \to B)$

where $S(A \to B)$ is the probability of proposing configuration $B$ from configuration $A$, and $\alpha(A \to B)$ is the acceptance probability.

  • In the Metropolis algorithm, the acceptance probability is chosen as

$\alpha(A \to B) = \min\left(1, \frac{W(B)}{W(A)}\right)$

  • N. Metropolis, et al., J. Chem. Phys. 21, 1087 (1953)
  • In the Metropolis–Hastings algorithm, the acceptance ratio is

$\alpha(A \to B) = \min\left(1, \frac{W(B)\, S(B \to A)}{W(A)\, S(A \to B)}\right)$

  • W. K. Hastings, Biometrika 57, 97 (1970)

SLIDE 9

Independent Samples

  • As is well known, for statistical measurements, only the independent samples matter.

  • However, since the configurations are generated sequentially, one by one, as a random walk in configuration space, the configurations in the Markov chain are inevitably correlated with each other, not independent.

  • Configurations generated by different Monte Carlo methods have different autocorrelations.
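A quick way to see this is to measure the normalized autocorrelation function of a Monte Carlo time series. The sketch below is illustrative, not from the talk: an AR(1) process stands in for a correlated chain of measurements, since its autocorrelation decays as $a^{\mathrm{lag}}$.

```python
import random

random.seed(2)

# Toy correlated series: an AR(1) chain x_{t+1} = a*x_t + noise stands in for
# a Monte Carlo time series of some observable O(C_t).
a, T = 0.9, 50_000
x = [0.0]
for _ in range(T - 1):
    x.append(a * x[-1] + random.gauss(0.0, 1.0))

def autocorr(series, lag):
    """Normalized autocorrelation C(lag)/C(0) of a time series."""
    m = sum(series) / len(series)
    num = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(len(series) - lag))
    den = sum((v - m) ** 2 for v in series)
    return num / den

# For AR(1), C(lag)/C(0) ~ a^lag, so samples spaced by roughly the
# autocorrelation time ~ 1/(1-a) steps are close to independent.
print([round(autocorr(x, lag), 2) for lag in (0, 1, 10, 50)])
```

The same estimator applied to a Markov-chain observable (energy, magnetization, ...) gives the autocorrelation time that decides how many chain steps one independent sample costs.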

SLIDE 10

Different update algorithms

  • Too small a step length: small difference between configurations, high acceptance.
  • Ideal step length: big difference and high acceptance, exploring the low-energy configurations.
  • Too large a step length: big difference, low acceptance.

Acceptance factor: $e^{-\beta(E(B) - E(A))}$

SLIDE 11

How to justify different Monte Carlo methods?

Time consumption $t_{\mathrm{ind}}$ to obtain two statistically independent configurations:

  • Autocorrelation time $\tau_{\mathrm{ac}}$: the number of steps needed to obtain independent configurations.
    Bigger differences → independent, but low acceptance ratio.
    Similar energies → high acceptance ratio, but not independent.
  • Time consumption $t$ to obtain one configuration (mainly for the calculation of the weight).

Self-learning Monte Carlo methods are designed to improve both parts, and thus can speed up the calculations dramatically.

SLIDE 12

Local update

  • Local update: the proposal probability is symmetric,

$S(A \to B) = S(B \to A) = \frac{1}{N}$

  • N. Metropolis, et al., J. Chem. Phys. 21, 1087 (1953)
  • Acceptance ratio:

$\alpha(A \to B) = \min\left(1, \frac{W(B)\, S(B \to A)}{W(A)\, S(A \to B)}\right) = \min\left(1, e^{-\beta(E(B) - E(A))}\right)$

  • Very general: applies to any model

SLIDE 13

Critical slowing down

  • The dynamical relaxation time diverges at the critical point: convergence is very slow in a critical system.

  • For the 2D Ising model with local updates, the autocorrelation time scales as $\tau \propto L^z$, with $z \approx 2.125$.

SLIDE 14

How to get high acceptance ratio?

$\alpha(A \to B) = \min\left(1, \frac{W(B)\, S(B \to A)}{W(A)\, S(A \to B)}\right)$ — the acceptance is always 1 if the proposal satisfies $\frac{S(A \to B)}{S(B \to A)} = \frac{W(B)}{W(A)}$.

SLIDE 15

Global update -- Wolff algorithm in Ising model

  • 1. Randomly choose one site $i$.
  • 2. If an adjacent site has the same spin, add it to the cluster $C$ with probability $P_{\mathrm{add}} = 1 - e^{-2\beta J}$.
  • 3. Repeat step 2 for all the sites in the cluster $C$.
  • 4. Flip all the spins in the cluster $C$.

  • Swendsen and Wang, Phys. Rev. Lett. 58, 86 (1987)
  • U. Wolff, Phys. Rev. Lett. 62, 361 (1989)
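The four steps above can be sketched directly. The lattice size and temperature below are illustrative toy values, not from the talk:

```python
import math
import random

random.seed(3)

L, beta, J = 8, 0.6, 1.0                 # small 2D Ising lattice (toy values)
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
p_add = 1.0 - math.exp(-2.0 * beta * J)  # bond-activation probability

def neighbors(i, j):
    """Four nearest neighbours on a periodic square lattice."""
    yield (i - 1) % L, j
    yield (i + 1) % L, j
    yield i, (j - 1) % L
    yield i, (j + 1) % L

def wolff_step(s):
    """One Wolff update: grow a cluster from a random seed, then flip it."""
    i0, j0 = random.randrange(L), random.randrange(L)   # 1. random seed site
    sign = s[i0][j0]
    cluster = {(i0, j0)}
    frontier = [(i0, j0)]
    while frontier:                                      # 2.-3. grow the cluster
        i, j = frontier.pop()
        for ni, nj in neighbors(i, j):
            if (ni, nj) not in cluster and s[ni][nj] == sign \
                    and random.random() < p_add:
                cluster.add((ni, nj))
                frontier.append((ni, nj))
    for i, j in cluster:                                 # 4. flip the whole cluster
        s[i][j] = -s[i][j]
    return len(cluster)

sizes = [wolff_step(spins) for _ in range(200)]
print(f"mean cluster size {sum(sizes) / len(sizes):.1f} on a {L}x{L} lattice")
```

Because whole clusters flip in one step, the cluster sizes track the physical correlation length, which is what removes the critical slowing down discussed on the next slide.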

SLIDE 16

Reduce critical slowing down

Swendsen and Wang, Phys. Rev. Lett. 58, 86 (1987)

SLIDE 17

Local update and global update

Local update
  • Locally updates the configuration by changing one site per MC step
  • Very general
  • Inefficient around phase transition points (critical slowing down)

Global update
  • Globally updates the configuration by simultaneously changing many sites per MC step
  • High efficiency
  • Designed for specific models and hard to generalize to other models

SLIDE 18

How to get high acceptance ratio?

$\alpha(A \to B) = \min\left(1, \frac{W(B)\, S(B \to A)}{W(A)\, S(A \to B)}\right)$

Designing a proposal that exactly satisfies $\frac{S(A \to B)}{S(B \to A)} = \frac{W(B)}{W(A)}$ is very hard, but we can aim for an approximation:

$\frac{S(A \to B)}{S(B \to A)} \approx \frac{W(B)}{W(A)}$

SLIDE 19

My initial naïve idea

  • Use machine learning to learn some common features of the "important" configurations, and then generate new configurations based on these learned features.

  • It sounds good, but it does not work, because we do not know the proposal ratio $S(A \to B)/S(B \to A)$, so we cannot compute the correct acceptance probability.

  • However, this idea tells us that the generated configurations carry more useful information than just the estimate $\langle O \rangle \approx \frac{1}{N}\sum_{i=1}^{N} O(C_i)$ based on them.

SLIDE 20

Hidden information in generated configurations

  • The generated configurations follow a distribution close to the original distribution!

  • This seems too obvious and too trivial, but we have not used it seriously beyond calculating averages of observables.

  • Can we use it further, and how?
  • The answer is YES, and we can do it with the self-learning Monte Carlo method.

SLIDE 21

The right way to use the hidden information in generated configurations

(Diagram: the generated configurations are used to learn an effective model; critical properties and the universality class guide the learning.)

Yang Qi (Fudan)

SLIDE 22

Core ideas of self-learning Monte Carlo

  • Learn an approximate, simpler model that has efficient global update methods and whose weights can be evaluated faster.

  • Use the simpler model to guide the simulation of the original, hard model.

  • First learn, then earn. Key questions: how to train? how to propose?

  • J. Liu, et al., PRB 95, 041101(R) (2017)

SLIDE 23

SLMC in Boson system

  • The original Hamiltonian has both two-body and four-body interactions, and no global update method is known for it.

SLIDE 24

Fit the parameters in $H_{\mathrm{eff}}$

  • Generate configurations with the local update at T = 5 > Tc, away from the critical point.
  • Perform linear regression to fit the parameters of the effective Hamiltonian $H_{\mathrm{eff}}$.
  • Generate configurations at Tc and reinforce the learning by retraining on them.

SLIDE 25

Build the cluster based on $H_{\mathrm{eff}}$

  • Self-learning update: the cluster is constructed using the Wolff update, obeying the detailed balance condition of the effective model $H_{\mathrm{eff}}$:

$\frac{S(B \to A)}{S(A \to B)} = \frac{W_{\mathrm{eff}}(A)}{W_{\mathrm{eff}}(B)}$

  • Acceptance ratio:

$\alpha(A \to B) = \min\left(1, \frac{W(B)\, S(B \to A)}{W(A)\, S(A \to B)}\right) = \min\left(1, \frac{W(B)\, W_{\mathrm{eff}}(A)}{W(A)\, W_{\mathrm{eff}}(B)}\right) = \min\left(1, e^{-\beta\left[(E(B) - E_{\mathrm{eff}}(B)) - (E(A) - E_{\mathrm{eff}}(A))\right]}\right)$
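In code, the acceptance step is a single line; the function below sketches it (the configuration energies fed to it are toy inputs):

```python
import math

def slmc_acceptance(beta, E_a, E_b, Eeff_a, Eeff_b):
    """SLMC acceptance ratio: the cluster proposal satisfies detailed balance
    for W_eff, so only the mismatch between W and W_eff enters:
    alpha = min(1, exp(-beta*[(E_b - Eeff_b) - (E_a - Eeff_a)])).
    """
    return min(1.0, math.exp(-beta * ((E_b - Eeff_b) - (E_a - Eeff_a))))

# If the effective model tracks the original energies exactly, every
# proposal is accepted; any mismatch reduces the acceptance.
print(slmc_acceptance(1.0, -10.0, -8.0, -10.0, -8.0))   # perfect H_eff -> 1.0
print(slmc_acceptance(1.0, -10.0, -8.0, -10.0, -8.5))   # mismatched H_eff
```

This is why the quality of the learned $H_{\mathrm{eff}}$ directly controls the efficiency: the closer $E_{\mathrm{eff}}$ tracks $E$, the closer the acceptance stays to 1.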

SLIDE 26

For different sizes $L \times L$

  • J. Liu, et al PRB 95, 041101(R) (2017)
  • SLMC is 20–40 times faster than the local update.
  • We can easily obtain results for 320 × 320 lattices.

SLIDE 27

How to justify different Monte Carlo methods?

Time consumption $t_{\mathrm{ind}}$ to obtain two statistically independent configurations:

  • Autocorrelation time $\tau_{\mathrm{ac}}$: the number of steps needed to obtain independent configurations.
    Bigger differences → independent, but low acceptance ratio.
    Similar energies → high acceptance ratio, but not independent.
  • Time consumption $t$ to obtain one configuration (mainly for the calculation of the weight).

Self-learning Monte Carlo methods are designed to improve either or both parts, depending on the model, and thus can speed up the calculations dramatically.

SLIDE 28

Weight (Free energy) in Fermion systems

  • In general fermion systems, the partition function is evaluated by integrating out the fermions for each configuration of an auxiliary bosonic field.

  • We therefore have to perform heavy matrix operations to calculate the weight (free energy), which is very time-consuming: $O(N^3)$.

  • However, the weight depends only on the configuration of the bosonic field, which can be described by a purely bosonic model.

SLIDE 29
Complexity of SLMC

  • Conventional method: complexity to obtain two uncorrelated configurations: $N\tau \times N^3 = \tau N^4$
  • SLMC: complexity to obtain two uncorrelated configurations: $(N\tau + N^3)/p$, with $p$ the acceptance ratio

SLIDE 30

Example: double exchange model

  • Weight: involves the eigenvalues $E_n$, obtained by exact diagonalization of $H$ for a given spin configuration.
  • Computational complexity of the conventional method: $O(\tau L^d \cdot L^{3d})$
  • Free fermions are coupled to classical Heisenberg spins.
  • Magnetic order in a metal.

SLIDE 31

Learning the effective model

  • Effective model
  • Reproduce the RKKY feature
  • Reproduce the right Boltzmann distribution
  • J. Liu, et al PRB 95, 241104(R) (2017)

SLIDE 32

No critical slowing down

  • The autocorrelation time is much shorter in SLMC than in the conventional method.

  • There is no critical slowing down in SLMC.
  • Computational complexity of the conventional method: $O(\tau L^d \cdot L^{3d})$
  • Computational complexity of SLMC: $O(\tau L^d + L^{3d})$
  • SLMC reduces the complexity by a factor of $O(\tau L^d)$, and can be more than 1000 times faster.

  • J. Liu, et al PRB 95, 241104(R) (2017)

SLIDE 33

Example II: SLMC in DQMC (Interacting Fermion)

  • In conventional DQMC, the computational complexity is $O(\tau \beta N^3)$.

  • In SLMC with the cumulative update, the complexity is $O(\beta N \tau_{\mathrm{ac}} + N^3 + \beta N^2)$:

  • a. Cumulative update: $O(\beta N \tau_{\mathrm{ac}})$
  • b. Detailed balance: $O(N^3)$
  • c. Sweep of the Green's function: $O(\beta N^2)$

Ziyang Meng (IOP) Xiaoyan Xu (UCSD)

  • X. Y. Xu, et al PRB 96, 041119(R) (2017)

SLIDE 34

Results for a particular model

For the first time, we can reach system sizes as large as 100 × 100 in two dimensions.

  • X. Y. Xu, et al PRB 96, 041119(R) (2017)

SLIDE 35

Example III: SLMC in CT-AUX

For the single-impurity Anderson model at low temperature or large interaction, one can achieve a speedup of order the average expansion order, $O(\langle n \rangle) \approx O(\beta U)$.

Yuki Nagai (JAEA)

Yuki Nagai, et al., PRB 96, 161102 (2017)

Single-impurity Anderson model

SLIDE 36

SLMC with Deep Neural Networks


  • H. Shen, J. Liu, L. Fu, PRB 97, 205140 (2018)

SLIDE 37

Asymmetric Anderson model with a single impurity

  • By a Hubbard–Stratonovich transformation, we can decouple the interaction terms into an auxiliary bosonic field coupled to free fermions.

  • Then we can use Monte Carlo to sample the auxiliary-field configuration space, thereby simulating the original asymmetric Anderson model.

  • It is not easy to write down an explicit effective model for the bosonic action here, so we use neural networks to represent it.

Here $\hat{d}_\sigma$ and $\hat{c}_k$ are the fermion annihilation operators for the impurity and for the conduction electrons, and $\hat{n}_{d,\sigma} \equiv \hat{d}^\dagger_\sigma \hat{d}_\sigma$, $\hat{n}_d = \hat{n}_{d\uparrow} + \hat{n}_{d\downarrow}$.

SLIDE 38

Fully connected neural network

SLIDE 39

Insights from fully connected NN

  • Translation symmetry
  • Far fewer parameters
  • Easy to extend
  • H. Shen, J. Liu, L. Fu, PRB 97, 205140 (2018)
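The point about translation symmetry can be made concrete: a convolutional layer shares one small filter across all sites instead of learning a separate weight per pair of sites, and it is automatically equivariant under translations. A minimal sketch with toy sizes (my own example, not the network from the paper):

```python
# Parameter counts (toy numbers): a dense layer on an L-site configuration
# versus one shared periodic filter of width k.
L, k = 64, 3
dense_params = L * L          # one weight per (input site, output site) pair
conv_params = k               # one shared filter, reused at every site
print(dense_params, conv_params)

def periodic_conv(x, w):
    """Translation-equivariant layer: the same filter w applied at every site."""
    n, m = len(x), len(w)
    return [sum(w[j] * x[(i + j) % n] for j in range(m)) for i in range(n)]

x = [1.0 if i == 0 else 0.0 for i in range(8)]
w = [0.5, 1.0, 0.5]
y = periodic_conv(x, w)

# Shifting the input shifts the output by the same amount (equivariance).
x_shift = x[-1:] + x[:-1]
y_shift = periodic_conv(x_shift, w)
assert y_shift == y[-1:] + y[:-1]
```

For spin configurations the same argument holds in 2D: the filter count is set by the filter size, not the lattice size, which is why the convolutional version needs far fewer parameters and extends easily to larger lattices.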

SLIDE 40

Results of convolutional network

Huitao Shen (MIT)

  • H. Shen, J. Liu, L. Fu,

PRB 97, 205140 (2018)

  • DNNs work generally across different chemical potentials and temperatures.

SLIDE 41

Part 2: All optical neural network (Physics → ML)

  • Operates at the speed of light
  • Intrinsically massively parallel computation
  • Robust against local errors

Shengwang Du (HKUST)

SLIDE 42

Spatial light modulator: linear transformation

  • Use a grating to control the relation between the input light and the output light → a linear transformation $I_{\mathrm{out}} = W I_{\mathrm{in}}$

SLIDE 43

Test the linear transformation

(Setup: Camera 1, coupling optics, SLM1, lenses L1–L4, flip mirror FM, SLM2, Camera 2, mirror M)

SLIDE 44

Non-linear function in optics

Electromagnetically induced transparency

SLIDE 45

Test the fully connected neural network

  • We can reproduce well the results of a fully connected neural network (16, 4, 2) run on a computer.

Ying Zuo, et al. Optica 6, 1132 (2019)
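For reference, the (16, 4, 2) fully connected forward pass being reproduced optically is tiny in code. The sketch below uses random placeholder weights and a sigmoid activation; the trained weights and the EIT nonlinearity from the experiment are not reproduced here:

```python
import math
import random

random.seed(5)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# A (16, 4, 2) network as on the slide; the weights here are random
# placeholders, not the trained values used in the experiment.
sizes = [16, 4, 2]
params = [([[random.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
           [0.0] * n_out)
          for n_in, n_out in zip(sizes, sizes[1:])]

x = [random.random() for _ in range(16)]       # toy input pattern
for W, b in params:
    x = layer(x, W, b, sigmoid)
print(f"output of the (16, 4, 2) network: {[round(v, 3) for v in x]}")
```

In the optical implementation, the matrix multiplications are carried out by the spatial light modulators and the nonlinearity by electromagnetically induced transparency, so the whole forward pass happens in light.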

SLIDE 46

Summary

1. SLMC can be applied generally in both bosonic and fermionic systems to speed up MC simulations.
2. SLMC is a bridge between numerical simulation and analytic studies of many-body physics:

  • Better understanding from analytic studies can give us a better effective model and better efficiency.
  • The learned effective model in SLMC is a good starting point, and can give us more insights for analytic studies.

3. SLMC also offers a framework for integrating machine learning techniques into Monte Carlo simulations.
4. We have realized the AONN in experiments.
