Approximated Newton Algorithm for the Ising Model Inference Speeds Up Convergence, Performs Optimally and Avoids Over-fitting - PowerPoint PPT Presentation



SLIDE 1

Approximated Newton Algorithm for the Ising Model Inference Speeds Up Convergence, Performs Optimally and Avoids Over-fitting

Ulisse Ferrari

Institut de la Vision, Sorbonne Universités, UPMC

New Frontiers in Non-equilibrium Physics 2015

SLIDE 2

Outlook of the seminar

1. Introduction with an application of the pairwise Ising Model to Neuroscience
2. Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm
3. Approximated Newton Method
4. The Long-Time Limit: Stochastic Dynamics
5. Properties of the Stationary Distribution
6. Conclusions and Perspectives

SLIDES 3-6: Introduction

Model Inference:

Finding the probability distribution that reproduces the statistics of the data. This is useful for characterizing the behavior of systems of many strongly correlated units: neurons, proteins, viruses, species distributions, bird flocks... but which distribution?

Maximum Entropy (MaxEnt) Inference:

Search for the largest entropy distribution satisfying a set of constraints.

SLIDES 7-8: Introduction

Example: pairwise Ising Model

Given a binary-unit data-set of B configurations of N units, {σi(b)}, i = 1, …, N, b = 1, …, B, find the MaxEnt model reproducing the single-site and pairwise correlations:

⟨σi⟩_MODEL = ⟨σi⟩_DATA ≡ (1/B) Σ_b σi(b)
⟨σiσj⟩_MODEL = ⟨σiσj⟩_DATA ≡ (1/B) Σ_b σi(b) σj(b)

That is, finely tune the parameters {h, J} of the pairwise Ising model:

P_{h,J}(σ) = exp( Σ_i hi σi + Σ_{i<j} Jij σi σj ) / Z[h, J]
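The data-side and model-side quantities above can be sketched in plain Python. This is a minimal illustration, not the talk's code: the names `empirical_moments` and `ising_prob` are ours, and the partition function Z is computed by brute-force enumeration, so it is only viable for very small N.

```python
import itertools
import math

def empirical_moments(samples):
    """Single-site and pairwise data moments <s_i> and <s_i s_j>.

    samples: list of B configurations, each a tuple of N values in {0, 1}.
    """
    B = len(samples)
    N = len(samples[0])
    m = [sum(s[i] for s in samples) / B for i in range(N)]
    mm = {(i, j): sum(s[i] * s[j] for s in samples) / B
          for i in range(N) for j in range(i + 1, N)}
    return m, mm

def ising_prob(sigma, h, J):
    """P_{h,J}(sigma) for a tiny system; Z by enumeration over all 2^N states."""
    N = len(sigma)
    def energy(s):
        e = sum(h[i] * s[i] for i in range(N))
        e += sum(J[(i, j)] * s[i] * s[j]
                 for i in range(N) for j in range(i + 1, N))
        return e
    Z = sum(math.exp(energy(s)) for s in itertools.product((0, 1), repeat=N))
    return math.exp(energy(sigma)) / Z
```

For zero fields and couplings the model reduces to the uniform distribution over the 2^N configurations, which is a convenient sanity check.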
SLIDES 9-13: Introduction

In vivo Pre-Frontal Cortex Recording:

97 experimental sessions of Peyrache et al., Nat. Neurosci. (2009).

Ising Model Inference: σi(b) = 1 if neuron i spiked during time-bin b. Ask the model to reproduce the neurons' firing rates and pairwise correlations. This yields 97 × 3 inferred coupling networks (97 sessions × {PRE, TASK, POST} epochs).

Schneidman et al., Nature (2006); Cocco, Monasson, PRL (2011)

SLIDES 14-16: Introduction

Learning-related coupling adjustment:

A = Σ_{i,j} sign( J^TASK_ij − J^PRE_ij ) · ( J^POST_ij − J^PRE_ij )

with the sum running over the pairs (i, j) whose J^TASK_ij and J^POST_ij are non-zero.
SLIDE 17: Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm

1. Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm
2. Approximated Newton Method
3. The Long-Time Limit: Stochastic Dynamics
4. Properties of the Stationary Distribution

SLIDE 18: Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm

General MaxEnt

Given a list of D observables to reproduce, {Σa(σ)}, a = 1, …, D (generic functions of the system units), find the MaxEnt model parameters {Xa}:

P_X(σ) = exp( Σ_a Xa Σa(σ) ) / Z[X]

reproducing the observables' averages: ⟨Σa⟩_DATA ≡ Pa = Qa[X] ≡ ⟨Σa⟩_X

SLIDES 19-22: Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm

This is equivalent to log-likelihood maximization:

X* = argmax_X log L[X] ≡ argmax_X ( X · P − log Z[X] )

in fact:

∇a log L[X] = d/dXa ( X · P − log Z[X] ) = Pa − Qa[X]

It cannot be solved analytically. Ackley, Hinton and Sejnowski (Vanilla Gradient):

X_{t+1} = X_t + δX^VG_t ;  δX^VG_t = α ( P − Q[X_t] )

If 0 < Pa < 1 for all a = 1, …, D, the problem is well posed: X* exists and is unique, and the dynamics converges (for infinitesimally small α).
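The Vanilla Gradient iteration can be sketched on a toy model small enough that Q[X] is evaluated exactly by enumeration (all names here are ours; in real applications Q[X] is estimated by MCMC):

```python
import itertools
import math

def model_moments(h, J, N):
    """Exact model averages Q[X] = (<s_i>, <s_i s_j>) by enumeration (tiny N only)."""
    states = list(itertools.product((0, 1), repeat=N))
    w = [math.exp(sum(h[i] * s[i] for i in range(N))
                  + sum(J[i][j] * s[i] * s[j]
                        for i in range(N) for j in range(i + 1, N)))
         for s in states]
    Z = sum(w)
    q1 = [sum(w[k] * s[i] for k, s in enumerate(states)) / Z for i in range(N)]
    q2 = [[sum(w[k] * s[i] * s[j] for k, s in enumerate(states)) / Z
           for j in range(N)] for i in range(N)]
    return q1, q2

def vanilla_gradient(p1, p2, N, alpha=0.5, steps=2000):
    """X_{t+1} = X_t + alpha * (P - Q[X_t]) on the fields h and couplings J."""
    h = [0.0] * N
    J = [[0.0] * N for _ in range(N)]
    for _ in range(steps):
        q1, q2 = model_moments(h, J, N)
        for i in range(N):
            h[i] += alpha * (p1[i] - q1[i])
            for j in range(i + 1, N):
                J[i][j] += alpha * (p2[i][j] - q2[i][j])
    return h, J
```

Generating the target moments from a known (h, J) and checking that the iteration reproduces them is a simple self-consistency test of the update rule.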

SLIDES 23-29: Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm

A 2-dimensional example:

log L[u, v] = − (a/2)(u − u∞)² − (b/2)(v − v∞)²

Vanilla Gradient: δu^VG_t ∼ (1 − α a)^t ⇒ α < 2/a ;  δv^VG_t ∼ (1 − α b)^t ⇒ α < 2/b

Newton Method: δu^NM_t ∼ (1 − α)^t ⇒ α < 2 ;  δv^NM_t ∼ (1 − α)^t ⇒ α < 2

α = 1 ⇒ convergence in one step!
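The one-step claim is easy to check numerically. A sketch for the quadratic log-likelihood above, with hypothetical helper names; Newton simply rescales each direction by its inverse curvature:

```python
def grad_logL(u, v, a, b, u_inf, v_inf):
    """Gradient of logL[u, v] = -(a/2)(u - u_inf)^2 - (b/2)(v - v_inf)^2."""
    return (-a * (u - u_inf), -b * (v - v_inf))

def vanilla_step(u, v, alpha, a, b, u_inf, v_inf):
    """Plain gradient ascent: same step size alpha in every direction."""
    gu, gv = grad_logL(u, v, a, b, u_inf, v_inf)
    return u + alpha * gu, v + alpha * gv

def newton_step(u, v, alpha, a, b, u_inf, v_inf):
    """Newton rescales each direction by the inverse curvature (1/a, 1/b)."""
    gu, gv = grad_logL(u, v, a, b, u_inf, v_inf)
    return u + alpha * gu / a, v + alpha * gv / b
```

With a = 10 and b = 0.1 the Vanilla Gradient needs α < 0.2 to be stable and then crawls along the flat direction, while the Newton step with α = 1 lands on (u∞, v∞) in a single update.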

SLIDE 30: Approximated Newton Method

1. Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm
2. Approximated Newton Method
3. The Long-Time Limit: Stochastic Dynamics
4. Properties of the Stationary Distribution

SLIDES 31-34: Approximated Newton Method

The same happens for the MaxEnt inference:

log L[X ≈ X*] ≈ log L[X*] − (1/2) Σ_{ab} (Xa − X*a) χ[X*]_ab (Xb − X*b)

χ_ab[X] ≡ − ∂² log L[X] / ∂Xa ∂Xb = ⟨ΣaΣb⟩_X − ⟨Σa⟩_X ⟨Σb⟩_X

Vanilla Gradient: δX^VG_t = α ∇ log L[X_{t−1}], whose components on the eigenvectors of χ decay as δX^μ_t ≡ Σ_a V^μ_a δX_{a,t} ∼ (1 − α λμ)^t

Newton Method¹: δX^NM_t = α χ⁻¹[X_{t−1}] ∇ log L[X_{t−1}], for which δX^μ_t ∼ (1 − α)^t

VERY SLOW: expensive estimation & inversion of χ[X].

¹ (here equivalent to Amari '98 Natural Gradient)

SLIDES 35-37: Approximated Newton Method

However, for the Ising model we can approximate:

χ_ab[X*] ≈ χ̂_ab ≡ ⟨ΣaΣb⟩_DATA − ⟨Σa⟩_DATA ⟨Σb⟩_DATA

Approximated Newton (AN) Method:

δX^AN_t = α χ̂⁻¹ ∇ log L[X_{t−1}]

Remarks on χ[X*] ≈ χ̂: it is equivalent to saying that an Ising distribution properly describes the data, and it states that the model Fisher matrix is close to the observables' covariance.
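A single AN update can be sketched by solving χ̂ δX = α (P − Q) rather than inverting χ̂ explicitly. The function names are ours, and a real implementation would delegate the linear solve to a numerical library; a hand-rolled Gaussian elimination keeps the sketch self-contained:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def an_step(X, P, Q, chi_hat, alpha=1.0):
    """One Approximated Newton update: X <- X + alpha * chi_hat^{-1} (P - Q)."""
    delta = solve(chi_hat, [p - q for p, q in zip(P, Q)])
    return [x + alpha * d for x, d in zip(X, delta)]
```

Since χ̂ is fixed (it depends only on the data), it can also be factorized once and reused at every iteration, which is part of why AN is cheap compared with the exact Newton method.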

SLIDES 38-41: Approximated Newton Method

As the algorithm works iteratively, it requires an early-stop condition.

Idea: stop the algorithm when Q[X] is statistically compatible with P, using the P-covariance χ̂/B:

ε( P, Q[X] ) ≡ (B / 2D) Σ_{ab} (Pa − Qa) χ̂⁻¹_ab (Pb − Qb)

quantifies the distance between Q[X] and P in the χ̂/B metric. For two i.i.d. data-sets, ε(P, P′) ≈ 1 ⇒ we stop the algorithm as soon as ε < 1.
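The stopping statistic can be sketched as follows. For brevity this assumes a diagonal χ̂ (so the quadratic form is a per-component division); the talk's criterion uses the full data covariance:

```python
def epsilon(P, Q, chi_diag, B):
    """eps(P, Q) = (B / 2D) * sum_a (P_a - Q_a)^2 / chi_a  (diagonal-chi sketch).

    Returns ~1 when Q fluctuates around P at the level expected from
    B samples, so eps < 1 signals statistical compatibility.
    """
    D = len(P)
    return (B / (2.0 * D)) * sum((p - q) ** 2 / c
                                 for p, q, c in zip(P, Q, chi_diag))
```

For example, with B = 100 samples and per-observable variance 0.25, a uniform moment mismatch of 0.1 sits at ε = 2, i.e. twice the discrepancy expected between two i.i.d. data-sets, so the iteration would continue.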

SLIDES 42-44: Approximated Newton Method

APPROXIMATED NEWTON ALGORITHM:

1. Initialization:
   (a) Choose X0, then compute Q[X0] and ε0 = ε(P, Q[X0]).
   (b) Set α0 = 1 and M = min(2B/ε0, B) MCMC samplings.

2. Iterate the following steps:
   (a) Update Xt.
   (b) Estimate Q[Xt] with M = min(2B/εt−1, B) MCMC samplings.
   (c) Compute εt = ε(P, Q[Xt]).
   (d1) If εt < εt−1: accept the update and increase α.
   (d2) If εt > εt−1: discard the update, lower α and re-estimate Q[Xt].

3. Stop the algorithm when εt < 1.
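The control loop above can be sketched as follows. Everything here is an illustrative assumption: `estimate_Q` is a caller-supplied stub standing in for MCMC sampling, χ̂ is taken diagonal, and the α increase/decrease factors are arbitrary choices (the talk only says "increase" and "lower"):

```python
def an_inference(P, chi_diag, B, estimate_Q, max_iter=500):
    """Adaptive-step Approximated Newton loop with the eps < 1 early stop.

    P: data moments; chi_diag: diagonal data covariance chi-hat;
    estimate_Q(X, M): noisy estimator of the model moments Q[X].
    """
    D = len(P)

    def eps(Q):  # eps(P, Q) = (B / 2D) sum_a (P_a - Q_a)^2 / chi_a
        return (B / (2.0 * D)) * sum((P[a] - Q[a]) ** 2 / chi_diag[a]
                                     for a in range(D))

    X, alpha = [0.0] * D, 1.0            # initialization: X0 = 0, alpha0 = 1
    Q = estimate_Q(X, B)
    e_prev = eps(Q)
    for _ in range(max_iter):
        if e_prev < 1.0:                 # stop: Q statistically compatible with P
            break
        # AN step; with diagonal chi-hat the update is alpha * (P - Q) / chi
        X_new = [X[a] + alpha * (P[a] - Q[a]) / chi_diag[a] for a in range(D)]
        M = min(int(2 * B / e_prev) + 1, B)   # M = min(2B/eps, B) samplings
        Q_new = estimate_Q(X_new, M)
        e = eps(Q_new)
        if e < e_prev:                   # accept the update and raise alpha
            X, Q, e_prev = X_new, Q_new, e
            alpha = min(1.0, 1.2 * alpha)
        else:                            # reject: lower alpha, re-estimate Q[X]
            alpha *= 0.5
            Q = estimate_Q(X, M)
            e_prev = eps(Q)
    return X, e_prev
```

A convenient check is a linear toy "model" Qa(X) = base_a + χ_a X_a plus sampling noise of variance χ_a/M, for which the loop should stop with ε below 1 near the exact solution X* = (P − base)/χ.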

SLIDES 45-48: Approximated Newton Method

Rat retina ganglion cells

Stimulus: two moving bars. 2.1 h of MEA recording, B = 4.8 · 10⁵ time-bins of Δt = 16 ms, N = 95 cells, D = 4560 parameters to infer.

Convergence time from the independent-spins model with 8 × 3.4 GHz CPUs: T_AN = 144 ± 4 s versus T_VG(α = 0.15) = 4.2 · 10⁴ s.

[Figures: reproduction of the pairwise correlations cij = ⟨σiσj⟩ − ⟨σi⟩⟨σj⟩ and of the population spike-count distribution P(K) = Prob(Σi σi = K).]

SLIDE 49: The Long-Time Limit: Stochastic Dynamics

1. Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm
2. Approximated Newton Method
3. The Long-Time Limit: Stochastic Dynamics
4. Properties of the Stationary Distribution

SLIDES 50-53: The Long-Time Limit: Stochastic Dynamics

Q[X] is estimated through M MCMC measurements: Q[X] ⇒ Q[X]^MC is a random variable!

∇ log L^MC_X = P − Q[X]^MC → 0 only on average.

Change of framework: Xt → Pt(X). X, rather than converging to a fixed point, approaches a stationary distribution P∞(X).

Master equation:

P_{t+1}(X′) = ∫ DX Pt(X) W_{X→X′}(α)
SLIDES 54-58: The Long-Time Limit: Stochastic Dynamics

For M ≫ 1 and X ≈ X*:

log L[X] ≃ log L[X*] − (1/2) Σ_{ab} (Xa − X*a) χ[X*]_ab (Xb − X*b)

1. ⟨∇a log L^MC_X⟩ = Σ_b χ[X*]_ab (X*b − Xb) ≈ Σ_b χ̂_ab (X*b − Xb)

2. ⟨∇a log L^MC_X ∇b log L^MC_X⟩_c = χ[X]_ab / M ≃ χ[X*]_ab / M ≈ χ̂_ab / M

A normal approximation then gives:

P(∇ log L^MC_X) ≃ N( χ̂ · (X* − X) ; χ̂/M )(∇ log L^MC_X)

W^VG_{X→X′}(α) = Prob( ∇ log L^MC_X = (X′ − X)/α )
W^AN_{X→X′}(α) = Prob( ∇ log L^MC_X = χ̂ · (X′ − X)/α )
SLIDES 59-63: The Long-Time Limit: Stochastic Dynamics

Imposing P_{t+1}(X) = Pt(X):

P^VG_∞(X) = N( X* ; (α/M)(2δ − α χ̂)⁻¹ )(X),  α λμ < 2
P^AN_∞(X) = N( X* ; (α / (M(2 − α))) χ̂⁻¹ )(X),  α < 2

which self-consistently defines X ≈ X*. From P(∇ log L^MC_X) = P(P − Q[X]^MC):

P^VG_∞(Q^MC) = N( P ; (2/M) χ̂ (2δ − α χ̂)⁻¹ )(Q^MC)
P^AN_∞(Q^MC) = N( P ; (2 / (M(2 − α))) χ̂ )(Q^MC)

Which is better? How should the parameters be set?

SLIDE 64: Properties of the Stationary Distribution

1. Maximal Entropy Models and the Vanilla (Standard) Learning Algorithm
2. Approximated Newton Method
3. The Long-Time Limit: Stochastic Dynamics
4. Properties of the Stationary Distribution

SLIDES 65-67: Properties of the Stationary Distribution

Algorithm vs Empirical distributions

An experiment provides empirical estimates of Q^EMP: P^EMP(Q^EMP) ≃ N( P^TRUE ; χ^EMP )
An inference algorithm provides numerical estimates of Q^MC: P^ALG_P(Q^MC) ≃ N( P ; χ^ALG )

Here P^TRUE is the result of an infinitely long experiment, χ^EMP is the expected covariance for B measurements, and P is a one-shot sampling of P^EMP.

An optimal inference algorithm should provide a P^ALG as close as possible to P^EMP. What is the optimal χ^ALG value?

SLIDES 68-72: Properties of the Stationary Distribution

Kullback-Leibler divergence between P^EMP and P^ALG_P: D_KL( P^EMP(·) || P^ALG_P(·) )

χ^OPT = argmin_{χ^ALG} ∫ DP P^EMP(P) D_KL( P^EMP(·) || P^ALG_P(·) )

The solution and its approximation are:

χ^OPT = 2 χ^EMP ≈ 2 χ̂ / B

to compare with:

χ^VG = (2/M) χ̂ (2δ − α χ̂)⁻¹ ,  χ^AN = (2 / (M(2 − α))) χ̂

AN with M(2 − α) = B reaches the optimum! VG under-fits the modes with λμ ≫ (2 − B/M)/α and over-fits those with λμ ≪ (2 − B/M)/α.

SLIDES 73-74: Properties of the Stationary Distribution

Synthetic data: Theory vs Simulations

Bethe-lattice Ising model with N = 10, connectivity c = 4, Jij = ±0.53, hi = −0.14 − 2 Σ_j Jij.

100 independent estimations of P and χ̂ through 2¹⁶ samplings of P^EMP; inference run with M = B.

SLIDE 75: Conclusions and Perspectives

Conclusions:
  • MaxEnt models are useful to describe multi-unit systems.
  • The AN learning is faster than the VG algorithm.
  • Within the large-B approximation it is possible to completely characterize the long-time behavior.
  • The AN with α = 1 and M = B is optimal against over-fitting.

Perspectives:
  • Improve the Gaussian approximations.
  • Test the algorithm on non-pairwise models.
  • Generalize the class of model distributions beyond MaxEnt.
  • Include hidden variables and the RBM framework.

SLIDE 76: Conclusions and Perspectives

THANKS

Collaborators for the Pre-Frontal Cortex work: Francesco Battaglia, Simona Cocco, Rémi Monasson, Gaia Tavoni.

Funding: EU-FP7 FET OPEN project Enlightenment 284801; Human Brain Project (HBP CLAP).

arXiv:1507.04254