Variational Network Inference: Strong and Stable with Concrete Support (PowerPoint presentation transcript)



SLIDE 1

Variational Network Inference: 
 Strong and Stable with Concrete Support

Amir Dezfouli, Edwin V. Bonilla and Richard Nock

SLIDE 2

Network Structure Discovery: A Flexible Approach

N nodes, T observations: 𝒠 = {yi, ti}. Goal: Learn the network structure

Existence, directionality and strengths

[Figure: example network of 5 nodes with directed weighted edges W12, W32, W43, W54, W15]

fi(t) = zi(t) + ∑_{j=1, j≠i}^{N} Aij Wij [fj(t) + ξjt]

yi(t) = fi(t) + ϵit

ϵit ∼ Normal(0, σy²),  ξjt ∼ Normal(0, σf²)

Model

zi(t) ∼ GP(0, κ(t, t′; θ))   (network-independent trend)

p(A, W) = ∏ij p(Aij) p(Wij),  p(Aij) = Bern(ρ),  p(Wij) = Normal(0, σw²)

Network parameters: Aij ∈ {0, 1}, Wij ∈ ℝ
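A minimal sketch of sampling from the priors on this slide, assuming an RBF form for the kernel κ and toy values for N, T, ρ and σw (all of which are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 30                     # toy sizes (assumed)
rho, sigma_w = 0.3, 1.0          # assumed prior hyperparameter values

# Network parameters: A_ij ~ Bern(rho), W_ij ~ Normal(0, sigma_w^2);
# the diagonal is zeroed since a node does not influence itself
A = rng.binomial(1, rho, size=(N, N))
np.fill_diagonal(A, 0)
W = rng.normal(0.0, sigma_w, size=(N, N))

# Network-independent trend z_i(t) ~ GP(0, kappa(t, t'; theta)),
# with kappa assumed to be an RBF kernel with lengthscale 0.1
t = np.linspace(0.0, 1.0, T)
K_t = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.1) ** 2)
L_chol = np.linalg.cholesky(K_t + 1e-6 * np.eye(T))  # jitter for stability
Z = (L_chol @ rng.normal(size=(T, N))).T             # (N, T): one GP draw per node
```

Note that f itself is defined cyclically through the network, so simulating y requires the "inverse" model introduced on the next slide.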

SLIDE 3

Inference Goal: Estimate p(A, W | 𝒠)

Complications:
- f is defined cyclically
- GPs are notoriously unscalable: O(N³T³)
- Complicated marginal likelihood (f depends on A, W)

Trick 1: Derive “inverse” model

f(t) = (I − A ⊙ W)⁻¹ (z(t) + A ⊙ W ξt)
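Trick 1 can be sketched numerically. The toy A, W, z and ξ below are placeholder draws, and the linear system is solved rather than explicitly inverted:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 5, 30
sigma_f, sigma_y = 0.1, 0.05       # assumed noise scales

# Toy stand-ins for the quantities sampled on the previous slide
A = rng.binomial(1, 0.3, size=(N, N))
np.fill_diagonal(A, 0)
W = rng.normal(size=(N, N))
Z = rng.normal(size=(N, T))        # GP trend draws z_i(t)
Xi = rng.normal(0.0, sigma_f, size=(N, T))

# Trick 1: f(t) = (I - A.W)^{-1} (z(t) + A.W xi_t)
AW = A * W                         # elementwise (Hadamard) product A ⊙ W
F = np.linalg.solve(np.eye(N) - AW, Z + AW @ Xi)

# Observation model: y_i(t) = f_i(t) + eps_it
Y = F + rng.normal(0.0, sigma_y, size=(N, T))

# Sanity check: F satisfies the cyclic definition f = z + A.W (f + xi)
assert np.allclose(F, Z + AW @ (F + Xi))
```

Solving with `np.linalg.solve` avoids forming the inverse, which is both cheaper and numerically safer.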

Trick 2: Marginalise f analytically

p(y | A, W) = Normal(0, Σy),  Σy = Kf ⊗ Kt + Kσ ⊗ I

Trick 3: Relate to multi-task learning (MTL) with product covariance (Bonilla et al., 2008; Rakitsch et al., 2013)
- Nodes are “tasks”
- Σy is a sum of two Kronecker products
- Covariances determined by A, W
- More efficient computation: O(N³ + T³)
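A sketch of why the Kronecker structure gives O(N³ + T³) computation, assuming for simplicity that the second term is σ²I so the eigenvalues of Σy are the products λf·λt plus σ². The factor matrices here are arbitrary toy PSD matrices, not the model's actual Kf and Kt:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, sigma2 = 4, 6, 0.5

# Toy PSD matrices standing in for K_f (node covariance, set by A, W)
# and K_t (the temporal GP kernel)
Bf = rng.normal(size=(N, N)); K_f = Bf @ Bf.T + np.eye(N)
Bt = rng.normal(size=(T, T)); K_t = Bt @ Bt.T + np.eye(T)

# Full covariance, assuming the simplest noise term sigma2 * I
Sigma_y = np.kron(K_f, K_t) + sigma2 * np.eye(N * T)

# O(N^3 + T^3) route: eigendecompose the small factors, never Sigma_y itself;
# eigenvalues of K_f kron K_t are all products lam_f[i] * lam_t[j]
lam_f = np.linalg.eigvalsh(K_f)
lam_t = np.linalg.eigvalsh(K_t)
lam_fast = np.outer(lam_f, lam_t) + sigma2

# Same log-determinant either way
logdet_fast = np.sum(np.log(lam_fast))
logdet_full = np.linalg.slogdet(Sigma_y)[1]
assert np.allclose(logdet_fast, logdet_full)
```

The general sum-of-two-Kronecker-products case (Rakitsch et al., 2013) needs an extra whitening step, but the cost stays cubic in N and T separately rather than in their product.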

How to deal with complex dependency on A, W?

Modern variational inference

SLIDE 4

Modern Variational Inference

Expectations using Monte Carlo

Re-parameterization trick: cannot be applied to discrete random variables

[Figure: the ELBO lower-bounds log p(y); updating qold(A, W) → qnew(A, W) tightens the bound toward the posterior p(A, W | y) ∝ p(A, W) p(y | A, W)]

ℒelbo = ℒkl + ℒell,  ℒkl = −KL(q(A, W) ‖ p(A, W)),  ℒell = 𝔼q(A,W)[log p(y | A, W)]
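The Monte Carlo estimate of ℒell with the re-parameterization trick can be sketched on a toy problem. The `log_lik` below is a stand-in quadratic, not the model's marginal likelihood, chosen so the expectation has a closed form to check against:

```python
import numpy as np

rng = np.random.default_rng(3)
S = 20_000                         # Monte Carlo samples

# Toy stand-in for log p(y | W): a quadratic, so E_q[log_lik] is known exactly
def log_lik(w):
    return -0.5 * w ** 2

mu, s = 0.7, 0.3                   # variational parameters of q(W) = Normal(mu, s^2)

# Re-parameterization trick: W = mu + s * eps with eps ~ Normal(0, 1),
# so gradients w.r.t. (mu, s) can flow through the samples
eps = rng.normal(size=S)
w = mu + s * eps
ell_mc = log_lik(w).mean()         # Monte Carlo estimate of L_ell

# Closed form: E[-W^2 / 2] = -(mu^2 + s^2) / 2 for W ~ Normal(mu, s^2)
ell_exact = -0.5 * (mu ** 2 + s ** 2)
assert abs(ell_mc - ell_exact) < 1e-2
```

The same recipe fails for the discrete Aij, since a Bernoulli draw cannot be written as a differentiable function of its parameter plus noise, which is what Trick 4 below addresses.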

Trick 4: Concrete distribution

q(Aij) = Concrete(αij, λc), where the αij are variational parameters

Aka the Gumbel-Softmax trick: it lets us sample Aij and evaluate log q(Aij), and it helps us get stability for free
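A sketch of sampling and scoring one relaxed edge, following the Binary Concrete construction of Maddison et al.; the temperature λc = 2/3 and α = 1 are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_bin_concrete(alpha, lam, size, rng):
    """Binary Concrete sample: sigmoid((log alpha + logistic noise) / lam)."""
    u = rng.uniform(size=size)
    logistic = np.log(u) - np.log1p(-u)   # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(np.log(alpha) + logistic) / lam))

def log_q_bin_concrete(x, alpha, lam):
    """Log density of BinConcrete(alpha, lam) at x in (0, 1)."""
    return (np.log(lam) + np.log(alpha)
            - (lam + 1.0) * (np.log(x) + np.log1p(-x))
            - 2.0 * np.log(alpha * x ** (-lam) + (1.0 - x) ** (-lam)))

alpha, lam = 1.0, 2.0 / 3.0        # assumed illustrative values
a = sample_bin_concrete(alpha, lam, 50_000, rng)

assert np.all((a > 0) & (a < 1))   # relaxed "edges" live strictly inside (0, 1)
assert abs(a.mean() - 0.5) < 0.01  # symmetric around 1/2 when alpha = 1
assert np.isfinite(log_q_bin_concrete(a, alpha, lam)).all()
```

Because the sample is a smooth function of log α, the re-parameterization trick now applies to the (relaxed) discrete edges as well.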

SLIDE 5

Theory: Numerical Stability

Previous work usually imposes the non-singularity of I − A ⊙ W, sometimes with additional constraints (boundedness of coordinates, eigenvalues).

Theorem 1 (“we get stability for free”): For any λc ≥ 0 and αij ≥ 0 (i ≠ j), I − A ⊙ W is non-singular with probability 1.

Theorem 2: For any λc ≥ 0, αij ≥ 0 (i ≠ j) and σy² ≥ 0, |ℒell| < ∞.
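Theorem 1 can be illustrated (not proved) empirically: across many draws of Concrete-relaxed edges and Gaussian weights, no sampled I − A ⊙ W comes out singular. The sampler and sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 8

def sample_bin_concrete(alpha, lam, size, rng):
    """Binary Concrete sample: sigmoid((log alpha + logistic noise) / lam)."""
    u = rng.uniform(size=size)
    logistic = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(np.log(alpha) + logistic) / lam))

# 1000 draws of (A, W): Concrete edges (zero diagonal), Gaussian weights
min_abs_det = np.inf
for _ in range(1000):
    A = sample_bin_concrete(1.0, 0.5, (N, N), rng)
    np.fill_diagonal(A, 0.0)
    W = rng.normal(size=(N, N))
    d = np.linalg.det(np.eye(N) - A * W)
    min_abs_det = min(min_abs_det, abs(d))

assert min_abs_det > 0.0   # no singular draw observed
```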

SLIDE 6

Theory: Model Stability

Theorem 3 (statistical “robustness”) bounds the signal’s log-likelihood as a function of external parameters.

If Wij ∼ Normal(μij, σij²) and Aij ∼ Bern(ρij), then under a condition on the network signal it holds with high probability that

−log p(y | W, A) ∈ [g(λ∘, y), g(λ∙, y)]  ∀y,

where λ∘ = λ↓(Kt)/2 + σy², λ∙ = 2λ↑(Kt) + σf² + σy², g(z, y) = Θ(log z + z‖y‖₂²), and λ↓(Kt), λ↑(Kt) denote the smallest and largest eigenvalues of Kt. The condition appears in various forms in previous work.

Important practical consequences.

SLIDE 7

Experiments and Conclusions

Experiments: Sydney property prices, brain fMRI, yeast genome.

Conclusions: a Bayesian approach for network structure discovery; efficient inference; stability “for free”, robustness and easy estimation.

[Figure: inferred network over Sydney suburbs, including Pittwater, Manly, Mosman, Woollahra and Hunters Hill]