
SLIDE 1

Einsum Networks

Fast and Scalable Learning of Tractable Probabilistic Circuits

Robert Peharz

Eindhoven University of Technology

Steven Lang

Technical University of Darmstadt

Antonio Vergari

University of California, Los Angeles

Karl Stelzner

Technical University of Darmstadt

Alejandro Molina

Technical University of Darmstadt

Martin Trapp

Graz University of Technology

Guy Van den Broeck

University of California, Los Angeles

Kristian Kersting

Technical University of Darmstadt

Zoubin Ghahramani

University of Cambridge; Uber AI Labs

International Conference on Machine Learning (ICML), July 2020

SLIDE 2

In This Paper

Probabilistic Circuits (PCs)

— Just a special type of neural network

Yet, they are slow:

— Computational graphs are highly sparse and cluttered
— Operations are implemented in the log-domain
— ∼50 times slower than a neural net of comparable size

We propose Einsum Networks (EiNets):

— A PC architecture using a few monolithic einsum operations
— Run and train PCs up to two orders of magnitude faster
— Scale PCs to datasets previously out of reach (CelebA, SVHN)
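The log-domain evaluation mentioned above is a key source of overhead: each weighted sum becomes a log-sum-exp. A minimal sketch (the numbers are illustrative, not from the paper):

```python
import numpy as np

def log_weighted_sum(log_children, log_weights):
    """Compute log(sum_k w_k * p_k) from log p_k and log w_k without underflow."""
    a = log_weights + log_children
    m = np.max(a)                          # shift by the max for stability
    return m + np.log(np.sum(np.exp(a - m)))

log_p = np.log(np.array([0.25, 0.5]))      # child densities, in the log-domain
log_w = np.log(np.array([0.4, 0.6]))       # sum-node weights, in the log-domain
result = np.exp(log_weighted_sum(log_p, log_w))   # 0.4*0.25 + 0.6*0.5 = 0.4
```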

SLIDE 3

Probabilistic Circuits

SLIDE 4

Probabilistic Circuits

Computational graph containing 3 types of operations: distributions (leaves), products, and weighted sums.
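The three operation types can be combined into a tiny circuit; the structure and parameters below are made up for illustration:

```python
# A minimal PC over two binary variables X1, X2:
# p(x1, x2) = w1 * p1(x1) p2(x2) + w2 * p3(x1) p4(x2)
def leaf(theta, x):
    """Bernoulli leaf distribution."""
    return theta if x == 1 else 1.0 - theta

def pc(x1, x2, w=(0.3, 0.7)):
    prod_a = leaf(0.9, x1) * leaf(0.2, x2)  # product node over disjoint scopes {X1}, {X2}
    prod_b = leaf(0.1, x1) * leaf(0.8, x2)  # product node
    return w[0] * prod_a + w[1] * prod_b    # weighted sum node over the same scope {X1, X2}

# The circuit defines a normalized distribution:
total = sum(pc(a, b) for a in (0, 1) for b in (0, 1))  # 1.0
```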


SLIDE 14

Probabilistic Circuits — Leaf Distributions

Arbitrary probability function (pdf, pmf, mixed) over some set of random variables X. Should facilitate tractable inference routines, e.g. marginalization, conditioning, MAP, …

p(x) = h(x) exp(θᵀ T(x) − A(θ))

Gaussian, Bernoulli, Dirichlet, Poisson, Gamma, …
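As a worked example (not code from the paper), the exponential-family form can be instantiated for a univariate Gaussian and checked against the standard pdf:

```python
import math

def gaussian_expfam(x, mu, sigma):
    """N(x; mu, sigma^2) written as h(x) exp(theta^T T(x) - A(theta))."""
    theta1 = mu / sigma**2             # natural parameters
    theta2 = -1.0 / (2 * sigma**2)
    h = 1.0 / math.sqrt(2 * math.pi)   # base measure h(x)
    T1, T2 = x, x * x                  # sufficient statistics T(x) = (x, x^2)
    A = -theta1**2 / (4 * theta2) - 0.5 * math.log(-2 * theta2)  # log-partition
    return h * math.exp(theta1 * T1 + theta2 * T2 - A)

def gaussian_pdf(x, mu, sigma):
    """Standard Gaussian pdf, for comparison."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```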

SLIDE 18

Probabilistic Circuits — Products

Simply product units: two children with values 0.5 and 1.4 yield 0.5 × 1.4 = 0.7.

SLIDE 25

Probabilistic Circuits — Sums

Weighted sums: children with values 3.14 and 42 yield w1 · 3.14 + w2 · 42, with wk ≥ 0 and ∑k wk = 1.
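To keep these constraints satisfied during gradient-based training, one common choice (an assumption here, not stated on the slide) is a softmax over unconstrained parameters:

```python
import numpy as np

def softmax(logits):
    """Map unconstrained parameters to weights with w_k >= 0 and sum_k w_k = 1."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.array([0.2, -1.0, 3.0])      # unconstrained parameters
w = softmax(logits)                      # valid sum-node weights
children = np.array([3.14, 42.0, 0.5])   # child node outputs
out = float(np.dot(w, children))         # weighted sum node output
```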

SLIDE 31

Probabilistic Circuits

Computational graph containing distributions, products, and weighted sums. Plus: Structural properties!

The circuit over X1, X2, X3 defines p(X1, X2, X3).

— Smoothness: sum children have the same scope
— Decomposability: product children have disjoint scopes
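Both properties are mechanical checks on node scopes; a sketch using a hypothetical dict-based node format (for illustration only):

```python
def scope(node):
    """The set of random variables a node depends on."""
    if node["type"] == "leaf":
        return frozenset(node["vars"])
    return frozenset().union(*(scope(c) for c in node["children"]))

def is_smooth(node):
    """Smoothness: all children of every sum node share the same scope."""
    if node["type"] == "leaf":
        return True
    ok = all(is_smooth(c) for c in node["children"])
    if node["type"] == "sum":
        scopes = [scope(c) for c in node["children"]]
        ok = ok and all(s == scopes[0] for s in scopes)
    return ok

def is_decomposable(node):
    """Decomposability: children of every product node have pairwise disjoint scopes."""
    if node["type"] == "leaf":
        return True
    ok = all(is_decomposable(c) for c in node["children"])
    if node["type"] == "product":
        scopes = [scope(c) for c in node["children"]]
        ok = ok and len(frozenset().union(*scopes)) == sum(len(s) for s in scopes)
    return ok
```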

SLIDE 34

Probabilistic Circuits — Inference

Example: Marginalization and Conditioning

X = Xq ∪ Xm ∪ Xe

p(Xq | xe) = ∫ p(Xq, x′m, xe) dx′m / ∫∫ p(x′q, x′m, xe) dx′q dx′m

Smoothness and decomposability ⇒ Single bottom-up pass! Check out our AAAI tutorial on Probabilistic Circuits! Upcoming tutorials at ECAI, ECML/PKDD, IJCAI!
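The bottom-up marginalization pass can be sketched as follows: every leaf over a marginalized variable is replaced by its integral, i.e. the constant 1. The dict-based circuit format is hypothetical, for illustration:

```python
def evaluate(node, x, marginalized=frozenset()):
    """One bottom-up pass; leaves over marginalized variables output 1."""
    if node["type"] == "leaf":
        v = node["var"]
        return 1.0 if v in marginalized else node["dist"](x.get(v))
    vals = [evaluate(c, x, marginalized) for c in node["children"]]
    if node["type"] == "product":
        out = 1.0
        for v in vals:
            out *= v
        return out
    return sum(w * v for w, v in zip(node["weights"], vals))  # sum node

def bern(theta):
    return lambda x: theta if x == 1 else 1.0 - theta

def l(var, th):
    return {"type": "leaf", "var": var, "dist": bern(th)}

# p(X1, X2) = 0.3 * B(0.9)(X1) B(0.2)(X2) + 0.7 * B(0.1)(X1) B(0.8)(X2)
circuit = {"type": "sum", "weights": [0.3, 0.7], "children": [
    {"type": "product", "children": [l("X1", 0.9), l("X2", 0.2)]},
    {"type": "product", "children": [l("X1", 0.1), l("X2", 0.8)]},
]}

# Marginal p(X1=1), with X2 integrated out in a single pass:
marg = evaluate(circuit, {"X1": 1}, marginalized={"X2"})   # 0.3*0.9 + 0.7*0.1 = 0.34
```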

SLIDE 35

The Problem

SLIDE 36

Einsum Networks

SLIDE 37

Step I – Vectorize Nodes

SLIDE 39

Step II – The Basic Einsum Operation

Sk = Wkij Ni N′j   (a single einsum operation; Einstein summation over i and j)
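In NumPy notation this is a single einsum call: K sum nodes over all K·K pairwise products of two child vectors. Sizes and weights below are illustrative:

```python
import numpy as np

K = 4
rng = np.random.default_rng(0)
W = rng.random((K, K, K))
W /= W.sum(axis=(1, 2), keepdims=True)  # each sum node's weights sum to 1
N = rng.random(K)                       # left child vector
Np = rng.random(K)                      # right child vector

# S_k = sum_{i,j} W_kij * N_i * N'_j, as one fused operation:
S = np.einsum("kij,i,j->k", W, N, Np)

# Reference: explicit products and sums, node by node
S_ref = np.array([(W[k] * np.outer(N, Np)).sum() for k in range(K)])
```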

SLIDE 41

Step III – Einsum Layers

Slk = Wlkij Nli N′lj   (a whole layer as a single einsum operation; summation over i and j)
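Adding the layer index l fuses L of the basic operations into one call (in practice one would also carry a batch dimension); a sketch with illustrative sizes:

```python
import numpy as np

L, K = 8, 4
rng = np.random.default_rng(1)
W = rng.random((L, K, K, K))
W /= W.sum(axis=(2, 3), keepdims=True)  # normalize each sum node's weights
N = rng.random((L, K))                  # left child vectors, one per layer slot
Np = rng.random((L, K))                 # right child vectors

# S_lk = sum_{i,j} W_lkij * N_li * N'_lj, the whole layer in one einsum:
S = np.einsum("lkij,li,lj->lk", W, N, Np)

# Same result as running the basic einsum operation L times:
S_ref = np.stack([np.einsum("kij,i,j->k", W[l], N[l], Np[l]) for l in range(L)])
```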

SLIDE 42

Results

SLIDE 43

Runtime and Memory Comparison

[Figure: training time (sec/epoch) and GPU memory (GB), log-scale axes, for EiNets (×), SPFlow (+), and LibSPN (∗), plotted against K, D (depth), and R (# replicas).]

SLIDE 44

Generative Image Models

SLIDE 45

Conclusion

PCs sit at the intersection of classical graphical models and neural networks. Their crucial advantage: many exact inference routines. But they used to be painful to scale. In this paper, we take a big step towards closing that gap. More to come!

https://github.com/cambridge-mlg/EinsumNetworks
https://github.com/SPFlow/SPFlow