PSDDs for Tractable Learning in Structured and Unstructured Spaces (Guy Van den Broeck, DeLBP, Aug 18, 2017; PowerPoint PPT presentation)


SLIDE 1

PSDDs for Tractable Learning in Structured and Unstructured Spaces

Guy Van den Broeck

DeLBP Aug 18, 2017

SLIDE 2

Probabilistic Sentential Decision Diagrams

Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche KR, 2014

Learning with Massive Logical Constraints

Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche ICML LTPM workshop, 2014

Tractable Learning for Structured Probability Spaces

Arthur Choi, Guy Van den Broeck and Adnan Darwiche IJCAI, 2015

Tractable Learning for Complex Probability Queries

Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck. NIPS, 2015

Learning the Structure of PSDDs

Yitao Liang, Jessa Bekker and Guy Van den Broeck UAI, 2017

Towards Compact Interpretable Models: Learning and Shrinking PSDDs

Yitao Liang and Guy Van den Broeck IJCAI XAI workshop, 2017

References

SLIDE 3

(P)SDDs in Melbourne

  • Sunday: Logical Foundations for Uncertainty and

Machine Learning Workshop

– Adnan Darwiche: “On the Role of Logic in Probabilistic Inference and Machine Learning”
– YooJung Choi: “Optimal Feature Selection for Decision Robustness in Bayesian Networks”

  • Sunday: Explainable AI Workshop

– Yitao Liang: “Towards Compact Interpretable Models: Learning and Shrinking PSDDs”

  • Tuesday: IJCAI

– YooJung Choi (again)

SLIDE 4

Structured vs. unstructured probability spaces?

SLIDE 6

Courses:

  • Logic (L)
  • Knowledge Representation (K)
  • Probability (P)
  • Artificial Intelligence (A)

Data

  • Must take at least one of Probability or Logic.
  • Probability is a prerequisite for AI.
  • The prerequisite for KR is either AI or Logic.

Constraints

Running Example

SLIDE 8

[Table: all 16 instantiations of L, K, P, A] (unstructured)

[Table: only the 9 possible instantiations of L, K, P, A] (structured)

Structured Probability Space

7 out of 16 instantiations are impossible

  • Must take at least one of Probability or Logic.
  • Probability is a prerequisite for AI.
  • The prerequisite for KR is either AI or Logic.
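The "7 out of 16" claim can be checked by brute-force enumeration of the three constraints (a small sketch in Python; the variable names are ours):

```python
from itertools import product

def valid(l, k, p, a):
    # Must take at least one of Probability or Logic.
    c1 = p or l
    # Probability is a prerequisite for AI.
    c2 = (not a) or p
    # The prerequisite for KR is either AI or Logic.
    c3 = (not k) or a or l
    return c1 and c2 and c3

# enumerate all 2^4 = 16 instantiations of (L, K, P, A)
worlds = [w for w in product([False, True], repeat=4) if valid(*w)]
print(len(worlds))  # 9 possible worlds; the other 7 of 16 are impossible
```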

SLIDE 10

Learning with Constraints

Learn a statistical model that assigns zero probability to instantiations that violate the constraints.

[Diagram: Data + Constraints (background knowledge, physics) → Learn → Statistical Model (distribution)]

SLIDE 11

Example: Video

[Lu, W. L., Ting, J. A., Little, J. J., & Murphy, K. P. (2013). Learning to track and identify players from broadcast sports videos.]

SLIDE 13

Example: Robotics

[Wong, L. L., Kaelbling, L. P., & Lozano-Perez, T., Collision-free state estimation. ICRA 2012]

SLIDE 18

Example: Language

  • Non-local dependencies:

At least one verb in each sentence

  • Sentence compression

If a modifier is kept, its subject is also kept

  • Information extraction
  • Semantic role labeling
  • … and many more!

[Chang, M., Ratinov, L., & Roth, D. (2008). Constraints as prior knowledge],…, [Chang, M. W., Ratinov, L., & Roth, D. (2012). Structured learning with constrained conditional models.], [https://en.wikipedia.org/wiki/Constrained_conditional_model]

SLIDE 19

Example: Deep Learning

[Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., et al. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471-476.]

SLIDE 24

What are people doing now?

  • Ignore constraints
  • Handcraft into models
  • Use specialized distributions
  • Find non-structured encoding
  • Try to learn constraints
  • Hack your way around

Each comes at a cost: accuracy, specialized skill, intractable inference, intractable learning, wasted parameters, risk of predicting out of the space, or being on your own.

SLIDE 28

Structured Probability Spaces

  • Everywhere in ML!

– Configuration problems, inventory, video, text, deep learning
– Planning and diagnosis (physics)
– Causal models: cooking scenarios (interpreting videos)
– Combinatorial objects: parse trees, rankings, directed acyclic graphs, trees, simple paths, game traces, etc.

  • Some representations: constrained conditional models, mixed networks, probabilistic logics.

There are no statistical ML boxes out there that take constraints as input!

Goal: Constraints as important as data! General purpose!

SLIDE 29

Specification Language: Logic

SLIDE 31

[Table: all 16 instantiations of L, K, P, A] (unstructured)

[Table: only the 9 possible instantiations of L, K, P, A] (structured)

Boolean Constraints

7 out of 16 instantiations are impossible

SLIDE 32

Combinatorial Objects: Rankings

10 items: 3,628,800 (= 10!) rankings
20 items: 2,432,902,008,176,640,000 (= 20!) rankings

Two example rankings (rank: sushi):
1 fatty tuna, 2 sea urchin, 3 salmon roe, 4 shrimp, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll
1 shrimp, 2 sea urchin, 3 salmon roe, 4 fatty tuna, 5 tuna, 6 squid, 7 tuna roll, 8 sea eel, 9 egg, 10 cucumber roll

SLIDE 34

Combinatorial Objects: Rankings

Aij: item i at position j (n items require n² Boolean variables)

Without further constraints: an item may be assigned to more than one position, and a position may contain more than one item.

SLIDE 37

Encoding Rankings in Logic

Aij : item i at position j

        pos 1  pos 2  pos 3  pos 4
item 1  A11    A12    A13    A14
item 2  A21    A22    A23    A24
item 3  A31    A32    A33    A34
item 4  A41    A42    A43    A44

constraint: each item i assigned to a unique position (n constraints)
constraint: each position j assigned a unique item (n constraints)
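The two constraint families can be sanity-checked by brute force for small n (a sketch under our own encoding; for n = 3, exactly 3! = 6 of the 2⁹ variable assignments survive):

```python
from itertools import product

n = 3

def is_permutation(A):
    # A[i][j] is 1 iff item i sits at position j
    rows_ok = all(sum(A[i]) == 1 for i in range(n))                       # each item: unique position
    cols_ok = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))  # each position: unique item
    return rows_ok and cols_ok

count = sum(1 for bits in product([0, 1], repeat=n * n)
            if is_permutation([list(bits[i * n:(i + 1) * n]) for i in range(n)]))
print(count)  # 6 = 3! valid assignments out of 2^9 = 512
```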

SLIDE 43

Unstructured probability space: 184 + 16,777,032 = 2^24 assignments

Structured Space for Paths

  • cf. Nature paper

Good variable assignments (represent a route): 184
Bad variable assignments (do not represent a route): 16,777,032

Space easily encoded in logical constraints. See [Choi, Tavabi, Darwiche, AAAI 2016]
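The 184 vs. 16,777,032 split can be reproduced by brute force on a 4×4 grid graph, which has 24 edges and hence 2²⁴ edge-variable assignments (a DFS sketch of ours, not the compilation method of the cited paper):

```python
# Count simple corner-to-corner paths on a 4x4 grid graph.
N = 4

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < N and 0 <= c + dc < N:
            yield (r + dr, c + dc)

def count_paths(cur, goal, visited):
    # depth-first search over self-avoiding walks
    if cur == goal:
        return 1
    return sum(count_paths(nxt, goal, visited | {nxt})
               for nxt in neighbors(cur) if nxt not in visited)

routes = count_paths((0, 0), (N - 1, N - 1), {(0, 0)})
edges = 2 * N * (N - 1)  # 24 edge variables
print(routes, 2 ** edges - routes)  # 184 good vs 16,777,032 bad assignments
```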

SLIDE 44

“Deep Architecture”

Logic + Probability

SLIDE 45
[Figure: a logical circuit over variables L, K, P, A, built from AND/OR gates over literals]

Logical Circuits

SLIDE 46
[Figure: the same circuit; the children of each AND gate mention disjoint sets of variables]

Property: Decomposability

SLIDE 48
[Figure: the circuit evaluated on an input; at most one child of each OR gate is true]

Input: L, K, P, A

Property: Determinism

SLIDE 49
[Figure: the same circuit drawn as a Sentential Decision Diagram]

Input: L, K, P, A

Sentential Decision Diagram (SDD)

SLIDE 53

Tractable for Logical Inference

  • Is structured space empty? (SAT)
  • Count size of structured space (#SAT)
  • Check equivalence of spaces
  • Algorithms linear in circuit size 

(pass up, pass down, similar to backprop)
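Model counting illustrates the linear pass-up: on a smooth, decomposable, deterministic circuit, replace literals by 1, AND by multiplication, and OR by addition (a toy sketch of ours, not the talk's circuit):

```python
def model_count(node):
    kind = node[0]
    if kind == "lit":      # a literal contributes one choice for its variable
        return 1
    if kind == "and":      # decomposability: children range over disjoint variables
        result = 1
        for child in node[1]:
            result *= model_count(child)
        return result
    # "or": determinism makes the children's model sets disjoint, so counts add
    return sum(model_count(child) for child in node[1])

# circuit for (X and Y) or (not X), smoothed over variables {X, Y}
circuit = ("or", [
    ("and", [("lit", "X"), ("lit", "Y")]),
    ("and", [("lit", "not X"), ("or", [("lit", "Y"), ("lit", "not Y")])]),
])
print(model_count(circuit))  # 3 of the 4 assignments are models
```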

SLIDE 55
[Figure: the SDD annotated with a probability distribution on each decision node, e.g. 0.6/0.4, 0.8/0.2, 0.25/0.75, 0.9/0.1]

PSDD: Probabilistic SDD

SLIDE 58
[Figure: the PSDD evaluated bottom-up on an input; the parameters used are highlighted]

Input: L, K, P, A
Pr(L,K,P,A) = 0.3 × 1.0 × 0.8 × 0.4 × 0.25 = 0.024

PSDD: Probabilistic SDD

SLIDE 59
[Figure: each decision node of the PSDD highlighted with its local distribution]

PSDD nodes induce a normalized distribution!
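That normalization can be checked on a toy two-variable PSDD (our own example, not the circuit on the slide): each decision node's branch probabilities sum to 1, and determinism ensures exactly one branch fires per world, so probabilities over all worlds sum to 1.

```python
def literal(var, positive):
    # Terminal node: probability 1 iff the world matches the literal.
    return ("lit", var, positive)

def bernoulli(var, p):
    # Terminal distribution over one variable: Pr(var = 1) = p.
    return ("bern", var, p)

def decide(elements):
    # Decision node: list of (prime, sub, theta); primes are mutually
    # exclusive and exhaustive, and the thetas sum to 1.
    return ("dec", elements)

def pr(node, world):
    kind = node[0]
    if kind == "lit":
        _, var, positive = node
        return 1.0 if bool(world[var]) == positive else 0.0
    if kind == "bern":
        _, var, p = node
        return p if world[var] else 1.0 - p
    _, elements = node
    # Determinism: only one term of this sum is nonzero in any world.
    return sum(theta * pr(prime, world) * pr(sub, world)
               for prime, sub, theta in elements)

root = decide([
    (literal("X", True),  bernoulli("Y", 0.8), 0.6),
    (literal("X", False), bernoulli("Y", 0.3), 0.4),
])

total = sum(pr(root, {"X": x, "Y": y}) for x in (0, 1) for y in (0, 1))
print(total)  # sums to 1.0: the root induces a normalized distribution
```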

SLIDE 61

Can read probabilistic independences off the circuit structure

PSDD nodes induce a normalized distribution!

SLIDE 63

Tractable for Probabilistic Inference

  • MAP inference: Find most-likely assignment

(otherwise NP-complete)

  • Computing conditional probabilities Pr(x|y)

(otherwise PP-complete)

  • Sample from Pr(x|y)
  • Algorithms linear in circuit size 

(pass up, pass down, similar to backprop)

SLIDE 64

Learning PSDDs

Logic + Probability + ML

SLIDE 65
[Figure: the PSDD with its parameters annotated]

Parameters are Interpretable

Explainable AI DARPA Program

SLIDE 68
[Figure: the PSDD; highlighted parameters correspond to the events listed below]

Student takes course L
Student takes course P
Probability of P given L

Parameters are Interpretable

Explainable AI DARPA Program

SLIDE 71

Learning Algorithms

  • Parameter learning (not a lot to say: very easy!)

– Closed-form max likelihood from complete data
– One pass over the data to estimate Pr(x|y)

  • Circuit learning (naïve):

– Compile constraints to an SDD circuit, using SAT-solver technology
– Circuit does not depend on the data
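The closed-form estimate is a ratio of counts: a branch parameter is the fraction of examples, within the branch's context, whose prime holds. A hedged one-pass sketch for a single conditional Pr(x | y):

```python
def estimate(data, x, y):
    # Maximum-likelihood estimate of Pr(X=1 | Y=1) from complete data:
    # a single pass counting the context y and the joint event (x and y).
    ctx = sum(1 for row in data if row[y])
    both = sum(1 for row in data if row[y] and row[x])
    return both / ctx

# toy complete dataset over variables (X, Y) as 0/1 tuples (our example)
data = [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]
print(estimate(data, x=0, y=1))  # 2/3: two of the three Y=1 rows have X=1
```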

SLIDE 73

Learning Preference Distributions

Special-purpose distribution: Mixture-of-Mallows

– number of components from 1 to 20
– EM with 10 random seeds
– implementation of Lu & Boutilier

PSDD

This is the naive approach, circuit does not depend on data!

SLIDE 74

Learn Circuit from Data

Even in unstructured spaces

SLIDE 76

Tractable Learning

Bayesian networks and Markov networks do not support linear-time exact inference

SLIDE 77

Tractable Learning

SPNs, Cutset Networks
Historically: polytrees, Chow-Liu trees, etc.
Both are Arithmetic Circuits (ACs)

[Darwiche, JACM 2003]

SLIDE 78

PSDDs are Arithmetic Circuits

A PSDD decision node with elements (p1, s1), ..., (pn, sn) and parameters θ1, ..., θn corresponds to an AC + node over * nodes computing θ1·p1·s1 + ... + θn·pn·sn.

SLIDE 83

Tractable Learning

Perhaps the most powerful circuit proposed to date

[Figure: circuit families plotted by strong properties vs. representational freedom: DNNs, SPNs, Cutset Networks]

SLIDE 85

PSDDs for the Logic-Phobic

Bottom-up

each node is a distribution

SLIDE 88

PSDDs for the Logic-Phobic

Multiply independent distributions

SLIDE 90

PSDDs for the Logic-Phobic

Weighted mixture of lower level distributions

SLIDE 93

Variable Trees (vtrees)

PSDD Vtree Correspondence

SLIDE 94

Learning Variable Trees

  • How much do vars depend on each other?
  • Learn vtree by hierarchical clustering
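One simple instantiation of the idea: measure pairwise dependence with empirical mutual information and build the vtree by greedy bottom-up merging (a sketch of ours; the actual LearnPSDD vtree learner may differ in its scoring and clustering details):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    # Empirical mutual information between binary columns i and j.
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def learn_vtree(data, num_vars):
    # Greedy bottom-up pairing: repeatedly merge the two subtrees whose
    # representative variables have the highest mutual information.
    trees = [(v, v) for v in range(num_vars)]  # (subtree, representative var)
    while len(trees) > 1:
        i, j = max(combinations(range(len(trees)), 2),
                   key=lambda ij: mutual_information(
                       data, trees[ij[0]][1], trees[ij[1]][1]))
        merged = ((trees[i][0], trees[j][0]), trees[i][1])
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0][0]

# columns 0 and 1 are perfectly correlated; column 2 is independent noise
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(learn_vtree(data, 3))  # pairs variables 0 and 1 first: (2, (0, 1))
```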
SLIDE 98

Learning Primitives

Primitives maintain PSDD properties and structured space!

SLIDE 100

LearnPSDD

1. Vtree learning
2. Construct the most naïve PSDD
3. LearnPSDD (search for better structure)

[Diagram: search loop: generate candidate operations, simulate operations, execute the best]
SLIDE 103

Experiments on 20 datasets

Compare with O-SPN: smaller size in 14, better LL in 11, win on both in 6
Compare with L-SPN: smaller size in 14, better LL in 6, win on both in 2
Comparable in performance, and smaller in size

SLIDE 105

Ensembles of PSDDs

EM/Bagging

SLIDE 108

State-of-the-Art Performance

State of the art in 6 datasets

SLIDE 111

What happens if you ignore constraints?

Roadmap

1. Compile logic into an SDD
2. Convert to a PSDD: parameter estimation
3. LearnPSDD

[Diagram: search loop: generate candidate operations, simulate operations, execute the best]
SLIDE 112

What happens if you ignore constraints?

Discrete multi-valued data

𝑩: 𝒃𝟐, 𝒃𝟑, 𝒃𝟒
(𝒃𝟐 ∧ ¬𝒃𝟑 ∧ ¬𝒃𝟒) ∨ (¬𝒃𝟐 ∧ 𝒃𝟑 ∧ ¬𝒃𝟒) ∨ (¬𝒃𝟐 ∧ ¬𝒃𝟑 ∧ 𝒃𝟒)

SLIDE 115

What happens if you ignore constraints?

Discrete multi-valued data

𝑩: 𝒃𝟐, 𝒃𝟑, 𝒃𝟒
(𝒃𝟐 ∧ ¬𝒃𝟑 ∧ ¬𝒃𝟒) ∨ (¬𝒃𝟐 ∧ 𝒃𝟑 ∧ ¬𝒃𝟒) ∨ (¬𝒃𝟐 ∧ ¬𝒃𝟑 ∧ 𝒃𝟒)

Never omit domain constraints
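The exactly-one ("one-hot") encoding above can be generated and checked mechanically: of the 2ᵏ assignments to the k indicator bits, only k satisfy it (a small sketch of ours):

```python
from itertools import product

def exactly_one(bits):
    # one-hot constraint over the Boolean indicators of a multi-valued variable
    return sum(bits) == 1

k = 3  # e.g. indicators b2, b3, b4 for a three-valued variable B
survivors = [bits for bits in product([0, 1], repeat=k) if exactly_one(bits)]
print(len(survivors))  # 3 of the 8 assignments encode a value of B
```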

SLIDE 116

Complex queries

and

Learning from constraints

SLIDE 119

Incomplete Data

id  X   Y   Z
1   x1  y2  z1
2   x2  y1  z2
3   x2  y1  z2
4   x1  y1  z1
5   x1  y2  z2

a classical complete dataset

id  X   Y   Z
1   x1  y2  ?
2   x2  y1  ?
3   ?   ?   z2
4   ?   y1  z1
5   x1  y2  z2

a classical incomplete dataset

a new type of incomplete dataset

id  X Y Z
1   X  Z
2   x2 and (y2 or z2)
3   x2  y1
4   X  Y  Z  1
5   x1 and y2 and z2

closed-form (maximum-likelihood estimates are unique)
EM algorithm (on PSDDs)
Missed in the ML literature

SLIDE 120

id | 1st sushi | 2nd sushi | 3rd sushi
1 | fatty tuna | sea urchin | salmon roe
2 | fatty tuna | tuna | shrimp
3 | tuna | tuna roll | sea eel
4 | fatty tuna | salmon roe | tuna
5 | egg | squid | shrimp

a classical complete dataset (e.g., total rankings)

id | 1st sushi | 2nd sushi | 3rd sushi
1 | fatty tuna | sea urchin | ?
2 | fatty tuna | ? | ?
3 | tuna | tuna roll | ?
4 | fatty tuna | salmon roe | ?
5 | egg | ? | ?

a classical incomplete dataset (e.g., top-k rankings)

Structured Datasets

SLIDE 121

id | 1st sushi | 2nd sushi | 3rd sushi
1 | fatty tuna | sea urchin | salmon roe
2 | fatty tuna | tuna | shrimp
3 | tuna | tuna roll | sea eel
4 | fatty tuna | salmon roe | tuna
5 | egg | squid | shrimp

a classical complete dataset (e.g., total rankings)

id | partial ranking
1 | (fatty tuna > sea urchin) and (tuna > sea eel)
2 | (fatty tuna is 1st) and (salmon roe > egg)
3 | tuna > squid
4 | egg is last
5 | egg > squid > shrimp

a new type of incomplete dataset (e.g., partial rankings) (represents constraints on possible total rankings)

Structured Datasets

SLIDE 122

Learning from Incomplete Data

  • Movielens Dataset:

– 3,900 movies, 6,040 users, 1M ratings
– take ratings from the 64 most rated movies
– ratings 1-5 converted to pairwise preferences

  • PSDD for partial rankings

– 4 tiers
– 18,711 parameters

rank | movie
1 | The Godfather
2 | The Usual Suspects
3 | Casablanca
4 | The Shawshank Redemption
5 | Schindler’s List
6 | One Flew Over the Cuckoo’s Nest
7 | The Godfather: Part II
8 | Monty Python and the Holy Grail
9 | Raiders of the Lost Ark
10 | Star Wars IV: A New Hope

movies by expected tier

SLIDE 123

PSDD Sizes

SLIDE 127

Structured Queries

rank | movie
1 | Star Wars V: The Empire Strikes Back
2 | Star Wars IV: A New Hope
3 | The Godfather
4 | The Shawshank Redemption
5 | The Usual Suspects

  • no other Star Wars movie in top-5
  • at least one comedy in top-5

rank | movie
1 | Star Wars V: The Empire Strikes Back
2 | American Beauty
3 | The Godfather
4 | The Usual Suspects
5 | The Shawshank Redemption

diversified recommendations via logical constraints

SLIDE 128

Conclusions

  • Structured spaces are everywhere
  • PSDDs build on logical circuits:
    1. Tractability
    2. Semantics
    3. Natural encoding of structured spaces
  • Learning is effective:
    1. From constraints encoding the structured space: state of the art for learning preference distributions
    2. From standard unstructured datasets using search: state of the art on standard tractable learning datasets
  • Novel settings for inference and learning: structured spaces, learning from constraints, complex queries

SLIDE 131

Conclusions

[Figure: PSDD at the intersection of Statistical ML (“Probability”), Symbolic AI (“Logic”), and Connectionism (“Deep”)]

SLIDE 132

Questions?

[Figure: a PSDD with 15,000 nodes]

LearnPSDD code: https://github.com/UCLA-StarAI/LearnPSDD
Other PSDD code: http://reasoning.cs.ucla.edu/psdd/
SDD code: http://reasoning.cs.ucla.edu/sdd/