

SLIDE 1

Discriminative Bias for Learning Probabilistic Sentential Decision Diagrams

Laura I. Galindez Olascoaga✻, Wannes Meert✻, Nimish Shah✻, Guy Van den Broeck✣, Marian Verhelst✻


SLIDE 2

Outline

Motivation and objective Background Discriminative bias for learning PSDDs Experimental results Conclusions

IDA2020 2

SLIDE 3

Motivation


Probabilistic inference has proven to be well suited for resource-constrained embedded applications (Galindez Olascoaga et al., 2019). Probabilistic circuits successfully balance the efficiency vs. expressiveness trade-off while remaining robust. However, the robustness these models gain from generative learning is at odds with discriminative performance.

SLIDE 4

Objective


Keep the robustness provided by generative learning strategies. Improve discriminative performance by exploiting PSDDs' knowledge-encoding capabilities.

SLIDE 5

Outline

Motivation and objective Background Discriminative bias for learning PSDDs Experimental results Conclusions


SLIDE 6

Background: probabilistic inference

Given a probabilistic model m of the world, answer probabilistic queries:

q1(m) = Pr_m(e)  (Evidence)
q2(m) = Pr_m(q | e)  (Conditional)
q3(m) = argmax_x Pr_m(x | e)  (MAP)
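On a toy joint distribution, the three query types above can be sketched as follows; the distribution and the variable indexing are illustrative, not from the slides.

```python
# Toy joint distribution over two binary variables, indexed 0 and 1;
# the numbers are made up for illustration.
joint = {
    (1, 1): 0.02, (1, 0): 0.18,
    (0, 1): 0.56, (0, 0): 0.24,
}

def pr(event):
    """Evidence query: Pr(event), where event maps variable index -> value."""
    return sum(p for w, p in joint.items()
               if all(w[i] == v for i, v in event.items()))

def pr_cond(query, evidence):
    """Conditional query: Pr(query | evidence)."""
    return pr({**query, **evidence}) / pr(evidence)

def map_query(var, evidence):
    """MAP query: the most likely value of `var` given the evidence."""
    return max((0, 1), key=lambda v: pr_cond({var: v}, evidence))
```

Each query reduces to sums over the joint table, which is exactly what makes tractability of the model representation matter.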

SLIDE 7

Background: tractable probabilistic inference


A query q(m) is tractable iff computing it exactly runs in time O(poly(|m|)). There is an inherent trade-off between tractability and expressiveness.

(From UAI 2019 tutorial on Tractable Probabilistic Models by Vergari, Di Mauro and Van den Broeck and AAAI 2020 tutorial on Probabilistic Circuits by Vergari, Choi, Peharz and Van den Broeck)

SLIDE 8

Background: probabilistic circuits


A probabilistic circuit is a computational graph that encodes a probability distribution p(X).

(From UAI 2019 tutorial on Tractable Probabilistic Models by Vergari, Di Mauro and Van den Broeck and AAAI 2020 tutorial on Probabilistic Circuits by Vergari, Choi, Peharz and Van den Broeck)
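A minimal sketch of such a computational graph, built from indicator leaves, product nodes and weighted sum nodes; the structure and weights are made up for illustration.

```python
def leaf(var, val):
    """Indicator leaf: outputs 1.0 when the assignment matches, else 0.0."""
    return lambda x: 1.0 if x[var] == val else 0.0

def prod(*children):
    """Product node: multiplies its children's outputs."""
    def node(x):
        out = 1.0
        for child in children:
            out *= child(x)
        return out
    return node

def wsum(weights, children):
    """Sum node: convex combination of its children's outputs."""
    return lambda x: sum(w * c(x) for w, c in zip(weights, children))

# A tiny circuit encoding p(A, B) = 0.3*[A=1][B=1] + 0.7*[A=0][B=0]
circuit = wsum([0.3, 0.7], [prod(leaf("A", 1), leaf("B", 1)),
                            prod(leaf("A", 0), leaf("B", 0))])
```

Evaluating the graph bottom-up on a complete assignment yields that assignment's probability.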

SLIDE 9

Background: what is a PSDD?

PSDDs are probabilistic extensions of SDDs, which represent Boolean functions as logical circuits (Kisa et al., 2014).

(Example from Liang et al., 2017)

Pr(Rain) = 0.2
Pr(Sun | Rain) = 0.1 if Rain, 0.7 if ¬Rain
Pr(Rbow | Sun, Rain) = 1 if Sun ∧ Rain, 0 otherwise

[Figure: the Bayesian network over Rain, Sun and Rbow, and the equivalent PSDD with parameters 0.2/0.8, 0.1/0.9, 1.0 and 0.7.]
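Multiplying the conditional tables of this example along the chain rule gives the joint distribution; a small sketch (the function names are ours):

```python
def p_rain(rain):
    """Pr(Rain)"""
    return 0.2 if rain else 0.8

def p_sun(sun, rain):
    """Pr(Sun | Rain): 0.1 if Rain, 0.7 if not Rain."""
    p1 = 0.1 if rain else 0.7
    return p1 if sun else 1 - p1

def p_rbow(rbow, sun, rain):
    """Pr(Rbow | Sun, Rain): 1 if Sun and Rain, 0 otherwise."""
    p1 = 1.0 if (sun and rain) else 0.0
    return p1 if rbow else 1 - p1

def joint(rain, sun, rbow):
    """Chain rule: Pr(Rain) * Pr(Sun | Rain) * Pr(Rbow | Sun, Rain)."""
    return p_rain(rain) * p_sun(sun, rain) * p_rbow(rbow, sun, rain)
```

For instance, Pr(Rain, Sun, Rbow) = 0.2 * 0.1 * 1 = 0.02, and the eight joint entries sum to one.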

SLIDE 10

Background: PSDDs’ properties

The left input of an AND gate is the prime (p) and the right is the sub (s). Edges of decision nodes are annotated with a normalized probability distribution.

[Figure: a decision node with element children (p1, s1) and (p2, s2), edge parameters θ1 = 0.2 and θ2 = 0.8, and the corresponding vtree.]

SLIDE 11

Background: PSDDs’ properties

Syntactic restrictions:

1) Decomposability: the inputs of an AND node must be over disjoint variable sets.
2) Determinism: only one of a decision node's inputs can be true.

For example, at node 1: prime variables Y = {Rain}, sub variables Z = {Sun, Rbow}. See (Kisa et al., 2014).

[Figure: the example PSDD (parameters 0.2/0.8, 0.1/0.9, 1.0, 0.7) and its vtree.]
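Both restrictions can be checked programmatically; a toy sketch, with variable scopes as Python sets and primes as plain predicate functions (an assumed encoding, not the authors'):

```python
def decomposable(prime_scope, sub_scope):
    """Decomposability: an AND gate's two inputs mention disjoint variables."""
    return prime_scope.isdisjoint(sub_scope)

def deterministic(primes, assignment):
    """Determinism: at most one prime of a decision node is true
    under any complete assignment."""
    return sum(1 for p in primes if p(assignment)) <= 1

# Toy encoding of node 1: primes [Rain] and [not Rain] over Y = {Rain};
# the subs range over Z = {Sun, Rbow}.
primes = [lambda x: x["Rain"], lambda x: not x["Rain"]]
```

Here the primes partition the space of Y-assignments, which is exactly what determinism requires.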

SLIDE 12

Background: PSDDs’ properties

Decision nodes n encode the distribution:

Pr_n(Y, Z) = Σ_i θ_i · Pr_{p_i}(Y) · Pr_{s_i}(Z)

where each element satisfies

Pr_n(Y, Z | [p_i]) = Pr_{p_i}(Y | [p_i]) · Pr_{s_i}(Z | [p_i]) = Pr_{p_i}(Y) · Pr_{s_i}(Z),

and [p_i] is a logical sentence that defines the support of the node's distribution.

[Figure: a decision node with children (p1, s1) and (p2, s2) and parameters θ1 = 0.2, θ2 = 0.8.]

SLIDE 13

Background: PSDDs’ properties

Decision nodes n encode the distribution:

Pr_n(Y, Z) = Σ_i θ_i · Pr_{p_i}(Y) · Pr_{s_i}(Z)

For example, at node 1 (prime variables Y = {Rain}, sub variables Z = {Sun, Rbow}):

Pr_n(Y, Z) = 0.2 · Pr_{p_1}(Y) · Pr_{s_1}(Z) + 0.8 · Pr_{p_2}(Y) · Pr_{s_2}(Z)
           = 0.2 · Pr_{p_1}(Y | [p_1]) · Pr_{s_1}(Z | [p_1]) + 0.8 · Pr_{p_2}(Y | [p_2]) · Pr_{s_2}(Z | [p_2])
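A sketch of this two-element mixture, with the primes [Rain] and [¬Rain] and sub distributions reconstructed from the example parameters (the exact sub distributions are our assumption):

```python
def pr_p1(y):
    """Prime distribution for [Rain]: an indicator over Y."""
    return 1.0 if y["Rain"] else 0.0

def pr_p2(y):
    """Prime distribution for [not Rain]."""
    return 0.0 if y["Rain"] else 1.0

def pr_s1(z):
    """Sub given Rain: Pr(Sun) = 0.1, and Rbow holds exactly when Sun does."""
    p_sun = 0.1 if z["Sun"] else 0.9
    return p_sun if z["Rbow"] == z["Sun"] else 0.0

def pr_s2(z):
    """Sub given not Rain: Pr(Sun) = 0.7, and never a rainbow."""
    p_sun = 0.7 if z["Sun"] else 0.3
    return 0.0 if z["Rbow"] else p_sun

def pr_n(y, z):
    """Decision-node mixture with theta_1 = 0.2 and theta_2 = 0.8."""
    return 0.2 * pr_p1(y) * pr_s1(z) + 0.8 * pr_p2(y) * pr_s2(z)
```

Because the primes are mutually exclusive (determinism), exactly one mixture term is nonzero for any complete assignment.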

SLIDE 14

Background: learning PSDDs

The LearnPSDD algorithm (Liang et al., 2017) learns the PSDD structure incrementally from data:

1) Learn a vtree from data (minimize mutual information).
2) Iteratively apply split and clone operations: generate candidate operations, calculate the log-likelihood improvement of each, and execute the best operation.

[Figure: the learned vtree and the PSDD structures produced over successive iterations.]
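The iterative step can be sketched as a generic greedy search loop; the model and candidate operations below are toy stand-ins, not real PSDD structures or split/clone operations:

```python
def learn_loop(model, log_lik, candidates, max_iters=20):
    """Greedy structure search: score every candidate operation by the
    log-likelihood of the model it produces, execute the best one per
    round, and stop once no operation improves the fit."""
    score = log_lik(model)
    for _ in range(max_iters):
        ops = candidates(model)
        if not ops:
            break
        best = max(ops, key=lambda op: log_lik(op(model)))
        new_model = best(model)
        new_score = log_lik(new_model)
        if new_score <= score:   # no improvement: stop
            break
        model, score = new_model, new_score
    return model, score
```

With a toy "model" that is just an integer and a score peaking at 5, the loop climbs to the optimum and halts.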

SLIDE 15

Outline

Motivation and objective Background Discriminative bias for learning PSDDs Experimental results Conclusions


SLIDE 16

Classification with PSDDs

Given a feature variable set F and a class variable C, the classification task can be stated as a probabilistic query:

Pr(C | F) ∝ Pr(F | C) · Pr(C)

LearnPSDD remains agnostic to the classification task: features might never be conditioned on the class.
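This query can be sketched with a naive-Bayes-style factorization of Pr(F | C); the probability tables below are illustrative, not learned from data:

```python
# Pr(C): class prior
prior = {0: 0.6, 1: 0.4}
# Pr(F_i = 1 | C): one entry per feature, per class
likelihood = {0: [0.2, 0.7], 1: [0.9, 0.3]}

def score(c, features):
    """Unnormalized posterior: Pr(F | C=c) * Pr(C=c)."""
    s = prior[c]
    for p1, f in zip(likelihood[c], features):
        s *= p1 if f else 1 - p1
    return s

def classify(features):
    """Pick the class maximizing Pr(F | C) * Pr(C)."""
    return max(prior, key=lambda c: score(c, features))
```

The argmax needs only the unnormalized scores, since Pr(F) is constant across classes.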

SLIDE 17

Bayesian Network classifiers


Pr(C | F) ∝ Pr(F | C) · Pr(C)

[Figure: example network structures illustrating the effect of conditioning features on the class.]

With Bayesian network classifiers, features are always conditioned on the class. With LearnPSDD, features might never be conditioned on the class.

Effects of explicitly conditioning F on C.

SLIDE 18

Enforcing the discriminative bias: D-LearnPSDD

Make sure that feature variables F can be conditioned on the class variable C:

  • Minimize conditional mutual information.

SLIDE 19

Enforcing the discriminative bias: D-LearnPSDD

Make sure that feature variables F can be conditioned on the class variable C:

  • Minimize conditional mutual information.
  • Initialize on a fully factorized distribution.
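The quantity being minimized can be computed directly from a joint probability table; a sketch of conditional mutual information I(X; Y | Z) in bits (toy encoding, ours):

```python
import math

def cmi(joint):
    """I(X; Y | Z) from a table mapping (x, y, z) -> probability."""
    def marg(keep):
        # Marginalize out the positions where keep is 0
        out = {}
        for (x, y, z), p in joint.items():
            k = tuple(v for v, m in zip((x, y, z), keep) if m)
            out[k] = out.get(k, 0.0) + p
        return out
    pz = marg((0, 0, 1))
    pxz = marg((1, 0, 1))
    pyz = marg((0, 1, 1))
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            total += p * math.log2(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
    return total
```

When X and Y are conditionally independent given Z the value is zero, so pairing variables with low conditional mutual information in the vtree costs little modeling power.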

SLIDE 20

Make sure that feature variables F can be conditioned on the class variable C.

  • However, only setting the vtree is not enough: F is still independent of C.

Enforcing the discriminative bias: D-LearnPSDD

SLIDE 21

Make sure that feature variables F can be conditioned on the class variable C.

  • Set

Enforcing the discriminative bias: D-LearnPSDD

SLIDE 22

Encodes a naive Bayes structure.

Make sure that feature variables F can be conditioned on the class variable C.

  • Set
  • LearnPSDD ensures that the base of the root node remains unchanged.

Enforcing the discriminative bias: D-LearnPSDD

SLIDE 23

Outline

Motivation and objective Background Discriminative bias for learning PSDDs Experimental results Conclusions


SLIDE 24

Experimental results

  • 15 UCI datasets
  • 5-fold cross-validation
  • Average accuracy over a range of model sizes
  • Model size is the number of parameters
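The evaluation protocol above can be sketched as a generic k-fold harness (not the authors' pipeline; `fit` and `predict` are placeholder callables supplied by the user):

```python
def k_fold_indices(n, k=5):
    """Partition range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(xs, ys, fit, predict, k=5):
    """Average accuracy over k folds: train on k-1 folds, test on the rest."""
    accs = []
    for test in k_fold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / k
```

In practice one would shuffle or stratify the indices first; contiguous folds keep the sketch short.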
SLIDE 25

Experimental results


SLIDE 26

Experimental results


D-LearnPSDD remains robust against missing features.

SLIDE 27

Outline

Motivation and objective Background Discriminative bias for learning PSDDs Experimental results Conclusions


SLIDE 28

Conclusions

We introduced a PSDD learning technique that improves classification performance by introducing a discriminative bias. Robustness is ensured by exploiting the generative learning strategy. The proposed technique outperforms purely generative PSDDs in terms of classification accuracy, and the other baseline classifiers in terms of robustness.


SLIDE 29

References

Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst and Guy Van den Broeck. Towards Hardware-Aware Tractable Learning of Probabilistic Models. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.

YooJung Choi, Antonio Vergari, Robert Peharz and Guy Van den Broeck. Probabilistic Circuits: Representation and Inference. AAAI tutorial, 2020.

Yitao Liang, Jessa Bekker and Guy Van den Broeck. Learning the Structure of Probabilistic Sentential Decision Diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017.

Doga Kisa, Guy Van den Broeck, Arthur Choi and Adnan Darwiche. Probabilistic Sentential Decision Diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2014.

Thank you!

Contact: laura.galindez@esat.kuleuven.be