Scoring Bayesian Networks of Mixed Variables
Bryan Andrews, MS, Joseph Ramsey, PhD, and Greg Cooper, MD, PhD
August 14, 2017
Learning Bayesian Networks (BNs)
● BNs constitute a widely used graphical framework for representing probabilistic relationships
● Many applications in Bayesian inference and causal discovery
● Learning structure is crucial
  – Limited work has been done in the presence of both discrete and continuous variables
Goal: Provide scalable solutions for learning BNs in the presence of both discrete and continuous variables
Outline
● Bayesian Information Criterion (BIC)
● Mixed Variable Polynomial (MVP) score
● Conditional Gaussian (CG) score
● Adaptations
● Simulations and empirical results
The Bayesian Information Criterion
Let M be a model and D be a dataset. BIC is an approximation for log p(M | D):
$-2 \log p(M \mid D) \approx -2\,\mathrm{lik} + \mathrm{dof} \log n$ (up to an additive constant, assuming a uniform prior over models)
where lik is the maximized log likelihood, dof is the number of degrees of freedom (free parameters), and n is the number of samples.
A BN is scored as the sum of the BIC scores of each node given its parents.
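A minimal Python sketch of the BIC form used in these slides and of the decomposable network score. `bic` and `network_bic` are hypothetical helper names, and each node's (lik, dof) pair is assumed to have been computed elsewhere (for example by one of the per-child procedures that follow).

```python
import math
from typing import Dict, Tuple


def bic(lik: float, dof: int, n: int) -> float:
    """BIC as used on these slides: -2 * log likelihood + dof * log(n).
    Lower is better."""
    return -2.0 * lik + dof * math.log(n)


def network_bic(families: Dict[str, Tuple[float, int]], n: int) -> float:
    """Score a BN as the sum of per-node BIC terms.

    `families` (hypothetical structure) maps each node to (lik, dof) for that
    node given its parents.
    """
    return sum(bic(lik, dof, n) for lik, dof in families.values())


families = {"X": (-120.5, 3), "Y": (-98.2, 2), "A": (-60.0, 2)}
total = network_bic(families, n=100)
```

Because the score is a sum over families, a local search that changes one node's parents only needs to recompute that node's term.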
The Mixed Variable Polynomial (MVP) score
● Uses higher-order polynomials to estimate relationships between variables
  – Allows for nonlinear relationships between continuous variables
  – Allows for complicated PMFs for discrete variables (approximates logistic regression)
● Calculates a log likelihood and degrees of freedom for BIC
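To make "higher-order polynomials" concrete, here is one hypothetical way to build a regression design matrix from a set of continuous parents. The slides do not specify the basis; the `degree` parameter and the omission of cross terms between parents are assumptions of this sketch, not necessarily what the MVP score uses.

```python
import numpy as np


def polynomial_design(Z, degree):
    """Design matrix [1, z_1, z_1^2, ..., z_1^degree, z_2, ...].

    Z is an (n, p) array of continuous parents (a 1-D array is treated as a
    single parent). Cross terms between different parents are omitted.
    """
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    columns = [np.ones(Z.shape[0])]          # intercept
    for j in range(Z.shape[1]):
        for k in range(1, degree + 1):
            columns.append(Z[:, j] ** k)     # k-th power of parent j
    return np.column_stack(columns)


design = polynomial_design(np.array([1.0, 2.0, 3.0]), degree=2)
# rows: [1, 1, 1], [1, 2, 4], [1, 3, 9]
```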
Modeling a Continuous Child
● Partition according to the discrete parents
  – Splits the data into subsets
● Perform regression with the continuous parents for each partition
  – Calculate a log likelihood and degrees of freedom for each subset
● Aggregate the log likelihood and degrees of freedom terms from each subset
● Score the continuous child using BIC
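The four steps above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the polynomial basis (intercept plus powers up to a hypothetical `degree`, no cross terms), the Gaussian residual model with maximum-likelihood variance, and the dof count (coefficients plus one variance per subset) are all choices made for this sketch. The function names are hypothetical.

```python
import numpy as np


def polynomial_design(Z, degree):
    """Intercept plus powers 1..degree of each continuous parent (no cross terms)."""
    n = Z.shape[0]
    cols = [np.ones(n)] + [Z[:, j] ** k for j in range(Z.shape[1])
                           for k in range(1, degree + 1)]
    return np.column_stack(cols)


def partition_rows(D):
    """Group row indices by the joint value of the discrete parents D (n x q)."""
    groups = {}
    for i, key in enumerate(map(tuple, D)):
        groups.setdefault(key, []).append(i)
    return [np.array(idx) for idx in groups.values()]


def mvp_continuous_child(x, Z, D, degree=2):
    """Return (lik, dof, BIC) for continuous child x given continuous parents
    Z (n x p) and discrete parents D (n x q)."""
    n = len(x)
    lik, dof = 0.0, 0
    for idx in partition_rows(D):                      # 1. partition on discrete parents
        X_design = polynomial_design(Z[idx], degree)
        beta, *_ = np.linalg.lstsq(X_design, x[idx], rcond=None)  # 2. per-subset regression
        resid = x[idx] - X_design @ beta
        m = len(idx)
        var = max(resid @ resid / m, 1e-12)            # MLE variance, floored to avoid log(0)
        lik += -0.5 * m * np.log(2 * np.pi * var) - 0.5 * (resid @ resid) / var
        dof += X_design.shape[1] + 1                   # coefficients + variance
    return lik, dof, -2.0 * lik + dof * np.log(n)      # 3. aggregate, 4. BIC


rng = np.random.default_rng(0)
n = 300
a = rng.integers(0, 3, size=n)                         # discrete parent A, |A| = 3
y = rng.normal(size=n)                                 # continuous parent Y
x = np.array([0.0, 2.0, -1.0])[a] + y ** 2 + rng.normal(scale=0.5, size=n)
lik, dof, score = mvp_continuous_child(x, y[:, None], a[:, None], degree=2)
_, _, score_empty = mvp_continuous_child(x, np.empty((n, 0)), np.empty((n, 0)))
```

With |A| = 3 and a degree-2 basis, each subset contributes 3 coefficients plus a variance, so dof is 12; the model with the true parents should score lower (better) than the empty parent set.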
Modeling a Continuous Child (example)
● Let X, Y be continuous
● Let A be discrete (|A| = 3)
● Y and A are the parents of X
● Want: $\mathrm{lik}_{X \mid Y, A}$, $\mathrm{dof}_{X \mid Y, A}$
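[Figure: the data split into three subsets, one per value of A; the regression of X on Y within subset k gives $\mathrm{lik}_k$ and $\mathrm{dof}_k$]
$\mathrm{lik}_{X \mid Y, A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
$\mathrm{dof}_{X \mid Y, A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$
Score: $-2\,\mathrm{lik}_{X \mid Y, A} + \mathrm{dof}_{X \mid Y, A} \log n$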
Modeling a Discrete Child
● Binarize the child A into d indicator (0/1) variables, where d = |A|
● Partition according to the discrete parents
  – Splits the data into subsets
● Perform regression with the continuous parents for each partition
  – Treat the fitted regressions as the components of a PMF for A
● Calculate a log likelihood and degrees of freedom for each subset
● Aggregate the log likelihood and degrees of freedom terms from each subset
● Score the discrete child using BIC
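A sketch of the discrete-child steps above, assuming ordinary least squares on the indicator columns within each subset. Two parts are explicitly hypothetical: `shrink_to_simplex` (mixing each fitted row with the uniform PMF just enough that its smallest entry equals a small `eps`) stands in for the slides' shrinkage procedure, whose exact form is not given here, and the dof count of (d - 1) regressions' worth of coefficients per subset reflects the sum-to-one constraint but may differ from the authors' count. All function names are illustrative.

```python
import numpy as np


def polynomial_design(Z, degree):
    """Intercept plus powers 1..degree of each continuous parent (no cross terms)."""
    n = Z.shape[0]
    cols = [np.ones(n)] + [Z[:, j] ** k for j in range(Z.shape[1])
                           for k in range(1, degree + 1)]
    return np.column_stack(cols)


def partition_rows(D):
    """Group row indices by the joint value of the discrete parents D (n x q)."""
    groups = {}
    for i, key in enumerate(map(tuple, D)):
        groups.setdefault(key, []).append(i)
    return [np.array(idx) for idx in groups.values()]


def shrink_to_simplex(P, eps=1e-3):
    """Hypothetical shrinkage: mix each row with the uniform PMF just enough
    that its smallest entry equals eps (requires eps < 1/d). Rows that sum
    to 1 keep summing to 1."""
    d = P.shape[1]
    P = P.copy()
    pmin = P.min(axis=1)
    bad = pmin < eps
    t = (eps - pmin[bad]) / (1.0 / d - pmin[bad])     # mixing weight in (0, 1)
    P[bad] = (1.0 - t)[:, None] * P[bad] + t[:, None] / d
    return P


def mvp_discrete_child(a, Z, D, degree=1, eps=1e-3):
    """Return (lik, dof, BIC) for discrete child a given continuous parents
    Z (n x p) and discrete parents D (n x q)."""
    n = len(a)
    levels, codes = np.unique(a, return_inverse=True)
    d = len(levels)
    Y = np.eye(d)[codes]                              # binarize: n x d indicators
    lik, dof = 0.0, 0
    for idx in partition_rows(D):                     # partition on discrete parents
        X_design = polynomial_design(Z[idx], degree)
        B, *_ = np.linalg.lstsq(X_design, Y[idx], rcond=None)  # one fit per indicator
        P = shrink_to_simplex(X_design @ B, eps)      # fitted PMF rows for A
        lik += np.log(P[np.arange(len(idx)), codes[idx]]).sum()
        dof += (d - 1) * X_design.shape[1]            # assumed parameter count
    return lik, dof, -2.0 * lik + dof * np.log(n)


rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                                           # continuous parent X
a = np.digitize(x + rng.normal(scale=0.7, size=n), [-0.5, 0.5])  # child A in {0, 1, 2}
lik, dof, score = mvp_discrete_child(a, x[:, None], np.empty((n, 0)), degree=1)
_, _, score_empty = mvp_discrete_child(a, np.empty((n, 0)), np.empty((n, 0)))
```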
Modeling a Discrete Child (example)
● Let X be continuous
● Let A be discrete (|A| = 3)
● X is the parent of A
● Want: $\mathrm{lik}_{A \mid X}$, $\mathrm{dof}_{A \mid X}$
For the fitted components to form a valid PMF for every x:
● $\sum_{a \in \{0,1,2\}} p(A = a \mid X = x) = 1 \quad \forall x$
  – True for the proposed method
● $p(A = a \mid X = x) \ge 0 \quad \forall a, x$
  – True in the sample limit, given some assumptions
Define a procedure to shrink illegal distributions back into the domain of probabilities.
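Why the sum-to-one condition holds exactly, and one way a shrinkage step could restore nonnegativity. The derivation assumes the indicator regressions within a subset share a design matrix $\mathbf{X}$ that contains an intercept column. The mixing rule in the last line, with a small threshold $\varepsilon < 1/d$, is a hypothetical example of such a procedure, not necessarily the one the authors use.

```latex
\begin{align*}
\hat{P} &= H\,Y, \qquad Y \in \{0,1\}^{m \times d},\quad
  H = \text{orthogonal projection onto } \operatorname{col}(\mathbf{X}) \\
Y\,\mathbf{1}_d &= \mathbf{1}_m \quad \text{(each row of } Y \text{ holds exactly one 1)} \\
H\,\mathbf{1}_m &= \mathbf{1}_m \quad \text{(the intercept column } \mathbf{1}_m \in \operatorname{col}(\mathbf{X})\text{)} \\
\Rightarrow\quad \hat{P}\,\mathbf{1}_d &= H\,Y\,\mathbf{1}_d = H\,\mathbf{1}_m = \mathbf{1}_m \\[4pt]
p' &= (1 - t)\,p + \tfrac{t}{d}\,\mathbf{1}_d, \qquad
  t = \frac{\varepsilon - \min_j p_j}{1/d - \min_j p_j} \in (0, 1)
  \quad \text{when } \min_j p_j < \varepsilon
\end{align*}
```

Nothing in the projection forces individual fitted values to lie in [0, 1], which is why nonnegativity can fail in finite samples. The mixing keeps the row sum at $(1 - t) + t = 1$, sets the smallest entry to exactly $\varepsilon$, and leaves every other entry at least $\varepsilon$; $t < 1$ because $\min_j p_j \le 1/d$ (a row summing to 1 averages $1/d$) and $\varepsilon < 1/d$.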
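From the fitted PMF, compute $\mathrm{lik}_{A \mid X}$ and $\mathrm{dof}_{A \mid X}$, then score:
$-2\,\mathrm{lik}_{A \mid X} + \mathrm{dof}_{A \mid X} \log n$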
The Conditional Gaussian (CG) score
● Move all the continuous variables to the left and all the discrete variables to the right of the conditioning bar
  – Calculate the desired probability using partitioned Gaussian and multinomial distributions
● Calculate a log likelihood and degrees of freedom for BIC
Modeling a Continuous Child
● Let X, Y be continuous
● Let A be discrete
● Assume Y, A are parents of X
$$p(X \mid Y, A) = \frac{p(X, Y, A)}{p(Y, A)} = \frac{p(X, Y \mid A)\, p(A)}{p(Y \mid A)\, p(A)} = \frac{p(X, Y \mid A)}{p(Y \mid A)}$$
The numerator and denominator are partitioned Gaussians.
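Taking logs of the ratio $p(X \mid Y, A) = p(X, Y \mid A) / p(Y \mid A)$ and summing over the n samples turns the conditional log likelihood into a difference of two partitioned-Gaussian log likelihoods. Combining the degrees of freedom by the same difference is an assumption of this worked step; the slides do not state it explicitly.

```latex
\begin{align*}
\log p(x_i \mid y_i, a_i) &= \log p(x_i, y_i \mid a_i) - \log p(y_i \mid a_i) \\
\mathrm{lik}_{X \mid Y, A} &= \sum_{i=1}^{n} \log p(x_i \mid y_i, a_i)
  = \mathrm{lik}_{X, Y \mid A} - \mathrm{lik}_{Y \mid A} \\
\mathrm{dof}_{X \mid Y, A} &= \mathrm{dof}_{X, Y \mid A} - \mathrm{dof}_{Y \mid A}
  \quad \text{(assumed)} \\
\text{score} &= -2\,\mathrm{lik}_{X \mid Y, A} + \mathrm{dof}_{X \mid Y, A} \log n
\end{align*}
```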
Modeling a Continuous Child
● Want: $\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$ for $p(X, Y \mid A)$
● Want: $\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$ for $p(Y \mid A)$
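$\mathrm{lik}_{X, Y \mid A}$, $\mathrm{dof}_{X, Y \mid A}$: fit a Gaussian to (X, Y) within each subset defined by A
$\mathrm{lik}_{X, Y \mid A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
$\mathrm{dof}_{X, Y \mid A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$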
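$\mathrm{lik}_{Y \mid A}$, $\mathrm{dof}_{Y \mid A}$: fit a Gaussian to Y within each subset defined by A
$\mathrm{lik}_{Y \mid A} = \mathrm{lik}_1 + \mathrm{lik}_2 + \mathrm{lik}_3$
$\mathrm{dof}_{Y \mid A} = \mathrm{dof}_1 + \mathrm{dof}_2 + \mathrm{dof}_3$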
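A NumPy sketch of the CG computation for a continuous child: fit maximum-likelihood Gaussians to (X, Y) and to Y within each subset defined by the discrete parents, sum over subsets, and take the difference, since $\log p(X \mid Y, A) = \log p(X, Y \mid A) - \log p(Y \mid A)$. Differencing the parameter counts is an assumption of this sketch, and the function names are hypothetical. For Gaussians the likelihood difference equals the per-subset least-squares log likelihood of regressing X on Y, which the usage code checks.

```python
import numpy as np


def partition_rows(D):
    """Group row indices by the joint value of the discrete parents D (n x q)."""
    groups = {}
    for i, key in enumerate(map(tuple, D)):
        groups.setdefault(key, []).append(i)
    return [np.array(idx) for idx in groups.values()]


def gaussian_lik_dof(V):
    """Maximized log likelihood and parameter count of a multivariate
    Gaussian fit to the rows of V (m x k)."""
    m, k = V.shape
    cov = np.atleast_2d(np.cov(V, rowvar=False, bias=True))   # MLE covariance
    _, logdet = np.linalg.slogdet(cov)
    lik = -0.5 * m * (k * np.log(2 * np.pi) + logdet + k)
    dof = k + k * (k + 1) // 2                                 # means + covariance
    return lik, dof


def partitioned_gaussian(V, D):
    """Sum Gaussian lik/dof over the subsets defined by discrete variables D."""
    lik, dof = 0.0, 0
    for idx in partition_rows(D):
        l, d = gaussian_lik_dof(V[idx])
        lik, dof = lik + l, dof + d
    return lik, dof


def cg_continuous_child(x, Y, D):
    """(lik, dof, BIC) for continuous x given continuous parents Y (n x p)
    and discrete parents D, via p(X | Y, A) = p(X, Y | A) / p(Y | A)."""
    n = len(x)
    lik_xy, dof_xy = partitioned_gaussian(np.column_stack([x, Y]), D)
    lik_y, dof_y = partitioned_gaussian(Y, D) if Y.shape[1] > 0 else (0.0, 0)
    lik, dof = lik_xy - lik_y, dof_xy - dof_y
    return lik, dof, -2.0 * lik + dof * np.log(n)


rng = np.random.default_rng(2)
n = 400
a = rng.integers(0, 3, size=n)                                 # discrete parent A
y = rng.normal(size=n)                                         # continuous parent Y
x = np.array([1.0, -1.0, 0.5])[a] * y + rng.normal(scale=0.3, size=n)
lik, dof, score = cg_continuous_child(x, y[:, None], a[:, None])

# Sanity check: per-subset least-squares log likelihood of X on Y (MLE variance).
lik_reg = 0.0
for k in range(3):
    m = a == k
    Xd = np.column_stack([np.ones(m.sum()), y[m]])
    beta, *_ = np.linalg.lstsq(Xd, x[m], rcond=None)
    rss = np.sum((x[m] - Xd @ beta) ** 2)
    lik_reg += -0.5 * m.sum() * (np.log(2 * np.pi * rss / m.sum()) + 1.0)
```

Here each subset contributes 5 parameters for (X, Y) and 2 for Y, so the differenced dof is 3 per subset (intercept, slope, residual variance), matching the regression view.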