SLIDE 1

Scoring Bayesian Networks of Mixed Variables

Bryan Andrews, MS; Joseph Ramsey, PhD; and Greg Cooper, MD, PhD
August 14, 2017


SLIDE 3

Learning Bayesian Networks (BNs)

  • BNs constitute a widely used graphical framework for representing probabilistic relationships
  • Many applications in Bayesian inference and causal discovery
  • Learning structure is crucial
    – Limited work has been done in the presence of both discrete and continuous variables

Goal: Provide scalable solutions for learning BNs in the presence of both discrete and continuous variables

SLIDE 4

Outline

  • Bayesian Information Criterion (BIC)
  • Mixed Variable Polynomial (MVP) score
  • Conditional Gaussian (CG) score
  • Adaptations
  • Simulations and empirical results

SLIDE 8

−2·log p(M∣D) ≈ −2·lik + dof·log n

The Bayesian Information Criterion

Let M be a model and D be a dataset. BIC is an approximation for −2·log p(M∣D), where lik is the log likelihood, dof is the degrees of freedom, and n is the number of samples. A BN is scored as the sum of the BIC terms for each node given its parents.
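As a concrete illustration of the per-node BIC term above, here is a minimal sketch (not the authors' code) for a continuous node with continuous parents under a linear Gaussian model; the function name and the linear model are my own choices:

```python
import numpy as np

def bic_linear_gaussian(y, X):
    """BIC-style score (-2*lik + dof*log n) for a continuous node y
    given a matrix of continuous parent values X (one column per parent)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])           # intercept + parents
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)   # least squares = Gaussian MLE
    resid = y - Z @ beta
    sigma2 = resid @ resid / n                     # MLE of the noise variance
    lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # maximized log likelihood
    dof = Z.shape[1] + 1                           # coefficients + variance
    return -2 * lik + dof * np.log(n)
```

Lower scores are better: adding an informative parent raises the likelihood faster than the dof·log n penalty grows.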

SLIDE 9

Outline

  • Bayesian Information Criterion (BIC)
  • Mixed Variable Polynomial (MVP) score
  • Conditional Gaussian (CG) score
  • Adaptations
  • Simulations and empirical results

SLIDE 11

The Mixed Variable Polynomial (MVP) score

  • Uses higher-order polynomials to estimate relationships between variables
    – Allows for nonlinear relationships between continuous variables
    – Allows for complicated PMFs for discrete variables
    – Approximates logistic regression
  • Calculates a log likelihood and degrees of freedom for BIC
SLIDE 15

Modeling a Continuous Child

  • Partition according to the discrete parents
    – Splits the data into subsets
  • Perform a regression on the continuous parents within each partition
    – Calculate a log likelihood and degrees of freedom for each subset
  • Aggregate the log likelihood and degrees of freedom terms from the subsets
  • Score the continuous child using BIC
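The steps above can be sketched as follows, assuming one continuous parent Y and one discrete parent A; the fixed polynomial degree and the function name are illustrative (the published MVP score chooses degrees differently):

```python
import numpy as np

def mvp_score_continuous(x, y, a, degree=2):
    """Sketch of the MVP idea for a continuous child x with continuous
    parent y and discrete parent a: partition by a, fit a polynomial
    regression of x on y in each partition, sum lik/dof, return BIC."""
    n = len(x)
    total_lik, total_dof = 0.0, 0
    for cat in np.unique(a):
        idx = a == cat
        xs, ys = x[idx], y[idx]
        Z = np.vander(ys, degree + 1)              # columns: y^degree, ..., y, 1
        beta, *_ = np.linalg.lstsq(Z, xs, rcond=None)
        resid = xs - Z @ beta
        sigma2 = resid @ resid / len(xs)           # per-partition noise variance
        total_lik += -0.5 * len(xs) * (np.log(2 * np.pi * sigma2) + 1)
        total_dof += degree + 2                    # coefficients + variance
    return -2 * total_lik + total_dof * np.log(n)
```

On data with a quadratic parent-child relationship, the degree-2 fit scores better than the degree-1 fit despite the larger dof penalty.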

SLIDE 16

Modeling a Continuous Child

  • Let X, Y be continuous
  • Let A be discrete (|A| = 3)
  • Want: likX | Y, A and dofX | Y, A

slide-21
SLIDE 21

21

dof2 lik2 dof3 lik3 dof1 lik1 likX | Y, A = lik1 + lik2 + lik3 dofX | Y, A = dof1 + dof2 + dof3

  • 2likX | Y, A + dofX | Y, A log n
SLIDE 27

Modeling a Discrete Child

  • Binarize the child A into d (0, 1) indicator variables, where d = |A|
  • Partition according to the discrete parents
    – Splits the data into subsets
  • Perform a regression on the continuous parents within each partition
    – Treat the regression lines as components of PMFs for A
  • Calculate a log likelihood and degrees of freedom for each subset
  • Aggregate the log likelihood and degrees of freedom terms from the subsets
  • Score the discrete child using BIC
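A rough sketch of the steps above for a single continuous parent; the clip-and-renormalize step is a crude stand-in for the shrinking procedure discussed on a later slide, and all names are illustrative:

```python
import numpy as np

def mvp_score_discrete(a, x, degree=1):
    """Sketch of the MVP idea for a discrete child a with one continuous
    parent x: regress each 0/1 indicator of a on polynomials of x, force
    the fitted values into a valid PMF, then compute a log likelihood."""
    n = len(a)
    cats = np.unique(a)
    Z = np.vander(x, degree + 1)                  # polynomial design matrix
    probs = np.empty((n, len(cats)))
    for j, cat in enumerate(cats):
        ind = (a == cat).astype(float)            # binarized child
        beta, *_ = np.linalg.lstsq(Z, ind, rcond=None)
        probs[:, j] = Z @ beta                    # regression line as PMF component
    probs = np.clip(probs, 1e-6, None)            # crude stand-in for "shrinking"
    probs /= probs.sum(axis=1, keepdims=True)     # enforce sum-to-one
    idx_of = {c: j for j, c in enumerate(cats)}
    lik = float(sum(np.log(probs[i, idx_of[ai]]) for i, ai in enumerate(a)))
    dof = len(cats) * (degree + 1)
    return -2 * lik + dof * np.log(n)
```

A child that genuinely depends on x scores better (lower) than the same child scored against a shuffled, uninformative parent.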

SLIDE 28

Modeling a Discrete Child

  • Let X be continuous
  • Let A be discrete (|A| = 3)
  • Want: likA | X and dofA | X

SLIDE 33

For a ∈ {0, 1, 2}, the fitted PMFs must satisfy

∑a p(A=a∣X=x) = 1 ∀x  – true for the proposed method

p(A=a∣X=x) ≥ 0 ∀a, x  – true in the sample limit, given some assumptions

Define a procedure to shrink illegal distributions back into the domain of probabilities.



SLIDE 37

  • Aggregate over partitions to obtain likA | X and dofA | X
  • Score: −2·likA | X + dofA | X · log n
SLIDE 38

Outline

  • Bayesian Information Criterion (BIC)
  • Mixed Variable Polynomial (MVP) score
  • Conditional Gaussian (CG) score
  • Adaptations
  • Simulations and empirical results

SLIDE 40

The Conditional Gaussian (CG) score

  • Move all the continuous variables to the left and all the discrete variables to the right of the conditioning bar
    – Calculate the desired probability using partitioned Gaussian and multinomial distributions
  • Calculate a log likelihood and degrees of freedom for BIC

SLIDE 44

Modeling a Continuous Child

Let X, Y be continuous and A be discrete; assume Y and A are the parents of X. Then

p(X∣Y, A) = p(X, Y, A) / p(Y, A) = [p(X, Y∣A) · p(A)] / [p(Y∣A) · p(A)] = p(X, Y∣A) / p(Y∣A)

where the numerator and denominator are partitioned Gaussians.

SLIDE 45

Modeling a Continuous Child

  • Want: likX, Y | A, dofX, Y | A and likY | A, dofY | A for p(X, Y∣A) / p(Y∣A)


SLIDE 49

  • Per-partition terms: (lik1, dof1), (lik2, dof2), (lik3, dof3)
  • likX, Y | A = lik1 + lik2 + lik3
  • dofX, Y | A = dof1 + dof2 + dof3


SLIDE 53

  • Per-partition terms: (lik1, dof1), (lik2, dof2), (lik3, dof3)
  • likY | A = lik1 + lik2 + lik3
  • dofY | A = dof1 + dof2 + dof3


SLIDE 56

Modeling a Continuous Child

Have: likX, Y | A, dofX, Y | A and likY | A, dofY | A for p(X, Y∣A) / p(Y∣A)

  • likX | Y, A = likX, Y | A − likY | A
  • dofX | Y, A = dofX, Y | A − dofY | A
  • Score: −2·likX | Y, A + dofX | Y, A · log n
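The subtraction identity above can be sketched directly, assuming one continuous parent Y and one discrete parent A; maximum-likelihood Gaussian fits are used per partition, and the function names and dof bookkeeping are illustrative:

```python
import numpy as np

def gaussian_lik(data):
    """Maximized log likelihood of a multivariate Gaussian fit to data (n x d)."""
    n, d = data.shape
    cov = np.cov(data, rowvar=False, bias=True).reshape(d, d)  # MLE covariance
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)

def cg_score_continuous(x, y, a):
    """Sketch of the CG score for continuous child x with continuous parent y
    and discrete parent a: lik_{X|Y,A} = lik_{X,Y|A} - lik_{Y|A}, each term a
    partitioned Gaussian likelihood, then BIC = -2*lik + dof*log n."""
    n = len(x)
    lik_joint = lik_marg = 0.0
    dof_joint = dof_marg = 0
    for cat in np.unique(a):
        idx = a == cat
        lik_joint += gaussian_lik(np.column_stack([x[idx], y[idx]]))  # p(X,Y|A=cat)
        lik_marg += gaussian_lik(y[idx].reshape(-1, 1))               # p(Y|A=cat)
        dof_joint += 5          # 2 means + 3 covariance parameters
        dof_marg += 2           # 1 mean + 1 variance
    lik = lik_joint - lik_marg
    dof = dof_joint - dof_marg
    return -2 * lik + dof * np.log(n)
```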


SLIDE 60

Modeling a Discrete Child

Let X, Y be continuous and A be discrete; assume X and Y are the parents of A. Then

p(A∣X, Y) = p(X, Y, A) / p(X, Y) = [p(X, Y∣A) · p(A)] / p(X, Y)

where the Gaussian terms are partitioned Gaussians and p(A) is multinomial.

SLIDE 61

Modeling a Discrete Child

  • Want: likX, Y | A, dofX, Y | A; likA, dofA; and likX, Y, dofX, Y for p(X, Y∣A) · p(A) / p(X, Y)


SLIDE 65

  • Per-partition terms: (lik1, dof1), (lik2, dof2), (lik3, dof3)
  • likX, Y | A = lik1 + lik2 + lik3
  • dofX, Y | A = dof1 + dof2 + dof3


SLIDE 67

  • Compute likA, dofA from the multinomial over A
  • Compute likX, Y, dofX, Y from a Gaussian over X, Y


SLIDE 70

Modeling a Discrete Child

Have: likX, Y | A, dofX, Y | A; likA, dofA; and likX, Y, dofX, Y for p(X, Y∣A) · p(A) / p(X, Y)

  • likA | X, Y = likX, Y | A + likA − likX, Y
  • dofA | X, Y = dofX, Y | A + dofA − dofX, Y
  • Score: −2·likA | X, Y + dofA | X, Y · log n
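The three-term identity above can likewise be sketched, with the continuous parents passed as columns of a matrix; the dof bookkeeping and names are illustrative, not the authors' implementation:

```python
import numpy as np

def gaussian_lik(data):
    """Maximized log likelihood of a multivariate Gaussian fit to data (n x d)."""
    n, d = data.shape
    cov = np.cov(data, rowvar=False, bias=True).reshape(d, d)
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)

def cg_score_discrete(a, xy):
    """Sketch of the CG score for discrete child a with continuous parents
    xy: lik_{A|X,Y} = lik_{X,Y|A} + lik_A - lik_{X,Y}, then BIC."""
    n, d = xy.shape
    cats, counts = np.unique(a, return_counts=True)
    lik_a = float((counts * np.log(counts / n)).sum())      # multinomial term
    lik_cond = sum(gaussian_lik(xy[a == c]) for c in cats)  # partitioned Gaussians
    lik_joint = gaussian_lik(xy)                            # Gaussian over X, Y
    lik = lik_cond + lik_a - lik_joint
    g = d + d * (d + 1) // 2                 # parameters of one d-dim Gaussian
    dof = len(cats) * g + (len(cats) - 1) - g
    return -2 * lik + dof * np.log(n)
```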

SLIDE 71

Outline

  • Bayesian Information Criterion (BIC)
  • Mixed Variable Polynomial (MVP) score
  • Conditional Gaussian (CG) score
  • Adaptations
  • Simulations and empirical results
SLIDE 72

Adaptations

  • Binomial structure prior
    – Treat the addition of each parent as an independent random trial
    – Model the prior probability of each parent-child model using a binomial distribution
  • Discretization heuristic
    – Discretize continuous parents of discrete children in order to use multinomial scoring
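The binomial structure prior can be sketched in one line: if each of the max_parents candidate parents is included independently with probability p, a parent set of size k has the log prior below. The inclusion probability 0.1 is an illustrative choice, not the paper's setting:

```python
import numpy as np

def log_binomial_structure_prior(k, max_parents, p=0.1):
    """Log prior of a parent set of size k out of max_parents candidates,
    treating each inclusion as an independent Bernoulli(p) trial."""
    return k * np.log(p) + (max_parents - k) * np.log(1 - p)
```

With p < 0.5 this penalizes larger parent sets, which is added to the BIC-style score during search.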
SLIDE 73

Outline

  • Bayesian Information Criterion (BIC)
  • Mixed Variable Polynomial (MVP) score
  • Conditional Gaussian (CG) score
  • Adaptations
  • Simulations and empirical results

SLIDE 79

Conditional Gaussian Simulation

  • Randomly generate a set of variables and edges
  • Specify a causal ordering over the variables
  • In causal order, simulate one variable at a time
    – Use multinomial relationships with discretized continuous parents for discrete children
    – Use partitioned linear Gaussian relationships for continuous children

Note: In all simulations, variables are split 50-50 between discrete and continuous, and each discrete variable has a random number of categories between 2 and 5.
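A toy instance of this simulation scheme for a single continuous child (shapes and coefficients are illustrative): a discrete root A, a continuous root Y, and a continuous child X that is linear Gaussian with a separate intercept and slope per category of A, i.e. a partitioned linear Gaussian.

```python
import numpy as np

def simulate_cg(n, rng):
    """Simulate A (discrete, 3 categories), Y (continuous root), and
    X = intercept[A] + slope[A] * Y + Gaussian noise."""
    a = rng.integers(0, 3, size=n)           # discrete root
    y = rng.normal(size=n)                   # continuous root
    intercepts = np.array([-1.0, 0.0, 2.0])  # one regime per category of A
    slopes = np.array([0.5, 1.0, -1.5])
    x = intercepts[a] + slopes[a] * y + rng.normal(size=n)
    return a, y, x
```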

SLIDE 80

Non-linear Simulation

  • Randomly generate a set of variables and edges
  • Specify a causal ordering over the variables
  • In causal order, simulate one variable at a time
    – Use multinomial relationships with discretized continuous parents for discrete children
    – Use partitioned polynomial regression with Gaussian noise for continuous children

Note: In all simulations, variables are split 50-50 between discrete and continuous, and each discrete variable has a random number of categories between 2 and 5.

SLIDE 81

Algorithms

  • CG – Conditional Gaussian
  • CGd – Conditional Gaussian w/ discretization heuristic
  • MVP 1 – Mixed Variable Polynomial w/ linear basis
  • MVP log n – Mixed Variable Polynomial w/ polynomial basis
  • LR 1 – Logistic Regression w/ linear basis
  • LR log n – Logistic Regression w/ polynomial basis

SLIDE 82

Statistics

  • AP – Adjacency Precision: correctly predicted adjacencies / predicted adjacencies
  • AR – Adjacency Recall: correctly predicted adjacencies / true adjacencies
  • AHP – Arrowhead Precision: correctly predicted arrowheads / predicted arrowheads
  • AHR – Arrowhead Recall: correctly predicted arrowheads / true arrowheads
  • T (s) – Computation time (in seconds)

All statistics are averaged over 10 runs on networks with 1000 instances. fGES was used as the search algorithm (Ramsey 2017; Chickering 2002).
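The AP/AR definitions above amount to set operations on undirected skeletons; a small sketch (arrowhead statistics would compare directed endpoints analogously):

```python
def adjacency_stats(true_edges, predicted_edges):
    """Adjacency precision/recall: edges are compared ignoring orientation."""
    true_adj = {frozenset(e) for e in true_edges}
    pred_adj = {frozenset(e) for e in predicted_edges}
    correct = len(true_adj & pred_adj)
    ap = correct / len(pred_adj) if pred_adj else 1.0
    ar = correct / len(true_adj) if true_adj else 1.0
    return ap, ar
```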

SLIDE 83

MVP vs LR

Avg Deg 4 | 100 Measured | Linear Simulation


SLIDE 85

MVP vs LR

Avg Deg 4 | 100 Measured | Non-Linear Simulation


SLIDE 87

MVP vs CG

Avg Deg 4 | 100 Measured | Linear Simulation

SLIDE 88

MVP vs CG

Avg Deg 4 | 100 Measured | Non-Linear Simulation

SLIDE 89

Scalability

Avg Deg 2 | 500 Measured | Linear Simulation



SLIDE 92

Conclusions

  • We present two novel scoring methods for learning BNs in the presence of both continuous and discrete variables
    – Mixed Variable Polynomial (MVP): similar performance to LR but 10-20 times faster; allows for a more general class of relationships
    – Conditional Gaussian (CG): quick and effective
  • Both scores perform well on simulated data (linear and non-linear) and scale to networks of at least 500 variables

SLIDE 93

Thank You

All presented methods are available in Tetrad:
https://github.com/cmu-phil/tetrad