Identifiability and Consistency of Bayesian Network Structure Learning from Incomplete Data


  1. Identifiability and Consistency of Bayesian Network Structure Learning from Incomplete Data
     Tjebbe Bodewes 1 (tjebbe.bodewes@linacre.ox.ac.uk) and Marco Scutari 2 (scutari@idsia.ch)
     1 Zivver & Department of Statistics, University of Oxford
     2 Dalle Molle Institute for Artificial Intelligence (IDSIA)
     September 24, 2020

  2. Introduction
     Learning a Bayesian network $B = (\mathcal{G}, \Theta)$ from a data set $\mathcal{D}$ involves:
     $$P(B \mid \mathcal{D}) = P(\mathcal{G}, \Theta \mid \mathcal{D}) = \underbrace{P(\mathcal{G} \mid \mathcal{D})}_{\text{structure learning}} \cdot \underbrace{P(\Theta \mid \mathcal{G}, \mathcal{D})}_{\text{parameter learning}}.$$
     Assuming complete data, we can decompose $P(\mathcal{G} \mid \mathcal{D})$ into
     $$P(\mathcal{G} \mid \mathcal{D}) \propto P(\mathcal{G})\, P(\mathcal{D} \mid \mathcal{G}) = P(\mathcal{G}) \int P(\mathcal{D} \mid \mathcal{G}, \Theta)\, P(\Theta \mid \mathcal{G})\, d\Theta,$$
     where $P(\mathcal{G})$ is the prior over the space of the DAGs and $P(\mathcal{D} \mid \mathcal{G})$ is the marginal likelihood (ML) of the data; and then
     $$P(\mathcal{D} \mid \mathcal{G}) = \prod_{j=1}^{N} \left[ \int P(X_j \mid \Pi_{X_j}, \Theta_{X_j})\, P(\Theta_{X_j} \mid \Pi_{X_j})\, d\Theta_{X_j} \right],$$
     where $\Pi_{X_j}$ are the parents of $X_j$ in $\mathcal{G}$. BIC [9] is often used to approximate the marginal likelihood. Denote these two scores with $S_{\mathrm{ML}}(\mathcal{G} \mid \mathcal{D})$ and $S_{\mathrm{BIC}}(\mathcal{G} \mid \mathcal{D})$ respectively.
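
     For reference, the BIC approximation mentioned above takes the usual decomposable form for Bayesian networks (this expansion is standard but not written out on the slide; $\hat{\Theta}_{X_j}$ are the maximum likelihood estimates, $|\Theta_{X_j}|$ is the number of free parameters of node $X_j$, and $n$ is the sample size):
     $$S_{\mathrm{BIC}}(\mathcal{G} \mid \mathcal{D}) = \sum_{j=1}^{N} \left[ \log P(X_j \mid \Pi_{X_j}, \hat{\Theta}_{X_j}) - \frac{\log n}{2}\, |\Theta_{X_j}| \right] \approx \log P(\mathcal{D} \mid \mathcal{G}).$$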

  3. Learning a Bayesian Network from Incomplete Data
     When the data are incomplete, $S_{\mathrm{ML}}(\mathcal{G} \mid \mathcal{D})$ and $S_{\mathrm{BIC}}(\mathcal{G} \mid \mathcal{D})$ are no longer decomposable because we must integrate out the missing values. We can use Expectation-Maximisation (EM) [4] (see the sketch after this slide):
     • in the E-step, we compute the expected sufficient statistics conditional on the observed data using belief propagation [7, 8, 10];
     • in the M-step, we use complete-data learning methods with the expected sufficient statistics.
     There are two ways of applying EM to structure learning:
     • we can apply EM separately to each candidate DAG to be scored, as in the variational-Bayes EM [2];
     • we can embed structure learning in the M-step, estimating the expected sufficient statistics using the current best DAG. This approach is called Structural EM [5, 6].
     The latter is computationally feasible for medium and large problems, but still computationally demanding.
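
     As a concrete, minimal illustration of the E-step/M-step alternation above, the sketch below (hypothetical toy data and variable names, not the authors' code) runs EM for the parameters of a single binary node X with a binary parent Z when some values of X are missing. A full implementation would compute expected sufficient statistics for all nodes via belief propagation, and Structural EM would additionally re-learn the DAG inside the M-step.

```python
# Minimal sketch, not the authors' code: one node's parameters estimated by EM
# when some of its values are missing completely at random (MCAR).

# toy data: (Z, X) pairs; None marks a missing value of X
data = [(0, 0), (0, 0), (0, 1), (0, None), (1, 1), (1, None), (1, 1), (1, 0)]

# initial guess for P(X = 1 | Z = z)
p_x1_given_z = {0: 0.5, 1: 0.5}

for _ in range(25):  # iterate E- and M-steps until (approximate) convergence
    # E-step: expected counts of (Z = z, X = x); a missing X contributes
    # fractionally according to the current conditional distribution
    counts = {z: {0: 0.0, 1: 0.0} for z in (0, 1)}
    for z, x in data:
        if x is None:
            counts[z][1] += p_x1_given_z[z]
            counts[z][0] += 1.0 - p_x1_given_z[z]
        else:
            counts[z][x] += 1.0
    # M-step: complete-data maximum likelihood estimate from the expected counts
    p_x1_given_z = {z: counts[z][1] / (counts[z][0] + counts[z][1]) for z in (0, 1)}

print(p_x1_given_z)  # converges to {0: 1/3, 1: 2/3} for this toy data
```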

  4. The Node-Averaged Likelihood
     Balov [1] proposed a more scalable approach for discrete BNs called the Node-Average Likelihood (NAL). NAL computes each term using the locally-complete data $\mathcal{D}^{(j)} \subseteq \mathcal{D}$ for which $X_j, \Pi_{X_j}$ are observed:
     $$\bar{\ell}(X_j \mid \Pi_{X_j}, \hat{\Theta}_{X_j}) = \frac{1}{|\mathcal{D}^{(j)}|} \sum_{\mathcal{D}^{(j)}} \log P(X_j \mid \Pi_{X_j}, \hat{\Theta}_{X_j}) \to \mathrm{E}\left[\ell(X_j \mid \Pi_{X_j})\right],$$
     which Balov used to define
     $$S_{\mathrm{PL}}(\mathcal{G} \mid \mathcal{D}) = \bar{\ell}(\mathcal{G}, \Theta \mid \mathcal{D}) - \lambda_n h(\mathcal{G}), \qquad \lambda_n \in \mathbb{R}^+,\ h : \mathbb{G} \to \mathbb{R}^+,$$
     and structure learning as $\hat{\mathcal{G}} = \operatorname{argmax}_{\mathcal{G} \in \mathbb{G}} S_{\mathrm{PL}}(\mathcal{G} \mid \mathcal{D})$.
     Balov proved both identifiability and consistency of structure learning when using $S_{\mathrm{PL}}(\mathcal{G} \mid \mathcal{D})$ for discrete BNs. We will now prove that both properties hold more generally, and in particular that they hold for conditional Gaussian BNs (CGBNs).
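
     To make the definition concrete, here is a minimal sketch of the node-average log-likelihood and of the penalised score. The function names, the pandas representation of $\mathcal{D}$ (missing entries as NaN) and the choice of $h(\mathcal{G})$ as the number of free parameters of a discrete BN are illustrative assumptions, not necessarily the exact setup of [1].

```python
# Minimal sketch (hypothetical names): node-average log-likelihood of a discrete node,
# fitted and evaluated only on the locally-complete rows where the node and all of its
# parents are observed, plus the penalised score S_PL(G | D) = NAL - lambda_n * h(G).

import numpy as np
import pandas as pd

def node_average_log_lik(data: pd.DataFrame, node: str, parents: list) -> float:
    local = data[[node] + parents].dropna()              # locally-complete subset D^(j)
    if parents:
        # MLE of P(node | parents) from the locally-complete rows
        cpt = local.groupby(parents)[node].value_counts(normalize=True)
        keys = [tuple(r) for r in local[parents + [node]].itertuples(index=False)]
        loglik = float(np.log([cpt[k] for k in keys]).sum())
    else:
        marg = local[node].value_counts(normalize=True)
        loglik = float(np.log(local[node].map(marg)).sum())
    return loglik / len(local)                           # node average, not a sum

def penalised_nal_score(data: pd.DataFrame, dag: dict, lam: float) -> float:
    # dag maps each node to the list of its parents; h(G) is taken here to be the
    # number of free parameters of a discrete BN (one possible complexity measure)
    score, h = 0.0, 0
    for node, parents in dag.items():
        score += node_average_log_lik(data, node, parents)
        levels = {c: data[c].dropna().nunique() for c in [node] + parents}
        h += (levels[node] - 1) * int(np.prod([levels[p] for p in parents]))
    return score - lam * h
```

     With $\lambda_n = \log(n)/(2n)$, where $n$ is the sample size, the penalty has the BIC rate discussed in the consistency results below.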

  5. Identifiability (General)
     Denote the true DAG as $\mathcal{G}_0$ and the equivalence class it belongs to as $[\mathcal{G}_0]$. Under MCAR, we have:
     1. $\max_{\mathcal{G} \in \mathbb{G}} \bar{\ell}(\mathcal{G}, \Theta) = \bar{\ell}(\mathcal{G}_0, \Theta_0)$;
     2. if $\bar{\ell}(\mathcal{G}, \Theta) = \bar{\ell}(\mathcal{G}_0, \Theta_0)$, then $P_{\mathcal{G}}(\mathbf{X}) = P_{\mathcal{G}_0}(\mathbf{X})$;
     3. if $\mathcal{G}_0 \subseteq \mathcal{G}$, then $\bar{\ell}(\mathcal{G}, \Theta) = \bar{\ell}(\mathcal{G}_0, \Theta_0)$.
     Identifiability follows from the above: $[\mathcal{G}_0]$ is identifiable under MCAR, that is,
     $$\mathcal{G}_0 \cong \min \left\{ \mathcal{G}^* \in \mathbb{G} : \bar{\ell}(\mathcal{G}^*, \Theta^*) = \max_{\mathcal{G} \in \mathbb{G}} \bar{\ell}(\mathcal{G}, \Theta) \right\}.$$

  6. Consistency (for CGBNs)
     From [1], the sufficient conditions for consistency are:
     1. if $\mathcal{G}_0 \subseteq \mathcal{G}_1$ and $\mathcal{G}_0 \nsubseteq \mathcal{G}_2$, then $\lim_{n \to \infty} P\left(S_{\mathrm{PL}}(\mathcal{G}_1 \mid \mathcal{D}) > S_{\mathrm{PL}}(\mathcal{G}_2 \mid \mathcal{D})\right) = 1$;
     2. if $\mathcal{G}_0 \subseteq \mathcal{G}_1$ and $\mathcal{G}_1 \subset \mathcal{G}_2$, then $\lim_{n \to \infty} P\left(S_{\mathrm{PL}}(\mathcal{G}_1 \mid \mathcal{D}) > S_{\mathrm{PL}}(\mathcal{G}_2 \mid \mathcal{D})\right) = 1$;
     and, for the negative result below,
     3. $\exists\, \mathcal{G}$ such that $\Pi^{(\mathcal{G}_0)}_{X_j} \subset \Pi^{(\mathcal{G})}_{X_j}$, $\Pi^{(\mathcal{G})}_{X_k} = \Pi^{(\mathcal{G}_0)}_{X_k}$ for the other nodes, and the variables in $\Pi^{(\mathcal{G})}_{X_j} \setminus \Pi^{(\mathcal{G}_0)}_{X_j}$ are neither always observed nor never observed (thus $\mathcal{G}_0$ must not be a maximal DAG).
     Under some regularity conditions, we show when these hold for CGBNs. Let $\mathcal{G}_0$ be identifiable, let $\lambda_n \to 0$ as $n \to \infty$, and assume the MLEs and NAL's Hessian exist and are finite. Then as $n \to \infty$:
     1. if $n\lambda_n \to \infty$, $\hat{\mathcal{G}}$ is consistent (with complete data);
     2. under MCAR and $\mathrm{VAR}(\mathrm{NAL}) < \infty$, if $\sqrt{n}\,\lambda_n \to \infty$, $\hat{\mathcal{G}}$ is consistent;
     3. under the above and condition 3, if $\liminf_{n \to \infty} \sqrt{n}\,\lambda_n < \infty$, $\hat{\mathcal{G}}$ is not consistent.

  7. Conclusions
     • In $S_{\mathrm{BIC}}(\mathcal{G} \mid \mathcal{D})$, $n\lambda_n = \log(n)/2 \to \infty$ and $\sqrt{n}\,\lambda_n = \log(n)/(2\sqrt{n}) \to 0$, so BIC satisfies the first condition but not the second in the main result. Hence BIC is consistent for complete data but not for incomplete data.
     • The equivalent $S_{\mathrm{AIC}}(\mathcal{G} \mid \mathcal{D})$ does not satisfy either condition, which confirms and extends the results in [3]. Hence AIC is not consistent for either complete or incomplete data.
     • How to choose $\lambda_n$ is an open problem.
     • Proving these results is complicated because:
       • $S_{\mathrm{PL}}(\mathcal{G} \mid \mathcal{D})$ is fitted on different subsets of $\mathcal{D}$ for different $\mathcal{G}$, so models are not nested;
       • variables have heterogeneous distributions;
       • DAGs that may represent misspecified models [11] are not representable in terms of $\mathcal{G}_0$, so minimising Kullback-Leibler distances to obtain MLEs does not necessarily make them vanish as $n \to \infty$.
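
     The two rate conditions are easy to check numerically. The sketch below is illustrative only (not from the talk) and writes the AIC penalty as $\lambda_n = 1/n$ in this averaged-likelihood normalisation; it prints $n\lambda_n$ and $\sqrt{n}\,\lambda_n$ for increasing $n$.

```python
# Illustrative check (not from the talk) of the two penalty-rate conditions:
# n * lambda_n -> infinity and sqrt(n) * lambda_n -> infinity.

import numpy as np

penalties = {
    "BIC": lambda n: np.log(n) / (2 * n),   # n*lambda_n = log(n)/2, sqrt(n)*lambda_n = log(n)/(2*sqrt(n))
    "AIC": lambda n: 1.0 / n,               # n*lambda_n = 1, sqrt(n)*lambda_n = 1/sqrt(n)
}

for name, lam in penalties.items():
    for n in (1e2, 1e4, 1e6, 1e8):
        print(f"{name}  n={n:.0e}  n*lambda_n={n * lam(n):8.3f}  "
              f"sqrt(n)*lambda_n={np.sqrt(n) * lam(n):8.5f}")

# BIC: the first quantity diverges but the second tends to 0, matching the conclusion above;
# AIC: neither quantity diverges, so AIC satisfies neither condition.
```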

  8. Thanks! Any questions?

  9. References I
     [1] N. Balov. Consistent Model Selection of Discrete Bayesian Networks from Incomplete Data. Electronic Journal of Statistics, 7:1047–1077, 2013.
     [2] M. Beal and Z. Ghahramani. The Variational Bayesian EM Algorithm for Incomplete Data: with Application to Scoring Graphical Model Structures. Bayesian Statistics, 7:453–464, 2003.
     [3] H. Bozdogan. Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions. Psychometrika, 52(3):345–370, 1987.
     [4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, pages 1–38, 1977.
     [5] N. Friedman. Learning Belief Networks in the Presence of Missing Values and Hidden Variables. In ICML, pages 125–133, 1997.
     [6] N. Friedman. The Bayesian Structural EM Algorithm. In UAI, pages 129–138, 1998.

  10. References II
     [7] S. L. Lauritzen. The EM Algorithm for Graphical Association Models with Missing Data. Computational Statistics & Data Analysis, 19(2):191–201, 1995.
     [8] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., 1988.
     [9] G. Schwarz. Estimating the Dimension of a Model. The Annals of Statistics, 6(2):461–464, 1978.
     [10] G. Shafer and P. P. Shenoy. Probability Propagation. Annals of Mathematics and Artificial Intelligence, 2(1-4):327–351, 1990.
     [11] H. White. Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1):1–25, 1982.
