Constraint-based Bayesian Network Learning with Permutation Tests

Marco Scutari (marco.scutari@stat.unipd.it)
Adriana Brogini (brogini@stat.unipd.it)
Department of Statistical Sciences, University of Padova
June 15, 2010

Bayesian networks: definitions

A Bayesian network B = (G, P) is a graphical model composed of:
• a directed acyclic graph G = (U, A). Each node represents a random variable X ∈ U, and the arcs in A specify the conditional dependence structure of U.
• a global probability distribution P(U) defined over the variable set U. It can be factorized into a set of local probability distributions of the form

$$P(U) = \prod_{X_i \in U} P(X_i \mid \Pi_{X_i}),$$

where $\Pi_{X_i}$ is the set of the parents of the node $X_i$.
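The factorization can be illustrated with a minimal sketch. The three-node network A → C ← B, its binary variables, and all probability values below are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from itertools import product

# Hypothetical network A -> C <- B with binary variables (illustration only).
parents = {"A": (), "B": (), "C": ("A", "B")}

# Local distributions P(X | parents), keyed by the tuple of parent values.
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},
    "B": {(): {0: 0.6, 1: 0.4}},
    "C": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}


def joint_probability(assignment):
    """Return P(U = assignment) as the product of the local distributions."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob


# Sanity check: the factorized joint sums to one over all configurations.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
```

Each factor only needs the values of a node and its parents, which is why the global distribution never has to be stored as a single table.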

Learning Bayesian networks

Model selection (usually called learning) of a Bayesian network is also performed in two steps:
1. structure learning: finding a graph structure that encodes the conditional independence (CI) relationships in the data.
2. parameter learning: fitting the parameters of each local distribution given the graph structure selected in the previous step.

Most modern structure learning algorithms use conditional independence tests to identify CI constraints from data (constraint-based algorithms), sometimes together with goodness-of-fit scores (hybrid algorithms).
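As a rough sketch of the second step for discrete data, each local distribution can be estimated by maximum likelihood, that is, by relative frequencies within each parent configuration. The function name `fit_local_distribution`, the toy records, and the two-node structure are assumptions made for this example; the presentation does not say which estimator was used.

```python
from collections import Counter, defaultdict


def fit_local_distribution(data, node, parents):
    """Maximum-likelihood estimate of P(node | parents) from a list of dict records."""
    counts = defaultdict(Counter)
    for row in data:
        pa_values = tuple(row[p] for p in parents)
        counts[pa_values][row[node]] += 1
    # Normalize the counts within each parent configuration.
    return {
        pa_values: {value: n / sum(c.values()) for value, n in c.items()}
        for pa_values, c in counts.items()
    }


# Toy data and a structure assumed to come from the structure learning step.
data = [
    {"A": 0, "C": 0}, {"A": 0, "C": 0}, {"A": 0, "C": 1},
    {"A": 1, "C": 1}, {"A": 1, "C": 1}, {"A": 1, "C": 0}, {"A": 1, "C": 1},
]
structure = {"A": [], "C": ["A"]}
fitted = {node: fit_local_distribution(data, node, pa) for node, pa in structure.items()}
```

Parent configurations that never occur in the data simply have no entry here; a practical implementation would need some form of smoothing or a prior for them.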

Parametric vs permutation tests for structure learning

Proofs of correctness of structure learning algorithms assume that conditional independence tests do not incur type I or type II errors [6, 8, 10]. This makes the use of parametric tests problematic because:
• most of them are asymptotic or approximate, yet they are often applied in situations where convergence is problematic (high-dimensional data, "small n, large p" settings).
• they require distributional assumptions which are difficult to justify and rarely satisfied by real-world data.

Permutation tests have neither limitation [7] and therefore result in more effective model selection.
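A minimal sketch of a permutation test of conditional independence for discrete data is given below. It assumes one common scheme: the values of X are shuffled within each configuration of the conditioning variable Z, and the Monte Carlo p-value is the share of permuted statistics at least as large as the observed one (with the usual +1 correction). The presentation does not describe its implementation, so the function names, the number of permutations, and the choice of the G^2 statistic as the test statistic here are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def g2_statistic(x, y):
    """G^2 log-likelihood ratio statistic for independence of two discrete samples."""
    n = len(x)
    nxy, nx, ny = Counter(zip(x, y)), Counter(x), Counter(y)
    return 2.0 * sum(o * math.log(o * n / (nx[a] * ny[b])) for (a, b), o in nxy.items())


def conditional_g2(x, y, z):
    """Sum of the per-stratum G^2 statistics, one stratum per configuration of z."""
    strata = defaultdict(lambda: ([], []))
    for xi, yi, zi in zip(x, y, z):
        strata[zi][0].append(xi)
        strata[zi][1].append(yi)
    return sum(g2_statistic(xs, ys) for xs, ys in strata.values())


def permutation_ci_test(x, y, z, n_perm=500, seed=0):
    """Return (observed statistic, Monte Carlo p-value) for X independent of Y given Z."""
    rng = random.Random(seed)
    observed = conditional_g2(x, y, z)
    # Group the sample indices by stratum so that x is only shuffled within strata.
    groups = defaultdict(list)
    for i, zi in enumerate(z):
        groups[zi].append(i)
    exceed = 0
    x_perm = list(x)
    for _ in range(n_perm):
        for idx in groups.values():
            values = [x[i] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                x_perm[i] = v
        # Small tolerance so that ties are not lost to floating-point rounding.
        if conditional_g2(x_perm, y, z) >= observed - 1e-9:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy usage: x and y are identical within each stratum of z.
x = [0, 1] * 20
z = [0] * 20 + [1] * 20
statistic, p_value = permutation_ci_test(x, x, z, n_perm=200)
```

Shuffling within strata keeps the relationship between each variable and Z intact, so the reference distribution reflects only the conditional association being tested and does not rely on the asymptotic chi-square approximation.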

Model validation: experimental setting

The impact of permutation tests on Bayesian network learning has been evaluated for the Max-Min Hill-Climbing (MMHC) hybrid algorithm [9], which is one of the best performers to date and has been extensively tested over a wide variety of data sets. In particular:
• data sets have been generated from the ALARM network [2], which is often used as a benchmark for testing structure learning algorithms. ALARM contains 37 discrete nodes, for a total of 509 parameters.
• the G^2 log-likelihood ratio test [1] has been used as the CI test, with an α = 0.05 threshold. G^2 is also equivalent to the mutual information CI test up to a constant [5].
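The equivalence between G^2 and mutual information can be written out explicitly. In the sketch below, the notation is an assumption introduced for this note: $n_{xyz}$ is the number of observations with X = x, Y = y, Z = z, a "+" subscript denotes summation over that index, n is the sample size, and MI is the empirical conditional mutual information computed with natural logarithms.

```latex
G^2(X, Y \mid Z)
  = 2 \sum_{x, y, z} n_{xyz} \log \frac{n_{xyz}\, n_{++z}}{n_{x+z}\, n_{+yz}}
  = 2n \cdot \mathrm{MI}(X, Y \mid Z)
```

The parametric version of the test compares G^2 with its asymptotic χ² distribution with (|X| − 1)(|Y| − 1)|Z| degrees of freedom, where |X|, |Y| and |Z| are the numbers of levels of each variable and of configurations of Z. The permutation version replaces that reference distribution with the one obtained by permuting the data.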

Model validation: goodness of fit

Goodness of fit has been measured with the following scores:
• the Bayesian Information Criterion (BIC) [4], which is a penalized likelihood score.
• the Bayesian Dirichlet equivalent (BDe) score [3], which is a posterior Dirichlet distribution based on a uniform prior.
• the Structural Hamming Distance (SHD) score [9], which is an extension of Hamming's distance measure for undirected graphs.

Each score has been computed on 4 sets of pairs of Bayesian networks learned from samples of different sizes (50 networks for each size) using parametric and permutation implementations of the G^2 CI test.
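To give a feel for the structural score, here is a deliberately simplified sketch of a Hamming-style distance between two directed acyclic graphs. It counts edges that are present in only one graph plus shared edges with different orientations. This is an assumption-laden simplification: the SHD defined in [9] compares partially directed graphs representing equivalence classes, which this sketch does not do. The function name and the example arc lists are hypothetical.

```python
def simplified_shd(arcs_a, arcs_b):
    """Count missing, extra and differently oriented edges between two DAGs.

    Each graph is given as an iterable of (parent, child) pairs.
    """
    a, b = set(arcs_a), set(arcs_b)
    skeleton_a = {frozenset(arc) for arc in a}
    skeleton_b = {frozenset(arc) for arc in b}
    # Edges that appear in only one of the two skeletons.
    distance = len(skeleton_a ^ skeleton_b)
    # Edges shared by both skeletons but pointing in opposite directions.
    for edge in skeleton_a & skeleton_b:
        u, v = tuple(edge)
        if ((u, v) in a) != ((u, v) in b):
            distance += 1
    return distance


# Hypothetical reference and learned structures.
true_arcs = [("A", "B"), ("B", "C"), ("C", "D")]
learned_arcs = [("A", "B"), ("C", "B"), ("A", "D")]
distance = simplified_shd(true_arcs, learned_arcs)
```

Lower values indicate a learned structure closer to the reference network, which is the opposite direction to BIC and BDe, where higher scores are better.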

Effect on the BIC score of fitted networks

[Figure: relative BIC improvement (vertical axis, 0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on the BDe score of fitted networks

[Figure: relative BDe improvement (vertical axis, 0.00 to 0.15) against sample size (200, 500, 1000, 5000).]

Effect on the BIC score, predictive goodness-of-fit

[Figure: relative BIC improvement (vertical axis, 0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on the BDe score, predictive goodness-of-fit

[Figure: relative BDe improvement (vertical axis, 0.00 to 0.10) against sample size (200, 500, 1000, 5000).]

Effect on Structural Hamming Distance (SHD)

[Figure: relative SHD improvement (vertical axis, −0.6 to 0.2) against sample size (200, 500, 1000, 5000).]

Conclusions

• The correctness of structure learning algorithms depends heavily on the performance of the underlying CI tests.
• Parametric tests are problematic in many real-world settings in which Bayesian networks are used ("small n, large p").
• Model selection based on permutation tests consistently produces networks with higher BIC and BDeu scores for both small and moderately large sample sizes.

References

[1] A. Agresti. Categorical Data Analysis. Wiley, 2002.
[2] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247–256, 1989.
[3] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3):197–243, 1995. Available as Technical Report MSR-TR-94-09.
[4] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[5] S. Kullback. Information Theory and Statistics. Wiley, 1959.
[6] D. Margaritis. Learning Bayesian Network Model Structure from Data. PhD thesis, School of Computer Science, Carnegie-Mellon University, May 2003. Available as Technical Report CMU-CS-03-153.

[7] F. Pesarin. Multivariate Permutation Tests with Applications in Biostatistics. Wiley, 2001.
[8] I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Algorithms for Large Scale Markov Blanket Discovery. In Proceedings of the 16th International Florida Artificial Intelligence Research Society Conference, pages 376–381, 2003.
[9] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31–78, 2006.
[10] T. S. Verma and J. Pearl. Equivalence and Synthesis of Causal Models. Uncertainty in Artificial Intelligence, 6:255–268, 1991.
