

SLIDE 1

Who Learns Better Bayesian Network Structures

Constraint-Based, Score-based or Hybrid Algorithms? Marco Scutari1 Catharina Elisabeth Graafland2 José Manuel Gutiérrez2

1Department of Statistics

University of Oxford, UK scutari@stats.ox.ac.uk

2Institute of Physics of Cantabria (CSIC-UC)

Santander, Spain

September 11, 2018

SLIDE 2

Outline

Bayesian network structure learning is defined by the combination of a statistical criterion and an algorithm that determines how the criterion is applied to the data. After removing the confounding effect of different choices for the statistical criterion, we ask the following questions:

Q1 Which of constraint-based and score-based algorithms provides the most accurate structural reconstruction?

Q2 Are hybrid algorithms more accurate than constraint-based or score-based algorithms?

Q3 Are score-based algorithms slower than constraint-based and hybrid algorithms?

SLIDE 3

Classes of Structure Learning Algorithms

Structure learning consists in finding the DAG G that encodes the dependence structure of a data set D with n observations. Algorithms for this task fall into one of three classes:

  • Constraint-based algorithms identify conditional independence constraints with statistical tests, and link nodes that are not found to be independent.

  • Score-based algorithms are applications of general optimisation techniques; each candidate DAG is assigned a network score to maximise as the objective function.

  • Hybrid algorithms have a restrict phase implementing a constraint-based strategy to reduce the space of candidate DAGs, and a maximise phase implementing a score-based strategy to find the optimal DAG in the restricted space.
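To make the score-based class concrete, here is a deliberately minimal greedy search sketch in Python: it only adds arcs (real implementations such as the tabu search used later also delete and reverse them, and cache local scores), and every name in it is illustrative rather than taken from the slides.

```python
import itertools

def is_acyclic(arcs, nodes):
    """Check that a set of (u, v) arcs has no directed cycle (Kahn's algorithm)."""
    indeg = {v: 0 for v in nodes}
    for _, v in arcs:
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in arcs:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def hill_climb(nodes, local_score):
    """Greedy score-based structure search (a sketch of the general idea):
    repeatedly apply the single-arc addition that most improves the
    decomposable network score, stopping at a local optimum.
    local_score(node, parent_set) -> float."""
    arcs = set()
    def parents(v, a):
        return frozenset(u for u, w in a if w == v)
    def total(a):
        return sum(local_score(v, parents(v, a)) for v in nodes)
    current = total(arcs)
    while True:
        best, best_arcs = current, None
        for u, v in itertools.permutations(nodes, 2):
            if (u, v) in arcs:
                continue
            cand = arcs | {(u, v)}
            if not is_acyclic(cand, nodes):
                continue
            s = total(cand)
            if s > best:
                best, best_arcs = s, cand
        if best_arcs is None:
            return arcs
        arcs, current = best_arcs, best

# hypothetical toy score that rewards one specific parent set per node
target = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"B"})}
def toy_score(v, ps):
    return 1.0 if ps == target[v] else -0.1 * len(ps)

result = hill_climb(["A", "B", "C"], toy_score)  # recovers A -> B -> C
```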

SLIDE 4

Conditional Independence Tests and Network Scores

For discrete BNs, the most common test is the log-likelihood ratio test

$$ G^2(X, Y \mid Z) = 2 \log \frac{P(X \mid Y, Z)}{P(X \mid Z)} = 2 \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{L} n_{ijk} \log \frac{n_{ijk}\, n_{++k}}{n_{i+k}\, n_{+jk}}, $$

which has an asymptotic $\chi^2_{(R-1)(C-1)L}$ distribution. For GBNs,

$$ G^2(X, Y \mid Z) = -n \log(1 - \rho^2_{XY \mid Z}) \sim \chi^2_1. $$

As for network scores, the Bayesian Information Criterion

$$ \mathrm{BIC}(\mathcal{G}; \mathcal{D}) = \sum_{i=1}^{N} \left[ \log P(X_i \mid \Pi_{X_i}) - \frac{|\Theta_{X_i}|}{2} \log n \right] $$

is a common choice for both discrete BNs and GBNs, as it provides a simple approximation to log P(G | D). log P(G | D) itself is available in closed form as BDeu and BGeu [5, 4].
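As a concrete illustration, the discrete G² statistic above can be computed directly from a three-way contingency table. This is a minimal sketch (the function name and the NumPy/SciPy usage are mine, not from the slides):

```python
import numpy as np
from scipy.stats import chi2

def g2_test(n_ijk):
    """G^2 test of X independent of Y given Z from an R x C x L
    contingency table n_ijk. Returns the statistic and its asymptotic
    chi-square p-value with (R-1)(C-1)L degrees of freedom."""
    n_ijk = np.asarray(n_ijk, dtype=float)
    R, C, L = n_ijk.shape
    n_ik = n_ijk.sum(axis=1, keepdims=True)        # n_{i+k}
    n_jk = n_ijk.sum(axis=0, keepdims=True)        # n_{+jk}
    n_k = n_ijk.sum(axis=(0, 1), keepdims=True)    # n_{++k}
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = n_ijk * np.log(n_ijk * n_k / (n_ik * n_jk))
    stat = 2 * np.nansum(terms)                    # 0 * log(0) cells drop out
    df = (R - 1) * (C - 1) * L
    return stat, chi2.sf(stat, df)
```

On a table that factorises exactly as P(X | Z) P(Y | Z) P(Z), the statistic is zero, as expected.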

SLIDE 5

Score- and Constraint-Based Algorithms Can Be Equivalent

Cowell [3] famously showed that constraint-based and score-based algorithms can select identical discrete BNs.

1. He noticed that the G² test has the same expression as a score-based network comparison based on the log-likelihoods log P(X | Y, Z) − log P(X | Z) if we take Z = ΠX.

2. He then showed that these two classes of algorithms are equivalent if we assume a fixed, known topological ordering and we use log-likelihood and G² as matching statistical criteria.

We take the same view: the algorithms and the statistical criteria they use are separate and complementary in determining the overall behaviour of structure learning. We then want to remove the confounding effect of choices for the statistical criterion from our evaluation of the algorithms.
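Cowell's observation can be checked numerically: the count form of G² equals twice the difference between the maximised log-likelihoods of the two models, P(X | Y, Z) versus P(X | Z). A small NumPy sketch (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# random 3 x 2 x 2 contingency table of counts n_ijk for (X, Y, Z)
n = rng.integers(1, 20, size=(3, 2, 2)).astype(float)

n_ik = n.sum(axis=1)        # n_{i+k}
n_jk = n.sum(axis=0)        # n_{+jk}
n_k = n.sum(axis=(0, 1))    # n_{++k}

# G^2 statistic in its direct count form
g2 = 2 * np.sum(n * np.log(n * n_k / (n_ik[:, None, :] * n_jk[None, :, :])))

# twice the difference of maximised log-likelihoods:
# model P(X | Y, Z) (MLE n_ijk / n_{+jk}) vs model P(X | Z) (MLE n_{i+k} / n_{++k})
ll_full = np.sum(n * np.log(n / n_jk[None, :, :]))
ll_reduced = np.sum(n_ik * np.log(n_ik / n_k))

assert np.isclose(g2, 2 * (ll_full - ll_reduced))
```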

SLIDE 6

Constructing Matching Tests and Scores

Consider two DAGs G+ and G− that differ by a single arc Xj → Xi. In a score-based approach, we can compare them using BIC:

$$ \mathrm{BIC}(\mathcal{G}^+; \mathcal{D}) > \mathrm{BIC}(\mathcal{G}^-; \mathcal{D}) \Rightarrow 2 \log \frac{P(X_i \mid \Pi_{X_i} \cup \{X_j\})}{P(X_i \mid \Pi_{X_i})} > \left( |\Theta_{X_i}^{\mathcal{G}^+}| - |\Theta_{X_i}^{\mathcal{G}^-}| \right) \log n, $$

which is equivalent to testing the conditional independence of Xi and Xj given ΠXi using the G² test, just with a different significance threshold. We will call this test G²_BIC and use it as the matching statistical criterion for BIC to compare different learning algorithms. For discrete BNs, starting from log P(G | D) we get

$$ \log P(\mathcal{G}^+ \mid \mathcal{D}) > \log P(\mathcal{G}^- \mid \mathcal{D}) \Rightarrow \log \mathrm{BF} = \log \frac{P(\mathcal{G}^+ \mid \mathcal{D})}{P(\mathcal{G}^- \mid \mathcal{D})} > 0, $$

which uses Bayes factors as matching tests for BDeu.
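The BIC-versus-test equivalence above amounts to comparing the usual G² statistic against a threshold that grows with log n rather than a fixed χ² quantile. A small sketch (the function name and the numbers in the example are illustrative, not from the slides):

```python
import math
from scipy.stats import chi2

def g2_bic_threshold(extra_params, n):
    """Critical value of the G^2_BIC test: the arc Xj -> Xi improves BIC
    iff the G^2 statistic exceeds (|Theta_G+| - |Theta_G-|) * log(n),
    where extra_params is the number of free parameters the arc adds."""
    return extra_params * math.log(n)

# hypothetical example: the arc adds 1 parameter, n = 1000 observations
bic_threshold = g2_bic_threshold(1, 1000)   # log(1000), about 6.91
alpha_threshold = chi2.ppf(0.95, df=1)      # about 3.84, conventional alpha = 0.05
```

Note that for this n the BIC-matched test is stricter than a conventional 0.05-level χ² test, and the gap widens as n grows.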

SLIDE 7

A Simulation Study

We assess three constraint-based algorithms (PC [2], GS [6], Inter-IAMB [13]), two score-based algorithms (tabu search, simulated annealing [7] for BIC, GES [1] for log BDeu) and two hybrid algorithms (MMHC [10], RSMAX2 [9]) on 14 reference networks [8]. For each BN:

1. We generate 20 samples of size n/|Θ| = 0.1, 0.2, 0.5 (small samples) and 1.0, 2.0, 5.0 (large samples).

2. We learn G using (BIC, G²_BIC), and (log BDeu, log BF) as well for discrete BNs.

3. We measure the accuracy of the learned DAGs using the SHD/|A| [10] from the reference BN; and we measure the speed of the learning algorithms with the number of calls to the statistical criterion.
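The SHD counts the arc additions, deletions and direction changes needed to turn the learned graph into the reference one. A simplified sketch of the idea (computed on the DAGs directly, whereas [10] defines SHD on the corresponding CPDAGs; the function name and matrices are mine):

```python
import numpy as np

def shd(ref, learned):
    """Simplified Structural Hamming Distance between two DAG adjacency
    matrices (entry [i, j] = 1 iff arc i -> j). Counts each missing edge,
    extra edge, and differently-oriented edge as one error."""
    ref, learned = np.asarray(ref), np.asarray(learned)
    # skeletons: undirected presence of an edge
    ref_sk = ref + ref.T
    lrn_sk = learned + learned.T
    missing_or_extra = int(np.sum(np.triu(ref_sk != lrn_sk)))
    # edges present in both skeletons but oriented differently
    both = (ref_sk > 0) & (lrn_sk > 0)
    flipped = int(np.sum(np.triu(both & (ref != learned))))
    return missing_or_extra + flipped

ref = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
learned = np.array([[0, 1, 0],
                    [0, 0, 0],
                    [0, 1, 0]])   # arc 1 -> 2 reversed to 2 -> 1
# shd(ref, learned) == 1
```

Dividing this count by |A|, the number of arcs in the reference BN, gives the scaled SHD used in the plots.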

SLIDE 8

Discrete Bayesian Networks (Large Samples)

[Figure: scaled SHD vs. log10(calls to the statistical criterion), one panel per network: ALARM, ANDES, CHILD, HAILFINDER, HEPAR2, MUNIN1, PATHFINDER, PIGS, WATER, WIN95PTS.]

SLIDE 9

Discrete Bayesian Networks (Small Samples)

[Figure: scaled SHD vs. log10(calls to the statistical criterion), one panel per network: ALARM, ANDES, CHILD, HAILFINDER, HEPAR2, MUNIN1, PATHFINDER, PIGS, WATER, WIN95PTS.]

SLIDE 10

Gaussian Bayesian Networks

[Figure: scaled SHD vs. log10(calls to the statistical criterion) for ARTH150, ECOLI70, MAGIC-IRRI and MAGIC-NIAB, each in small and large samples.]

SLIDE 11

Overall Conclusions

Discrete networks:

  • score-based algorithms often have higher SHDs for small samples;
  • hybrid and constraint-based algorithms have comparable SHDs;
  • constraint-based algorithms have better SHD than score-based algorithms for small sample sizes in 7/10 BNs, but their SHD decreases more slowly as n increases for all BNs;
  • simulated annealing is consistently slower; tabu search is always fast, and accurate in large samples and in 6/10 BNs in small samples.

Gaussian networks:

  • tabu search and simulated annealing have larger SHDs than constraint-based or hybrid algorithms for most samples;
  • hybrid and constraint-based algorithms have roughly the same SHD for all sample sizes.

SLIDE 12

Real-World Climate Data...

Climate networks aim to analyse the complex spatial structure of climate data: spatial dependence among nearby locations, but also long-range, large-scale oscillation patterns over distant regions of the world, known as teleconnections [11], such as the El Niño Southern Oscillation (ENSO) [12]. We confirm the results above using NCEP/NCAR monthly surface temperature data on a global 10°-resolution grid between 1981 and 2010. This gives sample size n = 30 × 12 = 360 and N = 18 × 36 = 648 variables, which we model with a Gaussian Bayesian network. The sample would count as a "small sample" in the simulation study.
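For a GBN on data like this, the G² statistic is −n log(1 − ρ²_{XY|Z}) with ρ the partial correlation of X and Y given Z. A minimal sketch of such a test (the residualisation-based implementation is my own; the slides only state the statistic):

```python
import numpy as np

def gaussian_g2(x, y, z=None):
    """G^2 test for Gaussian BNs: -n * log(1 - rho^2), where rho is the
    (partial) correlation of x and y given the columns of z, obtained by
    residualising both on z via least squares. Asymptotically chi-square
    with 1 degree of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if z is not None:
        Z = np.column_stack([np.ones(n), z])
        x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
        y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rho = np.corrcoef(x, y)[0, 1]
    return -n * np.log(1 - rho ** 2)

# toy check: x and y are dependent only through a common cause z
rng = np.random.default_rng(42)
z = rng.normal(size=500)
x = z + 0.5 * rng.normal(size=500)
y = z + 0.5 * rng.normal(size=500)
# gaussian_g2(x, y) is large; gaussian_g2(x, y, z) is small
```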

SLIDE 13

... Gives Networks that Look Like This...

[Figure: learned climate networks: (a) |A| = 1594 (links), (b) |A| = 898 (links), (c) |A| = 1594 (conditional probabilities), (d) |A| = 898 (conditional probabilities); colour scale P(V ≥ 1 | V81 = 2) − P(V ≥ 1).]

We want to find teleconnections, so we are more interested in learning networks like that on the left than that on the right, because the latter only encodes short-range perturbations.

SLIDE 14

... and Agree with the Simulation Study

[Figure: (a) speed, log10(calls to the statistical criterion); (b) score, log P(X | G, Θ); (c) size, |A| = number of arcs.]

  • Constraint-based algorithms produce BNs with the highest log-likelihood; hybrid algorithms have the worst log-likelihood values and include only a few teleconnections;
  • score-based algorithms produce high-likelihood networks with a large number of teleconnections that allow propagating evidence with realistic results;
  • score-based algorithms are faster than both hybrid and constraint-based algorithms.

SLIDE 15

Conclusions

We assessed the three classes of BN structure learning algorithms, removing the confounding effect of different choices of statistical criteria. Interestingly, we found that:

Q1 constraint-based algorithms are more accurate than score-based algorithms for small sample sizes;

Q2 they are as accurate as hybrid algorithms;

Q3 tabu search, as a score-based algorithm, is faster than constraint-based algorithms more often than not.

This is in contrast with the general view in the literature that score-based algorithms are less sensitive to individual errors and more accurate than constraint-based algorithms; that hybrid algorithms are faster and more accurate than both, more so at small sample sizes; and that score-based algorithms scale less well to high-dimensional data.

SLIDE 16

Thanks!

SLIDE 17

References

SLIDE 18

References I

[1] D. M. Chickering. Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 3:507–554, 2002.

[2] D. Colombo and M. H. Maathuis. Order-Independent Constraint-Based Causal Structure Learning. Journal of Machine Learning Research, 15:3921–3962, 2014.

[3] R. Cowell. Conditions Under Which Conditional Independence and Scoring Methods Lead to Identical Selection of Bayesian Network Models. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pages 91–97, 2001.

[4] D. Geiger and D. Heckerman. Learning Gaussian Networks. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 235–243, 1994.

[5] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3):197–243, 1995.

SLIDE 19

References II

[6] D. Margaritis. Learning Bayesian Network Model Structure from Data. PhD thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, May 2003.

[7] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd edition, 2009.

[8] M. Scutari. Bayesian Network Repository. http://www.bnlearn.com/bnrepository, 2012.

[9] M. Scutari, P. Howell, D. J. Balding, and I. Mackay. Multiple Quantitative Trait Analysis Using Bayesian Networks. Genetics, 198(1):129–137, 2014.

[10] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65(1):31–78, 2006.

SLIDE 20

References III

[11] A. A. Tsonis, K. L. Swanson, and G. Wang. On the Role of Atmospheric Teleconnections in Climate. Journal of Climate, 21(12):2990–3001, 2008.

[12] K. Yamasaki, A. Gozolchiani, and S. Havlin. Climate Networks around the Globe are Significantly Affected by El Niño. Physical Review Letters, 100:228501, 2008.

[13] S. Yaramakala and D. Margaritis. Speculative Markov Blanket Discovery for Optimal Feature Selection. In ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining, pages 809–812. IEEE Computer Society, 2005.
