Comparing Bayesian Networks and Structure Learning Algorithms (and - PowerPoint PPT Presentation

Comparing Bayesian Networks and Structure Learning Algorithms (and other graphical models) Marco Scutari marco.scutari@stat.unipd.it Department of Statistical Sciences University of Padova October 20, 2009 Marco Scutari University of Padova

Introduction Marco Scutari University of Padova

Introduction Graphical models Graphical models are defined by the combination of: • a network structure, either an undirected (Markov networks [2], gene association networks, correlation networks, etc.) or a directed graph (Bayesian networks [7]). Each node corresponds to a random variable. • a global probability distribution which can be factorized into a set of local probability distributions (one for each node) according to the topology of the graph. This allows a compact representation of the joint distribution of large numbers of random variables and simplifies inference on their parameters. Marco Scutari University of Padova

Introduction A simple Bayesian network: Watson’s lawn SPRINKLER SPRINKLER SPRINKLER RAIN RAIN SPRINKLER TRUE FALSE RAIN TRUE FALSE 0.2 0.8 GRASS WET FALSE 0.4 0.6 TRUE 0.01 0.99 GRASS WET SPRINKLER RAIN TRUE FALSE FALSE FALSE 0.0 1.0 FALSE TRUE 0.8 0.2 TRUE FALSE 0.9 0.1 TRUE TRUE 0.99 0.01 Marco Scutari University of Padova

Introduction The problem Almost all literature on graphical models focuses on the study of the parameters of the local probability distributions (such as conditional probabilities or partial linear correlations). • this makes comparing models learned with different algorithms difficult, because they maximize different scores, use different estimators for the parameters, work under different sets of hypotheses, etc. • unless the true global probability distribution is known it’s difficult to assess the quality of the estimated models. • the few measures of structural difference are completely descriptive in nature (i.e. Hamming distance [6] or SHD [10]), and have no easy interpretation. Marco Scutari University of Padova

Modeling undirected network structures Marco Scutari University of Padova

Modeling undirected network structures Edges and univariate Bernoulli random variables Each edge e i in an undirected graph U = ( V , E ) has only two possible states, � 1 if e i ∈ E e i = otherwise . 0 Therefore it can be modeled as a Bernoulli random variable E i : � 1 e i ∈ E with probability p i e i ∼ E i = 0 e i �∈ E with probability 1 − p i where p i is the probability that the edge e i belongs to the graph. Let’s denote it as e i ∼ Ber ( p i ) . Marco Scutari University of Padova

Modeling undirected network structures Edge sets as multivariate Bernoulli The natural extension of this approach is to model any set W of edges (such as E or { V × V } ) as a multivariate Bernoulli random variable W ∼ Ber k ( p ) . It is uniquely identified by the parameter set p = { p w : w ⊆ W, w � = ∅ } , which represents the dependence structure [8] among the marginal distributions W i ∼ Ber ( p i ) , i = 1 , . . . , k of the edges. Marco Scutari University of Padova

Modeling undirected network structures Estimation of the parameters of W The parameter set p of W can be estimated via bootstrap [3] as in Friedman et al. [4] or Imoto et al. [5]: 1. For b = 1 , 2 , . . . , m 1.1 re-sample a new data set D ∗ b from the original data D using either parametric or nonparametric bootstrap. 1.2 learn a graphical model U b = ( V , E b ) from D ∗ b . 2. Estimate the probability of each subset w of W as m p w = 1 � ˆ I { w ⊆ E b } ( U b ) . m b =1 Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Moments The first two moments of a multivariate Bernoulli variable W = [ W 1 , W 2 , . . . , W k ] are P = [ E ( W 1 ) , . . . , E ( W k )] T Σ = [ σ ij ] = [ COV ( W i , W j )] where E ( W i ) = p i COV ( W i , W j ) = E ( W i W j ) − E ( W i ) E ( W j ) = p ij − p i p j VAR ( W i ) = COV ( W i , W i ) = p i − p 2 i and can be estimated using m m p i = 1 p ij = 1 � � ˆ I { e i ∈ E b } ( U b ) and ˆ I { e i ∈ E b ,e j ∈ E b } ( U b ) . m m b =1 b =1 Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Uncorrelation and independence Theorem Let B i and B j be two Bernoulli random variables. Then B i and B j are independent if and only if their covariance is zero: B i ⊥ ⊥ B j ⇐ ⇒ COV ( B i , B j ) = 0 Theorem Let B = [ B 1 , B 2 , . . . , B k ] T and C = [ C 1 , C 2 , . . . , C l ] T , k, l ∈ N be two multivariate Bernoulli random variables. Then B and C are independent if and only if B ⊥ ⊥ C ⇐ ⇒ COV ( B , C ) = O where O is the zero matrix. Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Uncorrelation and independence (an example) Let B = [ B 1 B 2 B 3 ] T = B 1 + B 2 ; then we have         0 0  − E  E  � B 1 � �� B 1 �� 0 0 COV ( B 1 , B 2 ) = E B 2 B 3 B 2 B 3      0 0       0 0 0 0  −  � p 1 � = E 0 0 B 1 B 2 B 2 B 3 p 2 p 3     0 0 0 0     0 0 0 0 0 0  −  = = 0 0 p 12 p 23 p 1 p 2 p 2 p 3   0 0 0 0 0 0   0 0 0  = O ⇔ B 1 ⊥ = p 12 − p 1 p 2 0 p 23 − p 2 p 3 ⊥ B 2  0 0 0 Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Constraints on the covariance matrix Σ The marginal variances of the edges are bounded, because � 0 , 1 � ⇒ σ ii = p i − p 2 p i ∈ [0 , 1] = i ∈ . 4 The maximum is attained for p i = 1 2 , and the minimum for both p i = 0 and p i = 1 . For the Cauchy-Schwartz theorem [1] then covariances are bounded too: ij � σ ii σ jj � 1 � 0 , 1 � 0 � σ 2 16 = ⇒ | σ ij | ∈ . 4 These result in similar bounds on the eigenvalues λ 1 , . . . , λ k of Σ , k 0 � λ i � k λ i � k � and 0 � 4 . 4 i =1 Marco Scutari University of Padova

Properties of the multivariate Bernoulli distribution Constraints on Σ : a graphical representation � 6 � � 0 . 24 � Σ 1 = 1 1 0 . 04 = 25 1 6 0 . 04 0 . 24 � 66 � 1 − 21 Σ 2 = − 21 126 625 � 0 . 1056 � − 0 . 0336 = − 0 . 0336 0 . 2016 � 66 � 1 91 Σ 3 = 91 126 625 � 0 . 1056 0 . 1456 � = 0 . 1456 0 . 2016 Marco Scutari University of Padova

Measures of Structure Variability Marco Scutari University of Padova

Measures of Structure Variability Entropy of the bootstrapped models Let’s consider the graphical models U 1 , . . . , U m learned from the bootstrap samples. • minimum entropy: all the models learned from the bootstrap samples have the same structure. In this case: � 1 if e i ∈ E p i = and Σ = O . 0 otherwise • intermediate entropy: several models are observed with different frequencies m b , � m b = m , so p i = 1 p ij = 1 � � ˆ m b and ˆ m b . m m b : e i ∈ E b b : e i ∈ E b ,e j ∈ E b • maximum entropy: all possible models appear with the same frequency, which results in p i = 1 Σ = 1 and 4 I k . 2 Marco Scutari University of Padova

Measures of Structure Variability Entropy of the bootstrapped models maximum entropy minimum entropy Marco Scutari University of Padova

Measures of Structure Variability Univariate measures of variability • the generalized variance k � � 0 , 1 � VAR G (Σ) = det(Σ) = λ i ∈ 4 k i =1 • the total variance k � � 0 , k � VAR T (Σ) = tr (Σ) = λ i ∈ 4 i =1 • the squared Frobenius matrix norm k � 2 � � k ( k − 1) 2 , k 3 � VAR N (Σ) = ||| Σ − k λ i − k � 4 I k ||| 2 F = ∈ 4 16 16 i =1 Marco Scutari University of Padova

Measures of Structure Variability Measures of structure variability max Σ VAR T (Σ) = 4 VAR T (Σ) VAR T (Σ) = k VAR T (Σ) VAR G (Σ) max Σ VAR G (Σ) = 4 k VAR G (Σ) VAR G (Σ) = max Σ VAR N (Σ) − VAR N (Σ) VAR N (Σ) = max Σ VAR N (Σ) − min Σ VAR N (Σ) = k 3 − 16 VAR N (Σ) k (2 k − 1) All of them vary in the [0 , 1] interval and associate high values to networks whose structure display a high entropy in the bootstrap samples. Marco Scutari University of Padova

Measures of Structure Variability Structure variability (total variance) maximum entropy minimum entropy Marco Scutari University of Padova

Measures of Structure Variability Structure variability (Frobenius norm) maximum entropy minimum entropy Marco Scutari University of Padova

Measures of Structure Variability Applications • compare the performance of different combinations of learning algorithms and network scores/independence tests on the same data. • study the performance of an algorithm at different sample sizes by changing the size bootstrap samples. The simplest way is to test the hypothesis H 0 : Σ = 1 H 1 : Σ � = 1 4 I k 4 I k using either parametric tests or parametric bootstrap. • apply many techniques from classical multivariate statistics (such as principal components), graph theory (path analysis) and linear algebra (matrix decompositions). Marco Scutari University of Padova

Comparing Bayesian Networks and Structure Learning Algorithms (and - PowerPoint PPT Presentation

Comparing Bayesian Networks and Structure Learning Algorithms (and other graphical models) Marco Scutari marco.scutari@stat.unipd.it Department of Statistical Sciences University of Padova October 20, 2009 Marco Scutari University of Padova

CS 331: Bayesian Networks 2 1 Bayesian Networks Youve heard about how Bayesian networks

Business Statistics CONTENTS Comparing two samples Comparing two unrelated samples Comparing

Beyond Uniform Priors in Bayesian Network Structure Learning (for Discrete Bayesian Networks)

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Bayesian Networks Youve heard about how Bayesian networks have revolutionized AI

Bayesian Learning 1 Outline MLE, MAP vs. Bayesian Learning Bayesian Linear Regression

AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Bayesian Networks Directed Acyclic Graph (DAG)

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Bayesian networks (2) Lirong Xia Last class Bayesian networks compact, graphical

CS440/ECE448 Lecture 15: Bayesian Inference and Bayesian Learning Slides by Svetlana Lazebnik,

Bayesian Methods for Neural Networks Readings: Bishop, Neural Networks for Pattern Recognition .

Chapter14 Probabilistic Reasoning (Bayesian Networks) Sec. 1 - 2 20070607 Chap14 1

Learning Bayesian Networks: Learning Bayesian Networks: Na ve and non ve and non- -Na

Bayesian Networks Philipp Koehn 2 April 2020 Philipp Koehn Artificial Intelligence: Bayesian

Bayesian Networks Philipp Koehn 6 April 2017 Philipp Koehn Artificial Intelligence: Bayesian

Probabilistic Modeling: Bayesian Networks Bioinformatics: Sequence Analysis COMP 571 - Spring

Bayesian statistics DS GA 1002 Probability and Statistics for Data Science

On the number of polynomial solutions of Bernoulli and Abel polynomial differential equations

Rates for Inductive Learning of Compositional Models Adrian Barbu Department of Statistics

W -superrigidity for Bernoulli actions and wreath product group von Neumann algebras Lecture

Ergodicity and type of nonsingular Bernoulli actions Richard Kadison and his mathematical legacy

Critical Parameters of Loop and Bernoulli Percolation Peter M uhlbacher University of Warwick

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 8

Pseudorandom fu functions in in alm lmost constant depth fr from lo low-noise LPN Yu Yu