

SLIDE 1

Informative Priors for Graphical Model Structure

James Cussens, University of York jc@cs.york.ac.uk (joint work with Nicos Angelopoulos) Supported by the UK EPSRC MATHFIT programme

SLIDE 2

Use of structural priors

“The use of structural priors when learning BNs has received only little attention in the learning community.” (Langseth & Nielsen, 2003)

“The standard priors over network structures are often used not because they are particularly well-motivated, but rather because they are simple and easy to work with. In fact, the ubiquitous uniform prior over structures is far from uniform over [Markov equivalence classes].” (Friedman & Koller, 2003)

Bristol 17/10/03 1

SLIDE 3

Exploiting experts

“. . . in the context of knowledge-based systems, or indeed in any context where the primary aim of the modeling effort is to predict the future, [uniform] prior distributions are often inappropriate; one of the primary advantages of the Bayesian approach is that it provides a practical framework for harnessing all available resources including prior expert knowledge.” (Madigan et al, 1995)


SLIDE 4

The problem with experts

“Notwithstanding the preceding remarks, eliciting an informative prior distribution on model space from a domain expert is challenging.” (Madigan et al, 1995)


SLIDE 5

Hard constraints

  • Imposing a total ordering on variables or blocks
  • Limiting the number of parents
  • Banning/requiring specific edges.
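These three kinds of hard constraint can be checked mechanically against any candidate structure. A minimal sketch, assuming a DAG represented as a set of (parent, child) pairs; the function name and argument layout are our own, not from the talk:

```python
def satisfies_constraints(edges, order, max_parents, banned, required):
    """Return True iff the DAG `edges` respects all three hard constraints."""
    # 1. Total ordering: every arc must point from earlier to later in `order`.
    rank = {v: i for i, v in enumerate(order)}
    if any(rank[p] >= rank[c] for p, c in edges):
        return False
    # 2. Parent limit: no node may have more than `max_parents` parents.
    n_parents = {}
    for p, c in edges:
        n_parents[c] = n_parents.get(c, 0) + 1
    if any(n > max_parents for n in n_parents.values()):
        return False
    # 3. Banned and required specific edges.
    return not (edges & banned) and required <= edges

# Example: A -> B -> C under the ordering A, B, C, with A -> B required.
edges = {("A", "B"), ("B", "C")}
print(satisfies_constraints(edges, ["A", "B", "C"], 1, set(), {("A", "B")}))  # True
```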


SLIDE 6

Assuming link independence

pr(M) ∝ ∏_{e∈E_P} pr(e) · ∏_{e∈E_A} (1 − pr(e))

where E_P is the set of arcs present in M and E_A the set of arcs absent from it.

(Buntine, 1991; Cooper & Herskovits, 1992; Madigan and Raftery, 1994)
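As a sketch of the formula above, with a hypothetical expert assessment per arc (the arc probabilities and names below are illustrative, not from the talk):

```python
def link_independence_prior(present, edge_probs):
    """Unnormalised pr(M): product of pr(e) over arcs in M and
    (1 - pr(e)) over arcs absent from M."""
    prior = 1.0
    for e, p in edge_probs.items():
        prior *= p if e in present else (1.0 - p)
    return prior

# Expert believes A->B is likely, B->C is a coin flip, A->C is unlikely.
edge_probs = {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.1}
m = {("A", "B"), ("B", "C")}
print(link_independence_prior(m, edge_probs))  # 0.9 * 0.5 * 0.9 = 0.405
```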


SLIDE 7

Edit distance from prior network

Let M differ from the expert’s prior network by δ arcs; then pr(M) = c·κ^δ. (≈ Madigan and Raftery, 1994; Heckerman et al, 1995)
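A minimal sketch of this prior, counting δ as the size of the symmetric difference between the two arc sets and dropping the normalising constant c (names are ours):

```python
def edit_distance_prior(m_edges, prior_edges, kappa):
    """Unnormalised pr(M) = kappa**delta, where delta counts the arcs by
    which M differs from the expert's prior network (0 < kappa <= 1)."""
    delta = len(m_edges ^ prior_edges)  # arcs present in exactly one graph
    return kappa ** delta

prior_net = {("A", "B"), ("B", "C")}
m = {("A", "B"), ("A", "C")}            # one arc removed, one added: delta = 2
print(edit_distance_prior(m, prior_net, 0.5))  # 0.5**2 = 0.25
```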


SLIDE 8

Priors over CART trees

  • A Bayesian CART algorithm. Denison et al, Biometrika 1998
  • Bayesian CART model search. Chipman et al, JASA 1998

“Instead of specifying a closed-form expression for the tree prior, p(T), we specify p(T) implicitly by a tree-generating stochastic process. Each realization of such a process can simply be considered a random draw from this prior. Furthermore, many specifications allow for straightforward evaluation of p(T) for any T and can be effectively coupled with efficient Metropolis-Hastings search algorithms . . . ” (Denison et al)


SLIDE 9

A graphical-model-generating stochastic process

[Figure: the probability tree generated by the stochastic process. Internal choice points (with branch probabilities p1, p2, p3) branch over the variables A, B and C; the numbered leaves are candidate models, with X marking failed derivations.]


SLIDE 10

Stochastic logic programs implement model-generating stochastic processes

  1. Write a logic program which defines a set of models:

     BN is a Bayesian network if . . .

     ∀BN : bn(BN) ← digraph(BN) ∧ acyclic(BN)

     bn(BN) :- digraph(BN), acyclic(BN).

  2. Add parameters to define a distribution over models, giving a stochastic logic program (SLP).
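The idea of a model-generating stochastic process can be sketched outside Prolog too. The following is our own illustrative construction, not the talk's SLP machinery: each probabilistic choice is made independently, a sampled structure is a random draw from the implicit prior, and the draw's probability is the product of the branch probabilities taken:

```python
import random

def sample_digraph(nodes, p_edge, rng):
    """Sample a DAG over `nodes` (taken in a fixed order, so acyclicity is
    guaranteed) by one probabilistic choice per candidate arc; return the
    arc set and the probability of this particular draw under the process."""
    edges, prob = set(), 1.0
    for i, parent in enumerate(nodes):
        for child in nodes[i + 1:]:
            if rng.random() < p_edge:
                edges.add((parent, child))
                prob *= p_edge
            else:
                prob *= 1.0 - p_edge
    return edges, prob

rng = random.Random(0)
edges, prob = sample_digraph(["A", "B", "C"], 0.5, rng)
print(prob)  # 0.5**3 = 0.125: three candidate arcs, each a fair coin flip
```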


SLIDE 11

SLPs for MCMC

  • The tree gives a natural neighbourhood structure to the model space . . .
  • . . . which we exploit to construct a proposal distribution based on the prior.


SLIDE 12

The proposal mechanism

  1. Backtrack one step to the most recent choice point in the probability tree.
  2. Then probabilistically backtrack as follows: if at the top of the tree, stop; otherwise backtrack one more step to the next choice point with probability pb.
  3. Once we have stopped backtracking, choose a new leaf/model M∗ from the choice point by selecting branches according to the probabilities attached to them. However, in the first step down the tree we may not choose the branch that leads back to the current leaf/model Mi.
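The backtracking stage of this proposal (steps 1 and 2) can be sketched on its own; regrowing a new leaf with the first-branch exclusion of step 3 is omitted. All names here are ours, and the depth representation is an assumption:

```python
import random

def backtrack_depth(current_depth, p_b, rng):
    """Return the depth of the choice point from which the new model is
    regrown: always retract one choice point, then keep retracting with
    probability p_b until stopping or hitting the root (depth 0)."""
    depth = current_depth - 1          # step 1: forced single retraction
    while depth > 0 and rng.random() < p_b:
        depth -= 1                     # step 2: geometric further retraction
    return depth

rng = random.Random(1)
depths = [backtrack_depth(5, 0.5, rng) for _ in range(10_000)]
print(min(depths), max(depths))  # depths always lie in 0..4
```

Small p_b keeps proposals local (near the current leaf); p_b close to 1 makes a retreat to the root, and hence a near-independent redraw from the prior, likely.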


SLIDE 13

Bouncing around the tree

[Figure: a proposal move “bouncing around the tree”, from the current model Mi back through choice points (one node marked “not a choice point”) and down to M∗ or a fail leaf; here ni = n∗ = 2, with branch probabilities pi and p∗.]


SLIDE 14

The acceptance probability

If M∗ is a failure then α(Mi, M∗) = 0, else:

α(Mi, M∗) = min( pb^(n∗−ni) · (1 − pi)/(1 − p∗) · P(D|M∗)/P(D|Mi) , 1 )
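A direct transcription of the formula, under our reading of the slide's symbols (ni and n∗ count backtracking steps; pi and p∗ are branch probabilities for the current and proposed leaf); the function and its arguments are our own naming:

```python
def acceptance(p_b, n_i, n_star, p_i, p_star, lik_i, lik_star, failed=False):
    """Metropolis-Hastings acceptance probability for the tree proposal.
    Returns 0 when the proposed derivation M* is a failure leaf."""
    if failed:
        return 0.0
    ratio = (p_b ** (n_star - n_i)) * (1 - p_i) / (1 - p_star) * lik_star / lik_i
    return min(ratio, 1.0)

# Symmetric move: equal depths, equal branch probabilities, equal likelihoods.
print(acceptance(0.5, 2, 2, 0.25, 0.25, 1.0, 1.0))  # 1.0
# Deeper proposal with twice the likelihood: min(0.5 * 2.0, 1) = 1.0.
print(acceptance(0.5, 2, 3, 0.25, 0.25, 1.0, 2.0))  # 1.0
```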


SLIDE 15

Better mixing with a cyclic transition kernel

  • We cycle through the values pb = 1 − 2^(−n), for n = 1, . . . , 28, so that on every 28th iteration there is a high probability of backtracking all the way to the top of the tree.
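The schedule itself is one line; a quick check of its endpoints:

```python
# Cycled backtracking probabilities p_b = 1 - 2**(-n), n = 1..28: the first
# value is 0.5 (local moves), the last is within 2**-28 of 1, so a full
# retreat to the root of the tree becomes highly likely on that iteration.
schedule = [1 - 2 ** (-n) for n in range(1, 29)]
print(schedule[0], schedule[-1])
```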


SLIDE 16

It works . . . eventually!

M      p̂4     p̂5     p̂6     p
BN22   0.668  0.690  0.704  0.702
BN20   0.176  0.150  0.145  0.146
BN19   0.144  0.152  0.143  0.145
BN4    0.007  0.005  0.005  0.005
BN5    0.002  0.001  0.002  0.002
BN1    0.001  0.001  0.001  0.001
BN14
BN10   0.001
BN11

Estimated (p̂i) and actual (p) posterior probabilities for the nine most probable 3-node BNs in BNTREE. p̂i is the estimated probability after 10^i iterations.


SLIDE 17

Real evaluation

  • Generate 2295 datapoints from the Asia BN
  • 783,702,329,343 BNs in model space
  • Run MCMC for 500,000 iterations (no burn-in)
  • Runtimes: 24–55 minutes
  • 2 runs for each ‘setting’: compare observed probabilities


SLIDE 18

Real evaluation - OK results

[Scatter plot, setting “3pun”: observed model probabilities from the two runs plotted against each other, both axes from 0.1 to 1.]


SLIDE 19

Real evaluation - hmmm

[Scatter plot, setting “8pun”: observed model probabilities from the two runs plotted against each other, axes up to 1.]


SLIDE 20

Markov equivalence classes

bn(RVs,BN) :-
    skeleton(RVs,Skel),
    essential_graph(Skel,Imms,EG),   % could stop here
    bn(EG,Imms,BN),
    top_sort(BN,_).                  % check for cycles

Way too many failures!


SLIDE 21

Logic program transformation

Original program:

member(X,[X|_]).
member(X,[_|T]) :- member(X,T).

Specialised for the query ?- member(X,List), List=[_,_,_]:

member(X,[X,_,_]).
member(X,[_,X,_]).
member(X,[_,_,X]).


SLIDE 22

SLP transformation for more efficient sampling

Original SLP:

1/2 : member(X,[X|_]).
1/2 : member(X,[_|T]) :- member(X,T).

Specialised for the query ?- member(X,List), List=[_,_,_]:

4/7 : member(X,[X,_,_]).
2/7 : member(X,[_,X,_]).
1/7 : member(X,[_,_,X]).
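The transformed labels 4/7, 2/7, 1/7 come from renormalising over the successful derivations of the original SLP: with the 1/2 : 1/2 labels, a derivation for a 3-element list reaches the first, second and third positions with probabilities 1/2, 1/4 and 1/8, and fails (recursing past the end of the list) with probability 1/8, so the transformed program samples without wasting work on failures. A quick check of this arithmetic with exact rationals:

```python
from fractions import Fraction

half = Fraction(1, 2)
raw = [half ** (k + 1) for k in range(3)]   # 1/2, 1/4, 1/8: reach position k+1
total = sum(raw)                            # 7/8: probability of any success
normalised = [p / total for p in raw]       # condition on success
print(normalised)  # [Fraction(4, 7), Fraction(2, 7), Fraction(1, 7)]
```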


SLIDE 23

What about R?

  • R calls C, which calls Prolog
  • Where does the prior live? As an R object?
  • The data should eventually be an R dataframe
  • Begin with R as a ‘wrapper’.
