Large Sample Robustness Bayes Nets with Incomplete Information (PowerPoint PPT Presentation)

SLIDE 1

Large Sample Robustness Bayes Nets with Incomplete Information

Jim Smith and Ali Daneshkhah

Universities of Warwick and Strathclyde

Denmark PGM September 2008

SLIDE 2

Motivation

  • We often worry about convergence of samplers etc. in a Bayesian analysis. How precise does the prior on a BN have to be?
  • In particular, what is the overall effect of local and global independence assumptions on a given model?
  • What are the overall inferential implications of using standard priors like product Dirichlets or product logistics?
  • In general, how hard do I need to think about these issues a priori when I know I will collect a large sample?

SLIDE 3

Messy Analyses

  • Large BN, with some expert knowledge incorporated.
  • Nodes in our graph are systematically missing / the sample is not random.
  • Possible unidentifiability, even taking account of aliasing, as \(n \to \infty\).

[Figure: a BN over parameters \(\theta_1, \ldots, \theta_{11}\); the original diagram did not survive extraction.]

SLIDE 4

The Problems

  • For a given prior, we only have a numerical or algebraic approximation of the posterior density.
  • We just have approximate summary statistics (e.g. means, variances, sampled low-dimensional margins, ...).
  • Robustness issues arise even for complete sampling. The variation distance \(d_V(f, g) = \int |f - g|\) between two posteriors can diverge quickly as sample size increases, especially when the parameter space is large with outliers (Dawid, 1973) and more generally (Gustafson and Wasserman, 1995).
  • So when and how are posterior inferences strongly influenced by the prior? Local De Robertis separations are the key to addressing this issue!
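To fix ideas, here is a minimal numerical sketch (not from the original slides) of the variation distance between two posteriors on a grid; the Beta priors and success counts are illustrative assumptions only.

```python
import numpy as np

# Variation distance d_V(f, g) = \int |f - g| between two posteriors,
# evaluated on a 1-d grid. Priors and data are illustrative placeholders.
theta = np.linspace(1e-4, 1 - 1e-4, 5000)
dx = theta[1] - theta[0]

def posterior(a, b, s, n):
    """Normalised Beta(a, b) posterior after s successes in n trials."""
    p = theta ** (a + s - 1) * (1 - theta) ** (b + n - s - 1)
    return p / (p.sum() * dx)

f_n = posterior(3, 1, s=9, n=10)      # posterior under a functioning prior
g_n = posterior(1, 3, s=9, n=10)      # posterior under a genuine prior
print(np.abs(f_n - g_n).sum() * dx)   # d_V between the two posteriors
```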

SLIDE 5

About LDR

  • Local De Robertis (LDR) separations are easy to calculate and extend natural parametrizations in exponential families.
  • They have an intriguing prior-to-posterior invariance property.
  • A BN factorization of a density implies linear relationships between clique marginal separations and the joint separation.
  • Bounds on the variation distance between two posterior distributions associated with different priors can be calculated explicitly as a function of prior LDR bounds and posterior statistics associated with the functioning prior.
  • The bounds apply posterior to an observed likelihood, even when the sample density is misspecified.

SLIDE 6

Contents

  • De Robertis local separations
  • Some properties of local De Robertis separations
  • Some useful theorems concerning LDR and BNs
  • What this means for the robustness of BNs

SLIDE 7

The Setting

  • Let \(g_0\) (\(g_n\)) be our genuine prior (posterior) density and \(f_0\) (\(f_n\)) our functioning prior (posterior) density. The default Bayes choice of \(f_0\) is often a product of Dirichlets.
  • \(\mathbf{x}_n = (x_1, x_2, \ldots, x_n)\), \(n \geq 1\), with observed sample densities \(\{p_n(\mathbf{x}_n \mid \theta)\}_{n \geq 1}\).
  • With missing data these sample densities (and hence \(f_n\) and \(g_n\)) are typically intractable, so \(f_n\) is approximated either by drawing samples or algebraically.

SLIDE 8

A Bayes Rule Identity

Let \(\Theta^{(n)} = \{\theta \in \Theta : p(\mathbf{x}_n \mid \theta) > 0\}\). For all \(\theta \in \Theta^{(n)}\),

\[ \log g_n(\theta) = \log g_0(\theta) + \log p_n(\mathbf{x}_n \mid \theta) - \log p_g(\mathbf{x}_n) \]
\[ \log f_n(\theta) = \log f_0(\theta) + \log p_n(\mathbf{x}_n \mid \theta) - \log p_f(\mathbf{x}_n) \]

where \(p_g(\mathbf{x}_n) = \int_{\theta \in \Theta^{(n)}} p(\mathbf{x}_n \mid \theta)\, g_0(\theta)\, d\theta\) and \(p_f(\mathbf{x}_n) = \int_{\theta \in \Theta^{(n)}} p(\mathbf{x}_n \mid \theta)\, f_0(\theta)\, d\theta\). (When \(\theta \in \Theta \setminus \Theta^{(n)}\), set \(g_n(\theta) = f_n(\theta) = 0\).)

So

\[ \log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(\mathbf{x}_n) - \log p_f(\mathbf{x}_n) \]

SLIDE 9

From Bayes Rule to LDR

For any subset \(A \subseteq \Theta^{(n)}\) let

\[ d^L_A(f, g) \triangleq \sup_{\theta \in A} \left( \log f(\theta) - \log g(\theta) \right) - \inf_{\phi \in A} \left( \log f(\phi) - \log g(\phi) \right) \]

Then since \(\log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(\mathbf{x}_n) - \log p_f(\mathbf{x}_n)\), for any sequence \(\{p(\mathbf{x}_n \mid \theta)\}_{n \geq 1}\), however complicated,

\[ d^L_A(f_n, g_n) = d^L_A(f_0, g_0) \]
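The identity above is easy to check numerically. The sketch below (illustrative Beta priors and a binomial likelihood of my own choosing, not from the talk) verifies that Bayes updating shifts \(\log f - \log g\) by a constant, leaving the sup-minus-inf separation over any \(A\) unchanged; unnormalised densities suffice because normalising constants cancel.

```python
import numpy as np

# Numerical check of the isoseparation property d^L_A(fn, gn) = d^L_A(f0, g0).
theta = np.linspace(0.01, 0.99, 999)
f0 = theta ** 2 * (1 - theta)        # functioning prior ~ Beta(3, 2), unnormalised
g0 = theta * (1 - theta) ** 2        # genuine prior ~ Beta(2, 3), unnormalised
lik = theta ** 7 * (1 - theta) ** 3  # likelihood: 7 successes in 10 trials

def d_L(f, g, mask):
    """LDR separation of f from g over the subset A given by a boolean mask."""
    r = np.log(f[mask]) - np.log(g[mask])
    return r.max() - r.min()

A = (theta > 0.3) & (theta < 0.7)                  # a subset A of the parameter space
print(d_L(f0, g0, A), d_L(f0 * lik, g0 * lik, A))  # equal up to rounding
```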

SLIDE 10

Isoseparation

\[ d^L_A(f_n, g_n) = d^L_A(f_0, g_0) \]

  • So for \(A \subseteq \Theta^{(n)}\) the posterior approximation of \(f_n\) to \(g_n\) is identical in quality to that of \(f_0\) to \(g_0\).
  • When \(A = \Theta^{(n)}\) this property (DeRobertis, 1978) was used for density ratio metrics and the specification of neighbourhoods.
  • Trivially, posterior distances between densities can be calculated effortlessly from priors.
  • The separation of two priors lying in standard families can usually be expressed explicitly and can always be explicitly bounded.

SLIDE 14

Some notation

We will be especially interested in small sets A.

  • Let \(B(\mu; \rho)\) denote the open ball of radius \(\rho\) centred at \(\mu = (\mu_1, \mu_2, \ldots, \mu_k)\).
  • Let \(d^L_{\mu;\rho}(f, g) \triangleq d^L_{B(\mu;\rho)}(f, g)\).
  • For any subset \(\Theta_0 \subseteq \Theta\), let \(d^L_{\Theta_0;\rho}(f, g) = \sup_{\mu \in \Theta_0} d^L_{\mu;\rho}(f, g)\).
  • Obviously, for any \(A \subseteq B(\mu; \rho)\) with \(\mu \in \Theta_0 \subseteq \Theta\), \(d^L_A(f, g) \leq d^L_{\Theta_0;\rho}(f, g)\).

SLIDE 15

Separation of two Dirichlets

Let \(\theta = (\theta_1, \theta_2, \ldots, \theta_k)\) and \(\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)\), with \(\theta_i, \alpha_i > 0\) and \(\sum_{i=1}^k \theta_i = 1\). Let \(f_0(\theta \mid \alpha_f)\) and \(g_0(\theta \mid \alpha_g)\) be Dirichlet, so that

\[ f_0(\theta \mid \alpha_f) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,f} - 1}, \qquad g_0(\theta \mid \alpha_g) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,g} - 1} \]

Let \(\mu_n = (\mu_{1,n}, \mu_{2,n}, \ldots, \mu_{k,n})\) be the mean of \(f_n\). If \(\rho_n < \mu^0_n = \min\{\mu_{i,n} : 1 \leq i \leq k\}\), then

\[ d^L_{\mu;\rho_n}(f_0, g_0) \leq 2k\rho_n \left( \mu^0_n - \rho_n \right)^{-1} \bar{\alpha}(f_0, g_0) \]

where \(\bar{\alpha}(f_0, g_0) = k^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}|\) is the average distance between the hyperparameters of \(f_0\) and \(g_0\).
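A minimal sketch of this bound as a reusable function, assuming the inequality exactly as displayed above; the hyperparameters, posterior mean and radius in the example are placeholders of my own choosing.

```python
import numpy as np

# Local DR bound for two Dirichlet priors:
# d^L_{mu;rho}(f0, g0) <= 2*k*rho*(mu0 - rho)^(-1) * alpha_bar,
# where alpha_bar is the mean absolute difference of hyperparameters.
def dirichlet_ldr_bound(alpha_f, alpha_g, mu_n, rho_n):
    alpha_f, alpha_g, mu_n = map(np.asarray, (alpha_f, alpha_g, mu_n))
    k = len(alpha_f)
    mu0 = mu_n.min()                      # smallest posterior mean component
    assert rho_n < mu0, "bound requires rho_n < min component of the mean"
    alpha_bar = np.abs(alpha_f - alpha_g).mean()
    return 2 * k * rho_n * alpha_bar / (mu0 - rho_n)

# Example: two mildly different Dirichlet priors, posterior mean away from 0.
print(dirichlet_ldr_bound([1, 2, 3], [1.5, 2, 2.5],
                          mu_n=[0.2, 0.3, 0.5], rho_n=0.05))
```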

SLIDE 16

Where Separations might be large

\[ d^L_{\mu;\rho_n}(f_0, g_0) \leq 2\rho_n \left( \mu^0_n - \rho_n \right)^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}| \]

  • So \(d^L_{\mu;\rho_n}(f_0, g_0)\) is uniformly bounded whenever the components of \(\mu_n\) all stay away from 0, converging approximately linearly in \(n\).
  • On the other hand, if \(f_n\) puts its mass near a zero probability, then even when \(\bar{\alpha}(f, g)\) is small it can be shown that at least some likelihoods will force the variation distance between the posterior densities to stay large for increasing \(n\): Smith (2007).
  • The smaller the smallest probability tended to, the slower any convergence.
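The zero-probability pathology can be simulated directly. The sketch below (illustrative Beta priors and all-failure data, my own construction rather than the talk's example) shows the variation distance between the two posteriors failing to shrink as n grows.

```python
import numpy as np
from scipy import stats

# Both posteriors pile up near theta = 0 (all n trials fail), and the
# variation distance between them stays bounded away from zero.
theta = np.linspace(1e-7, 0.5, 200001)
dx = theta[1] - theta[0]
for n in (10, 100, 1000):
    fn = stats.beta.pdf(theta, 1.0, n + 1)   # posterior: Beta(1,1) prior, 0 successes
    gn = stats.beta.pdf(theta, 1.5, n + 1)   # posterior: Beta(1.5,1) prior, 0 successes
    print(n, np.abs(fn - gn).sum() * dx)     # d_V does not shrink with n
```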

SLIDE 17

BNs with local and global independence

If the functioning prior \(f(\theta)\) and genuine prior \(g(\theta)\) factorize over subvectors \(\{\theta_1, \theta_2, \ldots, \theta_k\}\) so that

\[ f(\theta) = \prod_{i=1}^k f_i(\theta_i), \qquad g(\theta) = \prod_{i=1}^k g_i(\theta_i) \]

where \(f_i(\theta_i)\) (\(g_i(\theta_i)\)) are the functioning (genuine) margins on \(\theta_i\), \(1 \leq i \leq k\), then (like K-L separations)

\[ d^L_A(f, g) = \sum_{i=1}^k d^L_{A_i}(f_i, g_i) \]

So local prior distances grow linearly with the number of defining conditional probability vectors.
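A quick grid check of this additivity, with illustrative unnormalised Beta margins of my own choosing:

```python
import numpy as np

# If f = f1*f2 and g = g1*g2, then on a product set A1 x A2:
# d^L_{A1 x A2}(f, g) = d^L_{A1}(f1, g1) + d^L_{A2}(f2, g2).
t = np.linspace(0.01, 0.99, 199)
lr1 = np.log(t ** 2 * (1 - t)) - np.log(t * (1 - t) ** 2)       # log f1 - log g1
lr2 = np.log(t ** 4 * (1 - t)) - np.log(t ** 2 * (1 - t) ** 3)  # log f2 - log g2

def sep(r):
    return r.max() - r.min()   # sup minus inf of a log ratio

joint = lr1[:, None] + lr2[None, :]     # log f - log g on the product grid
print(sep(joint), sep(lr1) + sep(lr2))  # the two numbers agree
```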

SLIDE 18

Some conclusions

  • BNs with larger numbers of edges are intrinsically less stable.
  • However, like K-L, marginal densities are never more separated than their joint densities, so if a utility is only on a particular margin then these distances may be much smaller.
  • Bayes factors automatically select simpler models, but note also that inferences from a more complex model tend to be more sensitive to wrongly specified priors.

SLIDE 19

Disaster?

  • There are certain features of the prior which will always endure.
  • If there is a point where the LDR separation diverges locally, in a sense which violates the condition above, then it is possible to construct a "regular" likelihood such that the variation distance between posteriors remains bounded away from zero as \(n \to \infty\).
  • However, if the posterior mass is converging onto a small set, then we can focus on a small set \(A\); usually \(d^L_A(f_0, g_0)\) is small when \(A\) lies in a small ball.

SLIDE 20

Salvation!

When \(n\) is large, \(A\) will lie in a small ball with high probability, and it is usually reasonable to assume that \(d^L_A(f_0, g_0)\) is small for \(A\) lying in a small ball.

We can usually assume that, for open balls \(B(\mu; \rho)\) centred at \(\mu\) and of radius \(\rho\), \(f_0, g_0 \in \mathcal{F}(\Theta_0, M(\Theta_0), p(\Theta_0))\), meaning

\[ \sup_{\theta, \phi \in B(\mu;\rho)} |\log f_0(\theta) - \log f_0(\phi)| \leq M(\Theta_0)\, \rho^{0.5\, p(\Theta_0)} \]
\[ \sup_{\theta, \phi \in B(\mu;\rho)} |\log g_0(\theta) - \log g_0(\phi)| \leq M(\Theta_0)\, \rho^{0.5\, p(\Theta_0)} \]

SLIDE 21

A simple smoothness/roughness condition

When \(p(\Theta_0) = 2\) this just demands that \(\log f_0\) and \(\log g_0\) both have bounded derivatives within the set \(\Theta_0\), used to determine where \(f_n\) concentrates its mass. Then it is easily shown (see Smith and Rigat, 2008) that

\[ d^L_{\Theta_0,\rho}(f, g) \leq 2 M(\Theta_0)\, \rho^{p(\Theta_0)/2} \]

So the rate of convergence to zero of \(d^L_{\Theta_0,\rho}(f, g)\) is governed by the "roughness" parameter \(p(\Theta_0)\).

  • This is always true for densities with inverse polynomial tails, like the Student t density.
  • If densities have tighter tails than this, it is also true provided they are continuously differentiable on a closed bounded interval \(\Theta_0\).
  • For continuous \(f, g\), when \(\Theta_0\) is closed and bounded (so there is no divergence due to outliers), \(d^L_{\Theta_0,\rho}(f, g)\) converges to zero.
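A numerical sanity check of the \(p(\Theta_0) = 2\) case, with two illustrative Gaussian priors (my own assumption) on a closed bounded \(\Theta_0\):

```python
import numpy as np

# If log f0 and log g0 have derivatives bounded by M on Theta0 = [-2, 2],
# then d^L_{Theta0, rho}(f0, g0) should sit below 2*M*rho.
m_f, s_f, m_g, s_g = 0.0, 1.0, 0.3, 1.2
lo, hi, rho = -2.0, 2.0, 0.05
grid = np.linspace(lo, hi, 8001)

# log f0 - log g0 (normalising constants cancel in sup minus inf)
log_ratio = -0.5 * ((grid - m_f) / s_f) ** 2 + 0.5 * ((grid - m_g) / s_g) ** 2

# sup over ball centres mu in Theta0 of the range of the log ratio on the ball
sep = 0.0
for mu in np.linspace(lo + rho, hi - rho, 400):
    r = log_ratio[np.abs(grid - mu) < rho]
    sep = max(sep, r.max() - r.min())

# M bounds |d log f0| and |d log g0| on Theta0
M = max(np.abs(grid - m_f).max() / s_f**2, np.abs(grid - m_g).max() / s_g**2)
print(sep, 2 * M * rho)   # the computed separation sits below the bound
```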

SLIDE 22

Introducing smoothness accidentally

Consider the typical hierarchical models used in e.g. BUGS:

\[ X_1 \leftarrow \theta_1 \leftarrow \theta \rightarrow \theta_2 \rightarrow X_2 \]

e.g. for \(i = 1, 2\), \(\theta_i = \theta + \varepsilon_i\), where \(\varepsilon_i\) is an independent error term (Gaussian, Student t, etc.). Provided the error term is smooth, this automatically forces the prior margin \(g_0(\theta_1, \theta_2)\) to be smooth (even if \(\theta\) is discrete), regardless of the smoothness of \(\theta\).

Moral: nearly all conventional hierarchical BNs with enough depth have implicit priors on the parameters of the likelihood that are smooth in the sense above (making them robust in the sense below).
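The smoothing effect is easy to see numerically. In the sketch below (a hypothetical two-point \(\theta\) with Gaussian errors, my own choice of example), the implied margin of \(\theta_1\) is a smooth mixture whose log density has a bounded derivative on any bounded set:

```python
import numpy as np
from scipy import stats

# theta is discrete (0 or 1, each w.p. 1/2); theta_1 = theta + eps with
# Gaussian eps. The implied margin of theta_1 is a smooth Gaussian mixture.
t1 = np.linspace(-4, 5, 2000)
margin = 0.5 * stats.norm.pdf(t1, 0, 1) + 0.5 * stats.norm.pdf(t1, 1, 1)

# The log-margin has a bounded derivative on this bounded grid,
# i.e. a finite M(Theta0) in the p(Theta0) = 2 smoothness class.
d = np.gradient(np.log(margin), t1)
print(np.abs(d).max())
```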

SLIDE 23

But why worry about LDR separation?

  • Without the LDR condition above, large sample variation convergence cannot hold in general.
  • Conversely, with a regularity condition and a technical device, convergence will happen.

Regularity Condition. Call a genuine prior \(g_0\) c-rejectable with respect to \(f_0\) if the ratio of marginal likelihoods \(p_f(\mathbf{x})/p_g(\mathbf{x}) > c\).

If \(f_0\) does not explain the data much better than \(g_0\), we would expect this ratio not to be large: certainly \(g_0\) would not be c-rejectable for moderately large values of \(c \geq 1\).

SLIDE 24

A Second Tail convergence condition

Say a density \(f\) \(\Lambda\)-tail dominates a density \(g\) if

\[ \sup_{\theta \in \Theta} \frac{g(\theta)}{f(\theta)} = \Lambda < \infty \]

  • When \(g(\theta)\) is bounded, this condition requires that the tail convergence of \(g\) is no slower than that of \(f\).
  • The condition is met provided \(f_0\) is chosen to have a flatter tail than \(g_0\).
  • Note: flat-tailed priors are recommended for robustness on other grounds, e.g. O'Hagan and Forster (2004).
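A quick numerical look at \(\Lambda\)-tail domination, with an illustrative Student-t functioning prior and Gaussian genuine prior (my own choices):

```python
import numpy as np
from scipy import stats

# A Student-t f0 has flatter tails than a Gaussian g0, so
# Lambda = sup g0/f0 is finite and the ratio vanishes in the tails.
theta = np.linspace(-40, 40, 400001)
ratio = stats.norm.pdf(theta) / stats.t.pdf(theta, df=4)
print(ratio.max(), ratio[0], ratio[-1])   # finite sup; ratio -> 0 in the tails
```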

SLIDE 25

A typical result (Smith and Rigat, 2008)

Theorem

If the genuine prior \(g_0\) is not c-rejectable with respect to \(f_0\), \(f_0\) \(\Lambda\)-tail dominates \(g_0\), and \(f_0, g_0 \in \mathcal{F}(\Theta_0, M(\Theta_0), p(\Theta_0))\), then

\[ d_V(f_n, g_n) \leq \inf_{\rho_n > 0} \left\{ T_n(1, \rho_n) + 2\,T_n(2, \rho_n) : B(\mu_n, \rho_n) \subseteq \Theta_0 \right\} \tag{1} \]

where

\[ T_n(1, \rho_n) = \exp\!\left( d^L_{\mu_n,\rho_n}(f, g) \right) - 1 \leq \exp\!\left( 2M\rho_n^{p/2} \right) - 1 \]

and \(T_n(2, \rho_n) = (1 + c\Lambda)\, F_n\!\left( \theta \notin B(\mu_n; \rho_n) \right)\).

It is easy to bound \(F_n(\theta \notin B(\mu_n; \rho_n))\) explicitly in many ways using Chebychev-type inequalities: Smith (2007). An example of such a bound is given below, specified in terms of the posterior means and variances of the vector of parameters under \(f_n\), which are routinely approximated.

SLIDE 26

An Example of an Explicit Bound

Let \(\theta = (\theta_1, \theta_2, \ldots, \theta_k)\) and let \(\mu_{j,n}\), \(\sigma^2_{jj,n}\) denote the mean and variance of \(\theta_j\), \(1 \leq j \leq k\), under \(f_n\). Using Chebychev bounds (Tong, 1980, p. 153) and writing \(\mu_n = (\mu_{1,n}, \mu_{2,n}, \ldots, \mu_{k,n})\),

\[ F_n\!\left( \theta \notin B(\mu_n; \rho_n) \right) \leq k \rho_n^{-2} \sum_{j=1}^k \sigma^2_{jj,n} \]

where, writing \(\sigma^2_n = k \max_{1 \leq j \leq k} \sigma^2_{jj,n}\), this implies

\[ T_n(2, \rho_n) \leq c\Lambda\, \sigma^2_n\, \rho_n^{-2} \]

e.g. if \(\sigma^2_n \leq n^{-1}\sigma^2\) for some value \(\sigma^2\), then \(T_n(2, \rho_n) \to 0\) provided \(\rho^2_n \geq n^{-r}\rho^2\) where \(0 < r < 1\).

In practice, for a given data set, we just have an approximate value of \(\sigma^2_n\) that we can plug in.
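A sketch combining the theorem of the previous slide with the Chebychev bound above into a single computable function. All inputs (M, p, c, \(\Lambda\), the posterior variances, and the \(\rho\) grid) are hypothetical placeholders, and the ball-inside-\(\Theta_0\) constraint is ignored for simplicity:

```python
import numpy as np

# Bound d_V(fn, gn) <= inf_rho { T1(rho) + 2*T2(rho) }, with the LDR term
# T1 bounded via the smoothness class and T2 via the Chebychev tail bound.
def dV_bound(M, p, c, Lam, sigma2_jj, rho_grid):
    k = len(sigma2_jj)
    best = np.inf
    for rho in rho_grid:
        T1 = np.exp(2 * M * rho ** (p / 2)) - 1   # LDR / smoothness term
        F = k * np.sum(sigma2_jj) / rho ** 2      # Chebychev tail bound
        T2 = (1 + c * Lam) * F
        best = min(best, T1 + 2 * T2)
    return best

n = 10_000
sigma2 = np.full(3, 0.25 / n)   # posterior variances shrinking like 1/n
print(dV_bound(M=1.0, p=2, c=10, Lam=2.0, sigma2_jj=sigma2,
               rho_grid=np.logspace(-3, 0, 200)))
```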

SLIDE 27

Inference on margins separation

When \(A_1\) is the restriction of \(A\) to \(\theta_1\), \(\theta = (\theta_1, \theta_2)\), and \(f_1(\theta_1)\), \(g_1(\theta_1)\) are the continuous margins of \(f(\theta)\) and \(g(\theta)\) respectively, then

\[ d^L_{A_1}(f_1, g_1) \leq d^L_A(f, g) \]

  • If \(f_n\) converges on a margin then, even if the model is unidentified, provided \(f_0, g_0 \in \mathcal{F}(\Theta_0, M(\Theta_0), p(\Theta_0))\), for large \(n\), \(f_n\) will be a good surrogate for \(g_n\).
  • BNs with interior systematically hidden variables are unidentified. However, if a utility function is only on manifest variables, then in standard scenarios under the above conditions \(d_V(f_{1,n}, g_{1,n}) \to 0\) at a rate of at least \(\sqrt[3]{n}\).
  • Instability arises only on posteriors of functions of probabilities associated with the hidden variables conditional on the manifest variables.
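The first inequality above is easy to check numerically. Below is a small grid sketch (illustrative unnormalised Gaussian-type densities, my own choices) confirming that the \(\theta_1\) margin is never more separated than the joint:

```python
import numpy as np

# Margins are never more separated than the joint:
# d^L_{A1}(f1, g1) <= d^L_A(f, g).
t = np.linspace(-2, 2, 401)
T1, T2 = np.meshgrid(t, t, indexing="ij")
f = np.exp(-(T1**2 + T2**2 + T1 * T2))   # correlated joint density f
g = np.exp(-(T1**2 + T2**2))             # independent joint density g
dt = t[1] - t[0]

def sep(r):
    return r.max() - r.min()             # sup minus inf of a log ratio

joint_ratio = np.log(f) - np.log(g)
f1, g1 = f.sum(axis=1) * dt, g.sum(axis=1) * dt   # theta_1 margins on the grid
margin_ratio = np.log(f1) - np.log(g1)
print(sep(margin_ratio), sep(joint_ratio))        # margin separation is smaller
```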

SLIDE 28

A Simple Example: The Star tree

[Figure: a star tree on \(\theta_1, \ldots, \theta_{10}\), with the interior node \(\theta_1\) joined to each leaf; the original diagram did not survive extraction.]

Writing \(\theta_{-1}\) for the remaining parameters, d-separation tells us \(\theta_1 \perp\!\!\!\perp X \mid \theta_{-1}\), so what we put in as a prior for \(\theta_1 \mid \theta_{-1}\) is what we get out. However, the model implies \(\theta_1\) is a function of \(\theta_{-1}\) (up to aliasing), so there is actually no deviation consistent with the model.

SLIDE 29

Departures from Parameter Independence

If

\[ f(\theta) = f_1(\theta_1) \prod_{i=2}^k f_{i|\cdot}(\theta_i \mid \theta_{pa_i}), \qquad g(\theta) = g_1(\theta_1) \prod_{i=2}^k g_{i|\cdot}(\theta_i \mid \theta_{pa_i}) \]

we then have the inequality

\[ d^L_A(f, g) \leq \sum_{i=2}^k d^L_{A[i]}(f_{[i]}, g_{[i]}) \]

where \(f_{[i]}, g_{[i]}\) are respectively the margins of \(f\) and \(g\) on the space \(\Theta_{[i]}\) of the \(i\)th variable and its parents. So distances are bounded by sums of distances on clique margins.

SLIDE 30

Uniformly A Uncertain

Suppose \(g\) is uniformly A uncertain and factorises as \(f\) does, and

\[ \sup_g \sup_{\theta_i, \phi_i \in A[i]} \left| \log f_{i|\cdot}(\theta) - \log g_{i|\cdot}(\theta) - \log f_{i|\cdot}(\phi) + \log g_{i|\cdot}(\phi) \right| \]

is not a function of \(\theta_{pa_i}\), \(2 \leq i \leq n\). Then we can write

\[ d^L_A(f, g) = \sum_{i=1}^k d^L_{A[i]}(f_{i|\cdot}, g_{i|\cdot}) \]

  • The separation between the joint densities \(f\) and \(g\) is the sum of the separations between their component conditionals \(f_{i|\cdot}\) and \(g_{i|\cdot}\), \(1 \leq i \leq k\).
  • Bounds can be calculated even when the likelihood destroys the factorisation of the prior.
  • So the critical property we assume here is that we believe a priori that \(f\) respects the same factorisation as \(g\).

SLIDE 31

Conclusions

  • Bayesian inference on BNs is most stable to prior settings the simpler the model.
  • For large samples, general total variation robustness is lost when posterior masses concentrate near a zero probability.
  • However, robustness can sometimes be retrieved if that probability does not appear in a utility function.
  • Even for moderate-sized samples, explicit bounds on the effects of priors can be calculated on-line.
  • In regular problems, these bounds usually contract surprisingly quickly as the data increases.

SLIDE 36

A Few references

  • Daneshkhah, A. (2004) "Estimation in Causal Graphical Models", PhD thesis, University of Warwick.
  • DeRobertis, L. (1978) "The use of partial prior knowledge in Bayesian inference", Ph.D. dissertation, Yale University.
  • Gustafson, P. and Wasserman, L. (1995) "Local sensitivity diagnostics for Bayesian inference", Annals of Statistics, 23, 2153-2167.
  • French, S. and Rios Insua, D. (2000) "Statistical Decision Theory", Kendall's Library of Statistics, Arnold.
  • O'Hagan, A. and Forster, J. (2004) "Bayesian Inference", Kendall's Advanced Theory of Statistics, Arnold.

SLIDE 37

A few more References

Smith, J.Q."Local Robustness of Bayesian Parametric Inference and Observed Likelihoods" CRiSM Res Rep 07-08 Smith, J.Q. and Rigat, F.(2008) "Isoseparation and Robustness in Finite Parameter Bayesian Inference" CRiSM Res Rep Smith,J.Q. and Croft, J. (2003) "Bayesian networks for discrete multivariate data" J of Multivariate Analysis 84(2), 387 -402 Tong, Y.L.(1980) "Probability Inequalities in Multivariate Distributions" Academic Press New York Wasserman, L.(1992a) "Invariance properties of density ratio priors" Ann Statist, 20, 2177- 2182
