

SLIDE 1

Introduction to Bayesian Statistics

Lecture 7: Multiparameter models (III)

Rung-Ching Tsai

Department of Mathematics National Taiwan Normal University

April 15, 2015

SLIDE 2

Multiparameter model: the multinomial model

  • y = (y1, · · · , yJ) ∼ Multinomial(n; θ1, · · · , θJ) with ∑_{j=1}^{J} yj = n; use the Bayesian approach to estimate θ = (θ1, · · · , θJ).
  • Likelihood:
    p(y|θ) ∝ ∏_{j=1}^{J} θj^{yj}
  • Prior of θ: choose the conjugate prior, a Dirichlet distribution Dirichlet(α1, · · · , αJ), for θ:
    p(θ|α) ∝ ∏_{j=1}^{J} θj^{αj − 1} with ∑_{j=1}^{J} θj = 1,
    where the Dirichlet is a multivariate generalization of the beta distribution.
  • Posterior of θ:
    p(θ|y) ∝ p(θ)p(y|θ) ∝ ∏_{j=1}^{J} θj^{αj + yj − 1},
    i.e., θ|y ∼ Dirichlet(α1 + y1, · · · , αJ + yJ)
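The conjugate update above is just elementwise addition of counts to the prior parameters. A minimal sketch with NumPy/SciPy; the counts and prior values below are hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy.stats import dirichlet

# Hypothetical observed multinomial counts y (J = 3 categories, n = 100)
# and a uniform Dirichlet prior alpha = (1, 1, 1).
y = np.array([52, 31, 17])
alpha = np.array([1.0, 1.0, 1.0])

# Conjugate update: theta | y ~ Dirichlet(alpha + y).
alpha_post = alpha + y

# Posterior mean of theta_j is (alpha_j + y_j) / sum_k (alpha_k + y_k).
post_mean = alpha_post / alpha_post.sum()

# Draw posterior samples of theta; each draw lies on the simplex (rows sum to 1).
samples = dirichlet.rvs(alpha_post, size=1000, random_state=0)
```

Because the posterior is available in closed form, no MCMC is needed here; sampling is only for summarizing functions of θ.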

SLIDE 3

Multiparameter model: the multivariate normal model

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • choose a conjugate prior for µ, µ ∼ MVN(µ0, Λ0):
    p(µ) ∝ |Λ0|^{−1/2} exp(−(1/2)(µ − µ0)ᵀ Λ0^{−1} (µ − µ0))
  • likelihood of µ:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)) = |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ

SLIDE 4

Multiparameter model: multivariate normal, Σ known

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • find the posterior distribution of µ:
    p(µ|y1, · · · , yn, Σ) ∝ p(µ)p(y1, · · · , yn|µ, Σ)
    ∝ exp(−(1/2)[(µ − µ0)ᵀ Λ0^{−1} (µ − µ0) + ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)])
    ∝ exp(−(1/2)(µ − µn)ᵀ Λn^{−1} (µ − µn))
  • that is, µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
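The precision-weighted update for µn and Λn translates directly into a few lines of NumPy. A sketch; the data, prior, and known Σ below are hypothetical values chosen only to exercise the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known sampling covariance Sigma and hypothetical data (d = 2, n = 50).
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
y = rng.multivariate_normal([1.0, -1.0], Sigma, size=50)
n, ybar = len(y), y.mean(axis=0)

# Conjugate prior mu ~ MVN(mu0, Lambda0), here deliberately diffuse.
mu0 = np.zeros(2)
Lambda0 = 10.0 * np.eye(2)

# Posterior precision: Lambda_n^{-1} = Lambda0^{-1} + n * Sigma^{-1};
# posterior mean: mu_n = Lambda_n (Lambda0^{-1} mu0 + n * Sigma^{-1} ybar).
Lambda0_inv = np.linalg.inv(Lambda0)
Sigma_inv = np.linalg.inv(Sigma)
Lambda_n = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
```

With a diffuse prior (large Λ0), µn sits very close to ȳ and Λn close to Σ/n, anticipating the non-informative-prior result later in the lecture.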

SLIDE 5

Multiparameter model: multivariate normal, Σ known

  • µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
  • Partition µ = (µ^(1), µ^(2)), µn = (µn^(1), µn^(2)), and
    Λn = ( Λn^(11)  Λn^(12)
           Λn^(21)  Λn^(22) ).
  • posterior marginal distribution of subvectors of µ:
    µ^(1)|y1, · · · , yn, Σ ∼ MVN(µn^(1), Λn^(11))
  • posterior conditional distribution of subvectors of µ:
    µ^(1)|µ^(2), y1, · · · , yn, Σ ∼ MVN(µn^(1) + β_{1|2}(µ^(2) − µn^(2)), Λ_{1|2}),
    where β_{1|2} = Λn^(12)(Λn^(22))^{−1} and Λ_{1|2} = Λn^(11) − Λn^(12)(Λn^(22))^{−1}Λn^(21).
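These are the standard Gaussian conditioning identities applied to Λn. A sketch with a hypothetical 3-dimensional posterior, partitioned into a 1-dimensional µ^(1) and a 2-dimensional µ^(2) (all numeric values invented for illustration):

```python
import numpy as np

# Hypothetical posterior parameters (d = 3): mu^(1) = mu[0], mu^(2) = mu[1:3].
mu_n = np.array([0.5, -0.2, 1.1])
Lambda_n = np.array([[0.40, 0.10, 0.05],
                     [0.10, 0.30, 0.08],
                     [0.05, 0.08, 0.25]])

i1, i2 = [0], [1, 2]
L11 = Lambda_n[np.ix_(i1, i1)]
L12 = Lambda_n[np.ix_(i1, i2)]
L21 = Lambda_n[np.ix_(i2, i1)]
L22 = Lambda_n[np.ix_(i2, i2)]

# beta_{1|2} = Lambda_n^(12) (Lambda_n^(22))^{-1}
beta_12 = L12 @ np.linalg.inv(L22)
# Lambda_{1|2} = Lambda_n^(11) - Lambda_n^(12) (Lambda_n^(22))^{-1} Lambda_n^(21)
Lambda_12 = L11 - beta_12 @ L21

# Conditional mean of mu^(1) given a hypothetical value of mu^(2).
mu2 = np.array([0.0, 1.0])
cond_mean = mu_n[i1] + beta_12 @ (mu2 - mu_n[i2])
```

The marginal of µ^(1) needs no computation at all: read off µn^(1) and Λn^(11) directly from the partition.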

SLIDE 6

Multiparameter model: multivariate normal, Σ known

  • µ|y1, · · · , yn, Σ ∼ MVN(µn, Λn), where
    µn = (Λ0^{−1} + nΣ^{−1})^{−1} (Λ0^{−1} µ0 + nΣ^{−1} ȳ) and Λn^{−1} = Λ0^{−1} + nΣ^{−1}
  • Let ỹ ∼ MVN(µ, Σ) be a new observation.
  • posterior predictive distribution of ỹ, Σ known:
    p(ỹ, µ|y1, · · · , yn) = N(ỹ|µ, Σ) N(µ|µn, Λn) is the exponential of a quadratic form in (ỹ, µ), hence ỹ|y ∼ MVN(µn, Σ + Λn), where
    E(ỹ|y) = E(E(ỹ|µ, y)|y) = E(µ|y) = µn
    var(ỹ|y) = E(var(ỹ|µ, y)|y) + var(E(ỹ|µ, y)|y) = E(Σ|y) + var(µ|y) = Σ + Λn
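A quick Monte Carlo check of ỹ|y ∼ MVN(µn, Σ + Λn): draw µ from its posterior, then ỹ given µ, and compare the empirical moments to the closed form. All parameter values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior for mu and known sampling covariance Sigma.
mu_n = np.array([1.0, -1.0])
Lambda_n = np.array([[0.05, 0.01], [0.01, 0.04]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

# Composition sampling: mu ~ MVN(mu_n, Lambda_n), then y_tilde | mu ~ MVN(mu, Sigma),
# implemented as mu plus independent MVN(0, Sigma) noise.
m = 200_000
mus = rng.multivariate_normal(mu_n, Lambda_n, size=m)
y_tilde = mus + rng.multivariate_normal(np.zeros(2), Sigma, size=m)

# Empirical moments should match MVN(mu_n, Sigma + Lambda_n).
emp_mean = y_tilde.mean(axis=0)
emp_cov = np.cov(y_tilde.T)
```

The check mirrors the iterated-expectation derivation on the slide: the predictive mean is µn, and the predictive covariance adds the sampling covariance Σ to the posterior covariance Λn.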

SLIDE 7

Multiparameter model: multivariate normal, Σ known

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), Σ known, use the Bayesian approach to estimate µ.
  • prior for µ: choose a non-informative prior, p(µ) ∝ 1
  • likelihood of µ:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) ∑_{i=1}^{n} (yi − µ)ᵀ Σ^{−1} (yi − µ)) = |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ
  • posterior for µ:
    p(µ|y1, · · · , yn, Σ) ∝ p(µ)p(y1, · · · , yn|µ, Σ) ∝ p(y1, · · · , yn|µ, Σ),
    i.e., µ|Σ, y1, · · · , yn ∼ MVN(ȳ, Σ/n).

SLIDE 8

Multivariate normal model, Σ unknown

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), both µ and Σ unknown; use the Bayesian approach to estimate (µ, Σ).
  • take a conjugate prior for (µ, Σ): p(µ, Σ) = p(Σ)p(µ|Σ) with
    Σ ∼ Inv-Wishart_{ν0}(Λ0^{−1})
    µ|Σ ∼ MVN(µ0, Σ/κ0)
    i.e., the joint prior density is
    p(µ, Σ) ∝ |Σ|^{−((ν0+d)/2+1)} exp(−(1/2) tr(Λ0 Σ^{−1}) − (κ0/2)(µ − µ0)ᵀ Σ^{−1} (µ − µ0)).
    We label this the N-Inverse-Wishart(µ0, Λ0/κ0; ν0, Λ0).
  • likelihood:
    p(y1, · · · , yn|µ, Σ) ∝ |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0)),
    where S0 = ∑_{i=1}^{n} (yi − µ)(yi − µ)ᵀ

SLIDE 9

Joint posterior distribution, p(µ, Σ|y1, · · · , yn)

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ)
  • prior of (µ, Σ): (µ, Σ) ∼ N-Inverse-Wishart(µ0, Λ0/κ0; ν0, Λ0)
  • the joint posterior distribution of (µ, Σ):
    p(µ, Σ|y1, · · · , yn) ∝ p(µ, Σ)p(y1, · · · , yn|µ, Σ)
    ∝ |Σ|^{−((ν0+d)/2+1)} exp(−(1/2) tr(Λ0 Σ^{−1}) − (κ0/2)(µ − µ0)ᵀ Σ^{−1} (µ − µ0)) × |Σ|^{−n/2} exp(−(1/2) tr(Σ^{−1} S0))
    = N-Inv-Wishart(µn, Λn/κn; νn, Λn), (1)
    where
  • µn = (κ0/(κ0 + n)) µ0 + (n/(κ0 + n)) ȳ
  • κn = κ0 + n
  • νn = ν0 + n
  • Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ

SLIDE 10

Conditional posterior distribution, p(µ|Σ, y1, · · · , yn)

  • p(µ, Σ|y1, · · · , yn) = p(µ|Σ, y1, · · · , yn)p(Σ|y1, · · · , yn)
  • the conditional posterior density of µ given Σ is proportional to the joint posterior density (1) with Σ held constant:
    µ|Σ, y1, · · · , yn ∼ MVN(µn, Σ/κn)

SLIDE 11

Marginal posterior distribution, p(Σ|y1, · · · , yn)

  • p(µ, Σ|y1, · · · , yn) = p(µ|Σ, y1, · · · , yn)p(Σ|y1, · · · , yn)
  • p(Σ|y1, · · · , yn) requires averaging the joint distribution p(µ, Σ|y1, · · · , yn) over µ; as a result, we have
    Σ|y1, · · · , yn ∼ Inv-Wishart_{νn}(Λn^{−1}),
    where Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ

SLIDE 12

Marginal posterior distribution of µ, p(µ|y1, · · · , yn)

  • Estimand of interest: µ
  • To obtain the marginal posterior distribution of µ:
  • our result from the univariate normal generalizes to the multivariate case:
    µ|y1, · · · , yn ∼ t_{νn−d+1}(µn, Λn/(κn(νn − d + 1))),
    where
  • µn = (κ0/(κ0 + n)) µ0 + (n/(κ0 + n)) ȳ
  • κn = κ0 + n, νn = ν0 + n
  • Λn = Λ0 + S + (κ0 n/(κ0 + n))(ȳ − µ0)(ȳ − µ0)ᵀ with S = ∑_{i=1}^{n} (yi − ȳ)(yi − ȳ)ᵀ
  • By simulation:
  • first draw Σ from p(Σ|y1, · · · , yn) with Σ|y1, · · · , yn ∼ Inv-Wishart_{νn}(Λn^{−1}),
  • then draw µ from p(µ|Σ, y1, · · · , yn) with µ|Σ, y1, · · · , yn ∼ MVN(µn, Σ/κn).
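The two-step simulation scheme can be sketched with SciPy's invwishart; its `scale` argument is the matrix Λn, which corresponds to Inv-Wishart_{νn}(Λn^{−1}) in the notation of these slides. Data and hyperparameters below are hypothetical:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

# Hypothetical data: d = 2, n = 40.
y = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.3], [0.3, 2.0]], size=40)
n, d = y.shape
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)

# Hypothetical N-Inverse-Wishart hyperparameters.
mu0, kappa0, nu0, Lambda0 = np.zeros(d), 1.0, d + 2, np.eye(d)

# Posterior updates from the joint posterior (1).
kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
dev = (ybar - mu0).reshape(-1, 1)
Lambda_n = Lambda0 + S + (kappa0 * n / kappa_n) * (dev @ dev.T)

# Composition sampling: Sigma | y ~ Inv-Wishart_{nu_n}(Lambda_n^{-1})
# (SciPy scale = Lambda_n), then mu | Sigma, y ~ MVN(mu_n, Sigma / kappa_n).
mu_draws = np.empty((1000, d))
for t in range(1000):
    Sigma = invwishart.rvs(df=nu_n, scale=Lambda_n, random_state=rng)
    mu_draws[t] = rng.multivariate_normal(mu_n, Sigma / kappa_n)
```

The empirical distribution of `mu_draws` approximates the marginal t_{νn−d+1}(µn, Λn/(κn(νn − d + 1))) stated above, without ever evaluating a multivariate t density.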

SLIDE 13

the multivariate normal model: Non-informative prior

  • y1, · · · , yn i.i.d. ∼ MVN(µ, Σ), both µ and Σ unknown; use the Bayesian approach to estimate (µ, Σ).
  • a common non-informative prior is the Jeffreys prior density:
    p(µ, Σ) ∝ |Σ|^{−(d+1)/2},
    which is the limit of the conjugate prior density as κ0 → 0, ν0 → −1, |Λ0| → 0.
  • the marginal and conditional posterior densities can then be written as
    Σ|y1, · · · , yn ∼ Inv-Wishart_{n−1}(S),
    µ|Σ, y1, · · · , yn ∼ MVN(ȳ, Σ/n).
  • marginal posterior of µ:
    µ|y1, · · · , yn ∼ t_{n−d}(ȳ, S/(n(n − d))).