Parameter estimation (cont.), Dr. Jarad Niemi, STAT 544, Iowa State University (PowerPoint PPT presentation)



SLIDE 1

Parameter estimation (cont.)

Dr. Jarad Niemi

STAT 544 - Iowa State University

January 24, 2019

Jarad Niemi (STAT544@ISU) Parameter estimation (cont.) January 24, 2019 1 / 32

SLIDE 2

Outline

Normal model, unknown mean
  • Jeffreys prior
  • Natural conjugate prior
  • Posterior

Normal model, unknown variance
  • Jeffreys prior
  • Natural conjugate prior
  • Posterior

JAGS/Stan
  • Binomial model, unknown probability
  • Normal model, unknown mean
  • Normal model, unknown variance

SLIDE 3

Jeffreys prior for µ

Theorem. If Yᵢ ~iid N(µ, s²) with s² known, the Jeffreys prior for µ is p(µ) ∝ 1.

Proof. Since the normal distribution with unknown mean is an exponential family, use Casella & Berger Lemma 7.3.11:

I(µ) = −E_y[ ∂²/∂µ² log p(y|µ) ]
     = −E_y[ ∂²/∂µ² ( −n log(2πs²)/2 − (1/(2s²)) Σᵢ₌₁ⁿ (yᵢ − µ)² ) ]
     = −E_y[ ∂²/∂µ² ( −n log(2πs²)/2 − (1/(2s²)) ( Σᵢ₌₁ⁿ yᵢ² − 2µnȳ + nµ² ) ) ]
     = −E_y[ ∂/∂µ ( −(1/(2s²)) ( −2nȳ + 2nµ ) ) ]
     = −E_y[ −(1/(2s²)) (2n) ]
     = n/s²

so

p(µ) ∝ √|I(µ)| = √(n/s²) ∝ 1.

So Jeffreys prior for µ is p(µ) ∝ 1.
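As a quick numerical sanity check (not part of the slides), the second derivative of the normal log-likelihood in µ is constant at −n/s², so the expected information is n/s². A minimal sketch in Python (the deck's own code is R; the data values are the three observations simulated later in the deck):

```python
import math

# Log-likelihood of an iid N(mu, s2) sample (s2 known)
def loglik(mu, y, s2):
    n = len(y)
    return -n * math.log(2 * math.pi * s2) / 2 \
        - sum((yi - mu) ** 2 for yi in y) / (2 * s2)

# Central finite-difference approximation to the second derivative in mu
def d2_loglik(mu, y, s2, h=1e-4):
    return (loglik(mu + h, y, s2) - 2 * loglik(mu, y, s2)
            + loglik(mu - h, y, s2)) / h ** 2

y = [1.2696060, 0.3700146, 1.8686598]  # data from the slides' example
s2 = 1.0

# The log-likelihood is quadratic in mu, so the second derivative is
# exactly -n/s2 at any mu; the information is therefore n/s2 = 3
info = -d2_loglik(0.3, y, s2)
print(info)  # approximately 3
```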

SLIDE 4

Posterior propriety

Since ∫₋∞^∞ 1 dµ is not finite, we need to check posterior propriety.

Theorem. For n > 0, the posterior for a normal mean (known variance) using Jeffreys prior is proper.

Proof. The posterior is

p(µ|y) ∝ p(y|µ) p(µ)
       ∝ exp( −(1/(2s²)) Σᵢ₌₁ⁿ (yᵢ − µ)² ) × 1
       ∝ exp( −(1/(2s²)) ( −2µnȳ + nµ² ) )
       = exp( −(1/(2(s²/n))) ( µ² − 2µȳ ) ).

This is the kernel of a normal distribution with mean ȳ and variance s²/n, which is proper if n > 0.
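Propriety can also be checked numerically: completing the square gives ∫ exp(−Σ(yᵢ − µ)²/(2s²)) dµ = exp(−Σ(yᵢ − ȳ)²/(2s²)) √(2πs²/n), which is finite. A Python sketch (the deck uses R; assumes SciPy is available):

```python
import math
from scipy.integrate import quad

y = [1.2696060, 0.3700146, 1.8686598]  # data from the slides' example
s2, n = 1.0, len(y)
ybar = sum(y) / n

# Unnormalized posterior under the Jeffreys prior p(mu) ∝ 1
kernel = lambda mu: math.exp(-sum((yi - mu) ** 2 for yi in y) / (2 * s2))

total, _ = quad(kernel, -50, 50)  # effectively (-inf, inf)

# Closed form after completing the square
closed_form = math.exp(-sum((yi - ybar) ** 2 for yi in y) / (2 * s2)) \
    * math.sqrt(2 * math.pi * s2 / n)
print(total, closed_form)  # finite and equal => the posterior is proper
```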

SLIDE 5

Natural conjugate prior

Let Yᵢ ~iid N(µ, s²) with s² known. The likelihood is

L(µ) = exp( −(1/(2(s²/n))) ( µ² − 2µȳ ) ) ∝ exp( −(1/2) [ (n/s²)µ² − 2µ(n/s²)ȳ ] ).

This is the kernel of a normal distribution, so the natural conjugate prior is µ ~ N(m, C). Then

p(µ|y) ∝ p(y|µ) p(µ) = L(µ) p(µ)
       = exp( −(1/2) [ (n/s²)µ² − 2µ(n/s²)ȳ ] ) exp( −(1/2) [ (1/C)µ² − 2µ(1/C)m ] )
       = exp( −(1/2) [ (1/C + n/s²) µ² − 2µ ( (1/C)m + (n/s²)ȳ ) ] )
       = exp( −(1/(2(1/C + n/s²)⁻¹)) [ µ² − 2µ (1/C + n/s²)⁻¹ ( (1/C)m + (n/s²)ȳ ) ] ).

This is the kernel of a N(m′, C′) where

C′ = ( C⁻¹ + n/s² )⁻¹
m′ = C′ ( C⁻¹m + (n/s²)ȳ ).

SLIDE 6

Normal mean posterior comments

Let P = 1/C, P′ = 1/C′, and Q = 1/s² be the relevant precisions (inverse variances). Then:

The posterior precision is the sum of the prior and observation precisions:

P′ = P + Σᵢ₌₁ⁿ Q = P + nQ.

The posterior mean is a precision-weighted average of the prior and data:

m′ = (1/P′) [ Pm + nQȳ ]
   = (P/P′) m + (nQ/P′) ȳ
   = (P/P′) m + Σᵢ₌₁ⁿ (Q/P′) yᵢ.

The Jeffreys prior/posterior are the limits of the conjugate prior/posterior as C → ∞, i.e.

lim_{C→∞} N(m, C) →_d p(µ) ∝ 1
lim_{C→∞} N(m′, C′) →_d N(ȳ, s²/n).
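The precision-weighted update and its large-C limit are easy to sanity-check. A small Python sketch (the deck's code is R; the function name is mine, and ȳ below is the sample mean of the slides' simulated data):

```python
# Conjugate update for a normal mean with s2 known, following the slide:
# P' = P + nQ,  m' = (P m + n Q ybar) / P'
def normal_mean_update(m, C, ybar, n, s2):
    P, Q = 1 / C, 1 / s2
    Pp = P + n * Q
    mp = (P * m + n * Q * ybar) / Pp
    return mp, 1 / Pp  # posterior mean and variance

# As C -> infinity the posterior approaches N(ybar, s2/n),
# matching the Jeffreys-prior posterior
mp, Cp = normal_mean_update(m=0, C=1e12, ybar=1.1694268, n=3, s2=1)
print(mp, Cp)  # close to (1.1694268, 1/3)
```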

SLIDE 7

Example

Consider Yᵢ ~ind N(µ, 1) and µ ~ N(0, 1).

# Prior
m = 0
C = 1; P = 1/C

# Data
mu = 1
s2 = 1; Q = 1/s2
n = 3
set.seed(6)
(y = rnorm(n, mu, sqrt(1/Q)))
[1] 1.2696060 0.3700146 1.8686598

# Posterior
nQ = n*Q
Pp = P + nQ
mp = (P*m + nQ*mean(y))/Pp
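For readers without R handy, the same posterior computation can be cross-checked in Python (a sketch; the data are the three values printed by the R snippet):

```python
# Prior N(0, 1), s2 = 1, and the three simulated observations from the slides
y = [1.2696060, 0.3700146, 1.8686598]
m, C = 0.0, 1.0
s2 = 1.0
n = len(y)

P, Q = 1 / C, 1 / s2
Pp = P + n * Q                             # posterior precision: 1 + 3 = 4
mp = (P * m + n * Q * (sum(y) / n)) / Pp   # precision-weighted posterior mean
print(Pp, mp)  # 4.0 and roughly 0.877
```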

SLIDE 8

[Figure: "Normal model with unknown mean, normal prior". Density over µ (roughly −2 to 4) showing the prior, posterior, and likelihood curves.]

SLIDE 9

Jeffreys prior for σ²

Theorem. If Yᵢ ~iid N(m, σ²) with m known, the Jeffreys prior for σ² is p(σ²) ∝ 1/σ².

Proof. Since the normal distribution with unknown variance is an exponential family, use Casella & Berger Lemma 7.3.11:

I(σ²) = −E_y[ ∂²/∂(σ²)² log p(y|σ²) ]
      = −E_y[ ∂²/∂(σ²)² ( −n log(2πσ²)/2 − (1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − m)² ) ]
      = −E_y[ ∂/∂(σ²) ( −n/(2σ²) + (1/(2(σ²)²)) Σᵢ₌₁ⁿ (yᵢ − m)² ) ]
      = −E_y[ n/(2(σ²)²) − (1/(σ²)³) Σᵢ₌₁ⁿ (yᵢ − m)² ]
      = −( n/(2(σ²)²) − (n/(σ²)³) σ² )        [since E(yᵢ − m)² = σ²]
      = (n/2) (σ²)⁻²

so

p(σ²) ∝ √|I(σ²)| ∝ 1/σ².

So Jeffreys prior is p(σ²) ∝ 1/σ².
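The expectation step in the proof, E[Σ(yᵢ − m)²] = nσ², can be verified by Monte Carlo: averaging the negated second derivative over simulated data should recover n/(2σ⁴). A Python sketch (the deck's code is R; the seed and parameter values are illustrative choices of mine):

```python
import random

# Monte Carlo check that -E[ d^2/d(sigma2)^2 log p(y|sigma2) ] = n / (2 sigma2^2)
random.seed(544)              # arbitrary seed, not from the slides
m, sigma2, n = 1.0, 2.0, 5    # illustrative values
reps = 200_000

acc = 0.0
for _ in range(reps):
    ss = sum((random.gauss(m, sigma2 ** 0.5) - m) ** 2 for _ in range(n))
    # second derivative from the proof: n/(2 sigma2^2) - ss/sigma2^3
    acc += -(n / (2 * sigma2 ** 2) - ss / sigma2 ** 3)

info = acc / reps
print(info, n / (2 * sigma2 ** 2))  # both near 0.625
```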

SLIDE 10

Posterior propriety

Since ∫₀^∞ (1/σ²) dσ² is not finite, we need to check posterior propriety.

Theorem. For n > 0 and at least one yᵢ ≠ m, the posterior for a normal variance (known mean) using Jeffreys prior is proper.

Proof. The posterior is

p(σ²|y) ∝ p(y|σ²) p(σ²)
        = (2πσ²)^(−n/2) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − m)² ) (σ²)⁻¹
        ∝ (σ²)^(−n/2−1) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − m)² ).

This is the kernel of an inverse gamma distribution with shape n/2 and scale Σᵢ₌₁ⁿ (yᵢ − m)²/2, which will be proper so long as n > 0 and at least one yᵢ ≠ m.
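The finiteness claim follows from the inverse gamma normalizing constant: ∫₀^∞ x^(−a−1) exp(−b/x) dx = Γ(a) b^(−a), finite exactly when a > 0 and b > 0 (i.e. n > 0 and some yᵢ ≠ m). A numerical sketch in Python (assumes SciPy; the values a = 2, b = 1 are illustrative, not from the slides):

```python
import math
from scipy.integrate import quad
from scipy.special import gamma

# Inverse gamma kernel with illustrative shape a and scale b
a, b = 2.0, 1.0

total, _ = quad(lambda x: x ** (-a - 1) * math.exp(-b / x), 0, math.inf)
closed_form = gamma(a) * b ** (-a)  # Gamma(2) * 1 = 1
print(total, closed_form)  # both 1.0
```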

SLIDE 11

Natural conjugate prior

Let Yᵢ ~iid N(m, σ²) with m known. The likelihood is

L(σ²) ∝ (σ²)^(−n/2) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − m)² ).

This is the kernel of an inverse gamma distribution, so the natural conjugate prior is σ² ~ IG(a, b). Then

p(σ²|y) ∝ p(y|σ²) p(σ²)
        = (σ²)^(−n/2) exp( −(1/(2σ²)) Σᵢ₌₁ⁿ (yᵢ − m)² ) (σ²)^(−a−1) exp(−b/σ²)
        = (σ²)^(−(a+n/2)−1) exp( −(1/σ²) [ b + Σᵢ₌₁ⁿ (yᵢ − m)²/2 ] ).

This is the kernel of an inverse gamma distribution with shape a + n/2 and scale b + Σᵢ₌₁ⁿ (yᵢ − m)²/2.

SLIDE 12

Example

Suppose Yᵢ ~ind N(1, σ²) and σ² ~ IG(1, 1).

# Prior
a = b = 1

# Data
m = 1
n = length(y)
y
[1] 1.2696060 0.3700146 1.8686598

# Posterior
ap = a+n/2
(bp = b+sum((y-m)^2)/2)
[1] 1.612069
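The same posterior parameters can be cross-checked in Python (a sketch; the data are the values reused from the unknown-mean example):

```python
# IG(1, 1) prior, known mean m = 1, data from the slides
y = [1.2696060, 0.3700146, 1.8686598]
a = b = 1
m = 1
n = len(y)

ap = a + n / 2                                # posterior shape: 1 + 3/2 = 2.5
bp = b + sum((yi - m) ** 2 for yi in y) / 2   # posterior scale
print(ap, bp)  # 2.5 and roughly 1.612069
```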

SLIDE 13

[Figure: "Normal model with unknown variance, inverse gamma prior". Density over σ² (roughly 0 to 3) showing the prior, posterior, and likelihood curves.]

SLIDE 14

Summary

Suppose Yᵢ ~ N(µ, σ²).

µ unknown (σ² known):
  • Jeffreys prior: p(µ) ∝ 1 (think of this as N(0, ∞))
  • Natural conjugate prior: N(m, C)
  • Posterior: N(m′, C′) with
      C′ = [ 1/C + nσ⁻² ]⁻¹
      m′ = C′[ m/C + nσ⁻²ȳ ]

σ² unknown (µ known):
  • Jeffreys prior: p(σ²) ∝ 1/σ² (think of this as IG(0, 0))
  • Natural conjugate prior: IG(a, b)
  • Posterior: IG( a + n/2, b + Σᵢ₌₁ⁿ (yᵢ − µ)²/2 )
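Both conjugate updates in the summary fit in a few lines of code. A Python sketch (the deck's code is R; the function names are mine, and the data are the slides' simulated values):

```python
def update_normal_mean(m, C, ybar, n, sigma2):
    """N(m, C) prior, sigma2 known -> N(mp, Cp) posterior."""
    Cp = 1 / (1 / C + n / sigma2)
    mp = Cp * (m / C + n / sigma2 * ybar)
    return mp, Cp

def update_normal_variance(a, b, y, mu):
    """IG(a, b) prior, mean mu known -> IG(ap, bp) posterior."""
    n = len(y)
    ap = a + n / 2
    bp = b + sum((yi - mu) ** 2 for yi in y) / 2
    return ap, bp

y = [1.2696060, 0.3700146, 1.8686598]  # data from the examples
print(update_normal_mean(0, 1, sum(y) / len(y), len(y), 1))  # ~(0.877, 0.25)
print(update_normal_variance(1, 1, y, 1))                    # (2.5, ~1.612)
```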

SLIDE 15

JAGS

Just Another Gibbs Sampler (JAGS) “is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS.” We will use JAGS through its R interface rjags. The basic workflow when using rjags is:

  1. Define the model and priors in a string.
  2. Assign the data.
  3. Run JAGS, i.e. simulate from the posterior.
  4. Summarize as necessary, e.g. mean, median, credible intervals, etc.

SLIDE 16

Binomial model

Let Y ~ Bin(n, θ) and θ ~ Be(1, 1), and suppose we observe y = 3 successes out of n = 10 attempts.

model = "
model {
  y ~ dbin(theta,n) # notice p then n
  theta ~ dbeta(a,b)
}
"
dat = list(n=10, y=3, a=1, b=1)
m = jags.model(textConnection(model), dat)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 1
   Unobserved stochastic nodes: 1
   Total graph size: 5
Initializing model

r = coda.samples(m, "theta", n.iter=1000)
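Because Be(1, 1) is conjugate here, the exact posterior is Beta(a + y, b + n − y) = Beta(4, 8); the JAGS draws only approximate it. A Python cross-check of the closed form (a sketch, not from the slides):

```python
# Exact conjugate posterior for the binomial example: Beta(a + y, b + n - y)
a, b, y, n = 1, 1, 3, 10
ap, bp = a + y, b + n - y          # Beta(4, 8)

post_mean = ap / (ap + bp)                                  # 1/3
post_sd = (ap * bp / ((ap + bp) ** 2 * (ap + bp + 1))) ** 0.5
print(ap, bp, post_mean, post_sd)
# mean 1/3 and sd ~0.131, close to the MCMC estimates 0.324 and 0.123
```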

SLIDE 17

Binomial model

summary(r)

Iterations = 1001:2000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

      Mean       SD  Naive SE Time-series SE
  0.324050 0.122776  0.003883       0.004139

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5%
0.1110 0.2337 0.3103 0.4091 0.5760

SLIDE 18

Binomial model

plot(r)

[Figure: trace of theta over the retained iterations and density of theta (N = 1000, bandwidth = 0.03269).]

SLIDE 19

Normal model, unknown mean

Let Yᵢ ~ind N(µ, s²) and µ ~ N(0, 1).

model = "
model {
  for (i in 1:n) {         # iterate over observations
    y[i] ~ dnorm(mu,1/s2)  # precision instead of variance
  }
  mu ~ dnorm(m,1/C)        # cannot use improper prior in JAGS
}
"
dat = list(m=0, C=1, s2=1, y=y)
dat$n = length(dat$y)
m = jags.model(textConnection(model), dat)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 3
   Unobserved stochastic nodes: 1
   Total graph size: 10
Initializing model

r = coda.samples(m, "mu", n.iter=1000)

SLIDE 20

Normal model, unknown mean

summary(r)

Iterations = 1:1000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

   Mean      SD Naive SE Time-series SE
0.87238 0.48908  0.01547        0.01547

2. Quantiles for each variable:

   2.5%    25%    50%    75%  97.5%
-0.1687 0.5529 0.8768 1.2132 1.8182

SLIDE 21

Normal model, unknown mean

plot(r)

[Figure: trace of mu over the 1000 iterations and density of mu (N = 1000, bandwidth = 0.1302).]

SLIDE 22

Normal model, unknown variance

Let Yᵢ ~ind N(m, σ²) and σ² ~ IG(1, 1).

model = "
model {
  for (i in 1:n) {
    y[i] ~ dnorm(m,tau) # precision instead of variance
  }
  tau ~ dgamma(a,b)     # inverse gamma is not a built-in distribution
  sigma2 <- 1/tau       # functions of parameters
}
"
dat = list(m=1, a=1, b=1, y=y)
dat$n = length(dat$y)
m = jags.model(textConnection(model), dat)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 3
   Unobserved stochastic nodes: 1
   Total graph size: 10
Initializing model

r = coda.samples(m, "sigma2", n.iter=1000)
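Here too the prior is conjugate, so the exact posterior is IG(a + n/2, b + Σ(yᵢ − m)²/2) = IG(2.5, ~1.612) and the JAGS draws approximate it. A Python cross-check of the closed form (a sketch, not from the slides):

```python
# Exact conjugate posterior for the normal-variance example
y = [1.2696060, 0.3700146, 1.8686598]  # data from the slides
a = b = 1
m = 1

ap = a + len(y) / 2                           # 2.5
bp = b + sum((yi - m) ** 2 for yi in y) / 2   # ~1.612
post_mean = bp / (ap - 1)  # inverse gamma mean b/(a-1), valid since ap > 1
print(ap, bp, post_mean)   # 2.5, ~1.612, ~1.075 (vs the MCMC mean 1.028)
```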

SLIDE 23

Normal model, unknown variance

summary(r)

Iterations = 1:1000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 1000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

   Mean      SD Naive SE Time-series SE
1.02769 0.97260  0.03076        0.03076

2. Quantiles for each variable:

  2.5%    25%    50%    75%  97.5%
0.2535 0.5048 0.7412 1.2081 3.7243

SLIDE 24

Normal model, unknown variance

plot(r)

[Figure: trace of sigma2 over the 1000 iterations and density of sigma2 (N = 1000, bandwidth = 0.1397).]

SLIDE 25

Stan

Stan “is a probabilistic programming language implementing full Bayesian statistical inference.” We will use Stan through its R interface rstan. The basic workflow when using rstan is (almost exactly the same as for rjags):

  1. Define the model and priors in a string and compile the model.
  2. Assign the data.
  3. Run Stan, i.e. simulate from the posterior.
  4. Summarize as necessary, e.g. mean, median, credible intervals, etc.

But additional coding is required for Stan.

SLIDE 26

Stan - Binomial model

Let Y ~ Bin(n, θ) and θ ~ Be(1, 1).

model = "
data {
  int<lower=0> n;          // define range and type
  int<lower=0> a;          // and notice semicolons
  int<lower=0> b;
  int<lower=0,upper=n> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  y ~ binomial(n,theta);
  theta ~ beta(a,b);
}
"
dat = list(n=10, y=3, a=1, b=1)
m = stan_model(model_code = model) # Only needs to be done once

SLIDE 27

Stan - Binomial model sampling

r = sampling(m, data=dat, iter = 11000, warmup = 1000, refresh=5000)
r

Inference for Stan model: 0fbec116a458ec7c035f79496aeedb4a.
4 chains, each with iter=11000; warmup=1000; thin=1;
post-warmup draws per chain=10000, total post-warmup draws=40000.

       mean se_mean   sd   2.5%   25%   50%   75% 97.5% n_eff Rhat
theta  0.33    0.00 0.13   0.11  0.24  0.32  0.42  0.61 14038    1
lp__  -8.17    0.01 0.75 -10.28 -8.33 -7.87 -7.69 -7.64 17281    1

Samples were drawn using NUTS(diag_e) at Thu Jan 24 10:51:51 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

SLIDE 28

Stan - Binomial model

plot(r)

[Figure: posterior interval plot for theta (roughly 0.2 to 0.6).]

SLIDE 29

Normal model, unknown mean

Let Yᵢ ~ind N(µ, s²) and µ ~ N(0, 1).

model = "
data {
  int<lower=0> n;
  real y[n];               // vector
  real<lower=0> s2;
  real m;
  real<lower=0> C;
}
transformed data {         // run once
  real<lower=0> s;
  real<lower=0> sqrtC;
  s = sqrt(s2);
  sqrtC = sqrt(C);
}
parameters {
  real mu;                 // if used alone, implies a uniform prior
}
model {
  y ~ normal(mu,s);        // vectorized, i.e. assumed independent
  mu ~ normal(m,sqrtC);    // standard deviation
}
"
dat = list(m=0, C=1, s2=1, y=y)
dat$n = length(dat$y)
m = stan_model(model_code = model)
r = sampling(m, data = dat, iter = 11000, warmup = 1000, refresh=5000)

SLIDE 30

Normal model, unknown mean

plot(r)

[Figure: posterior interval plot for mu (roughly 0.0 to 2.0).]

SLIDE 31

Normal model, unknown variance

Let Yᵢ ~ind N(m, σ²) and σ² ~ IG(1, 1).

model = "
data {
  int<lower=0> n;
  real y[n];
  real m;
  real<lower=0> a;
  real<lower=0> b;
}
parameters {
  real<lower=0> sigma2;    // if used alone, implies a uniform prior on (0,Inf)
}
transformed parameters {   // deterministic function of parameters
  real<lower=0> sigma;
  sigma = sqrt(sigma2);
}
model {
  y ~ normal(m,sigma);
  sigma2 ~ inv_gamma(a,b); // built-in inverse gamma distribution
}
"
dat = list(a=1, b=1, m=1, y=y)
dat$n = length(dat$y)
m = stan_model(model_code = model)
r = sampling(m, data = dat, iter = 11000, warmup = 1000, refresh=5000)
r

SLIDE 32

Normal model, unknown variance

plot(r)

[Figure: posterior interval plots for sigma2 and sigma (roughly 1 to 4).]