Selecting priors
Applied Bayesian Statistics
- Dr. Earvin Balderama
Department of Mathematics & Statistics Loyola University Chicago
September 19, 2017
Selecting priors Last edited September 8, 2017 by Earvin Balderama <ebalderama@luc.edu>
Selecting the prior is one of the most important steps in a Bayesian analysis, but there are many schools of thought on this. The choices often depend on the objective of the study and the nature of the data:

1. Conjugate vs. non-conjugate
2. Informative vs. uninformative
3. Subjective vs. objective
4. Proper vs. improper
A prior is conjugate if the posterior is a member of the same parametric family as the prior. The advantage is that the posterior is available in closed form (and conjugate full conditionals are very convenient in MCMC). There is a long list of conjugate priors: https://en.wikipedia.org/wiki/Conjugate_prior
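A minimal sketch of conjugacy in action, using the classic Beta-Binomial pair (the hyperparameter and data values below are made up for illustration):

```python
# Conjugacy sketch: Beta prior + Binomial likelihood -> Beta posterior.
# If theta ~ Beta(a, b) and Y | theta ~ Binomial(n, theta), then
# theta | Y ~ Beta(a + Y, b + n - Y): the posterior is available in closed form.
a, b = 2.0, 2.0      # assumed prior hyperparameters (illustration only)
n, y = 20, 14        # assumed data: 14 successes in 20 trials

a_post, b_post = a + y, b + n - y        # closed-form conjugate update
post_mean = a_post / (a_post + b_post)   # Beta mean: a/(a+b)
print(a_post, b_post, post_mean)         # 16.0 8.0 ~0.667
```

No sampling or numerical integration was needed; the update is just arithmetic on the hyperparameters.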
For many likelihoods/parameters, there is no known conjugate prior.

A silly example: Say Y ∼ Poisson(λ) and λ ∼ Beta(a, b). Then the posterior is

f(λ | Y) ∝ e^(−λ) λ^Y · λ^(a−1) (1 − λ)^(b−1).

This is not a Beta pdf, therefore the prior is not conjugate. (In fact, this doesn't look like a member of any known family of distributions.)
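With no closed form available, one simple workaround is to normalize the posterior numerically on a grid. A sketch of that for the posterior above, with a, b, Y set to assumed values purely for illustration:

```python
import numpy as np

# Grid approximation of the non-conjugate posterior from the slide:
# f(lambda | Y) ∝ exp(-lambda) * lambda^Y * lambda^(a-1) * (1-lambda)^(b-1),
# supported on (0, 1) because of the Beta prior.
a, b, Y = 2.0, 2.0, 1                      # assumed values (illustration only)
lam = np.linspace(1e-6, 1 - 1e-6, 10_000)  # grid over the support
unnorm = np.exp(-lam) * lam**Y * lam**(a - 1) * (1 - lam)**(b - 1)

dlam = lam[1] - lam[0]
post = unnorm / (unnorm.sum() * dlam)      # normalize by a Riemann sum
post_mean = (lam * post).sum() * dlam      # posterior mean, numerically
```

Note that the Beta prior restricts λ to (0, 1), which is part of what makes the example "silly" for a Poisson rate.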
Conjugacy is only about the family of distributions that the prior belongs to. However, we can utilize outside knowledge to make a prior (conjugate or non-conjugate) more informative. Potential sources include literature reviews, pilot studies, and expert opinion.
Prior elicitation is the process of converting expert information into a prior distribution.
For example, the expert may not know what an inverse gamma pdf is, but you can choose a and b so that the distribution reflects the beliefs of the expert.
Any time informative priors are used, you should conduct a sensitivity analysis; that is, evaluate how the posterior changes under several different priors.
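One common elicitation device is moment matching: translate the expert's stated mean and standard deviation into hyperparameters. A sketch for an inverse-gamma prior on a variance parameter (the function name and the numbers fed to it are mine, not from the slides):

```python
# Elicitation sketch: convert an expert's mean m and standard deviation s
# for a variance parameter into Inverse-Gamma(a, b) hyperparameters.
# For Inverse-Gamma(a, b): mean = b/(a-1) for a > 1,
#                          var  = b^2 / ((a-1)^2 (a-2)) for a > 2.
def elicit_inverse_gamma(m, s):
    a = (m / s) ** 2 + 2.0   # solves var = m^2/(a-2) = s^2
    b = m * (a - 1.0)        # solves mean = b/(a-1) = m
    return a, b

# Hypothetical expert: "I think the variance is about 4, give or take 2."
a, b = elicit_inverse_gamma(m=4.0, s=2.0)
mean = b / (a - 1.0)
sd = (b**2 / ((a - 1.0) ** 2 * (a - 2.0))) ** 0.5   # recovers m and s
```

The expert never needs to know what an inverse-gamma pdf is; they only state a center and a spread.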
FDA recommendations: http://www.fda.gov/RegulatoryInformation/Guidances/ucm071072.htm

"Discussions with FDA regarding study design will include an evaluation … analysis."

"We recommend you identify as many sources of good prior information as possible. The evaluation of 'goodness' of the prior information is …"

"… approval of a medical device, you should present and discuss your choice … statistical) before your study begins."
"Prior distributions based directly on data from other studies are the easiest to evaluate. While we recognize that two studies are never exactly alike, we nonetheless recommend the studies used to construct the prior be similar to the current study..."

"Prior distributions based on expert opinion rather than data can be … advisory panel members or other clinical evaluators do not agree with the …"
Another word for a very informative prior is a strong prior. Very strong priors are typically used only for nuisance or other special parameters; strong priors for the main parameters of interest can be tough to defend (even with expert opinion involved). Most of the time, uninformative priors are used. Other terms used to describe uninformative priors: vague, weak, flat, diffuse, etc.

Question: How vague or weak can we make a prior? Can we always select a prior that is completely uninformative?

The idea is to let the likelihood overwhelm the prior. We can select priors with large variance, e.g., µ ∼ Normal(0, 10⁶). (Plot this density and it will look nearly flat.)
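A quick sketch of the likelihood overwhelming the prior, using the conjugate normal-normal model (the simulated data and the prior variances are assumed values for illustration):

```python
import numpy as np

# Vague prior in action: mu ~ Normal(0, prior_var) with y_i ~ Normal(mu, 1).
# With prior_var = 10^6 the posterior mean is essentially the MLE (sample mean);
# with a very small prior_var, the prior drags the posterior mean toward 0.
rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=50)   # assumed data: true mu = 3

def posterior_mean(y, prior_var, sigma2=1.0):
    # Conjugate normal update with prior mean 0:
    # posterior precision = n/sigma2 + 1/prior_var
    precision = len(y) / sigma2 + 1.0 / prior_var
    return (y.sum() / sigma2) / precision

vague = posterior_mean(y, prior_var=1e6)    # ~ sample mean
strong = posterior_mean(y, prior_var=0.01)  # shrunk hard toward 0
mle = y.mean()
```

With the vague prior, the posterior mean agrees with the MLE to several decimal places; the strong prior pulls the estimate far toward its center.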
Subjective decisions include picking the likelihood, treatment of outliers, transformations, ... and prior specification. For example, should we select σ ∼ Uniform(0, 10) or σ² ∼ Uniform(0, 100)? A prior chosen subjectively, before collecting data, may be rejected by a reader who does not share the same view, so uninformative priors and a sensitivity analysis are common.
An objective analysis is one that requires no subjective decisions by the analyst. A completely objective analysis may be feasible in tightly controlled experiments, but is impossible in many analyses. An objective Bayesian attempts to replace the subjective choice of prior with an algorithm that determines the prior. Any objective prior must give the same results (e.g., the same posterior mean) under any parameterization. For example, σ ∼ Uniform(0, 10) and σ² ∼ Uniform(0, 100) are quite different: σ ∼ Uniform ⟹ f(σ²) ∝ 1/σ.
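A quick Monte Carlo check of that last claim (the sample size and bin edges are my choices):

```python
import numpy as np

# If sigma ~ Uniform(0, 10), the implied density of sigma^2 is NOT uniform on
# (0, 100): it is proportional to 1/sigma, so mass piles up near 0.
rng = np.random.default_rng(1)
sigma = rng.uniform(0.0, 10.0, size=200_000)
s2 = sigma**2

# P(sigma^2 < 25) = P(sigma < 5) = 0.5, so half the draws land in the first
# quarter of the range; each later bin gets progressively fewer draws.
counts, edges = np.histogram(s2, bins=[0, 25, 50, 75, 100])
print(counts)
```

So a "flat" prior on σ is an informative (decreasing) prior on σ², and vice versa; flatness is not parameterization-free.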
The Jeffreys prior is invariant to transformations. For a parameter θ, the Jeffreys prior is

f(θ) ∝ √I(θ),

where I(θ) = −E_{Y|θ}[ d²/dθ² log f(Y | θ) ] is the Fisher information.
Note: Once you specify the likelihood, the Jeffreys prior is determined with no additional input. Therefore you do not have to make a subjective decision about the prior (other than the subjective decision to use a Jeffreys prior).
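That "no additional input" claim can be checked symbolically. A sketch of the Binomial case, deriving the Fisher information and hence the Jeffreys prior with sympy (a generic computer-algebra route, not anything from the slides):

```python
import sympy as sp

theta, n, y = sp.symbols('theta n y', positive=True)

# Binomial log-likelihood, dropping the theta-free binomial coefficient:
loglik = y * sp.log(theta) + (n - y) * sp.log(1 - theta)

# Fisher information: I(theta) = -E[d^2/dtheta^2 log f(Y|theta)].
# The expectation only involves E[Y] = n*theta, so substitute it in.
d2 = sp.diff(loglik, theta, 2)
info = sp.simplify(-d2.subs(y, n * theta))   # n / (theta * (1 - theta))

# Jeffreys prior ∝ sqrt(I(theta)) ∝ theta^(-1/2) * (1-theta)^(-1/2),
# i.e., the Beta(1/2, 1/2) kernel.
jeffreys = sp.sqrt(info)
```

Everything followed mechanically from the likelihood; the only subjective step was deciding to use a Jeffreys prior at all.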
Examples:

  Likelihood                        Jeffreys prior
  Y ∼ Binomial(n, θ)                θ ∼ Beta(1/2, 1/2)
  Y ∼ Normal(µ, σ²), σ known        f(µ) ∝ 1
  Y ∼ Normal(µ, σ²), µ known        f(σ) ∝ 1/σ

Many of these priors are improper priors, meaning they do not integrate to one. In this case, you have to check that the posterior is proper, i.e., that it integrates to one.
In Empirical Bayes, priors are chosen based on the data. For example, one might set the prior mean for σ² to the sample variance s². In general, one can use marginal maximum likelihood to fix nuisance parameters. Note: Empirical Bayes is usually criticized for "using the data twice."
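A sketch of marginal maximum likelihood in the normal-means problem, where the marginal likelihood has closed-form maximizers (the simulation settings are assumed values for illustration):

```python
import numpy as np

# Empirical Bayes sketch: y_i ~ N(theta_i, 1) with theta_i ~ N(mu, tau^2).
# Marginally, y_i ~ N(mu, 1 + tau^2), so marginal maximum likelihood gives
# closed-form hyperparameter estimates from the data ("using the data twice").
rng = np.random.default_rng(42)
mu_true, tau_true = 2.0, 1.5                 # assumed truth for the simulation
theta = rng.normal(mu_true, tau_true, size=500)
y = rng.normal(theta, 1.0)

mu_hat = y.mean()                            # marginal MLE of mu
tau2_hat = max(y.var() - 1.0, 0.0)           # marginal MLE of tau^2 (truncated at 0)

# Plug-in posterior means for each theta_i shrink y_i toward mu_hat:
w = tau2_hat / (tau2_hat + 1.0)
theta_post = mu_hat + w * (y - mu_hat)
```

The hyperparameters µ and τ² were never given priors of their own; they were estimated from the marginal distribution of the data, which is exactly the move the "using the data twice" criticism targets.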