Why Every Physicist Should Be a Bayesian (Towards a Complete Reconciliation between the Bayesian and the Frequentist Schools of Parametric Inference)

Tomaž Podobnik
Physics Department, University of Ljubljana / Jožef Stefan Institute
16/01/2007, YETI'07

Recommended reading:
R. D. Cousins, "Why Isn't Every Physicist a Bayesian?", Amer. J. Phys. 63 (1995) 398:
“Physicists embarking on seemingly routine error analyses are finding themselves grappling with major conceptual issues which have divided the statistical community for years. …The lurking controversy can come as a shock to a graduate student who encounters a statistical problem at some late stage in writing up the Ph.D. dissertation.”
Basic principles of scientific reasoning (Popper, 1959, pp. 91-92):

1. Principle of Consistency: Every theory must be internally consistent: if a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result. Also, identical states of knowledge in a problem must always lead to identical solutions of the problem.

2. Operational Principle: Every theory must specify operations that ensure falsifiability of its predictions.
Direct probabilities (= long-term relative frequencies):

$$ p\bigl(x \in (x_1, x_1+dx_1)\,\big|\,\theta I_0\bigr) = f(x_1|\theta I_0)\,dx_1 $$

: probability for observing $x$ in $(x_1, x_1+dx_1)$, given information $\theta I_0$;

$f(x|\theta I_0)$: family of sampling distributions (probability density function, pdf); $\theta$: parameter;

$$ F(x,\theta,I_0) \equiv \int_{x_a}^{x} f(x'|\theta I_0)\,dx' $$

: (cumulative) distribution function (cdf).
Location and scale parameters:

$$ f(x|\mu I_0) = \phi(x-\mu), \quad x \in (-\infty,\infty),\ \mu \in (-\infty,\infty): \quad \mu \equiv \text{location parameter} $$

$$ f(x|\sigma I_0) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right), \quad x \in (0,\infty),\ \sigma \in (0,\infty): \quad \sigma \equiv \text{scale parameter} $$

$$ f(x|\mu\sigma I_0) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right): \quad \mu \equiv \text{location},\ \sigma \equiv \text{scale (dispersion) parameter} $$
Examples:

$$ f(x|\mu\sigma I_0) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right) \equiv \text{Gaussian distribution} $$

($\mu$: location parameter, $\sigma$: scale (dispersion) parameter);

$$ F(x,\mu,\sigma,I_0) = \int_{-\infty}^{x} f(x'|\mu\sigma I_0)\,dx'. $$

[Figure: $f(x|\mu\sigma I_0)$ and $F(x,\mu,\sigma,I_0)$ versus $x$ for $\mu = \sigma = 1$.]
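As a quick numerical sketch (not part of the slides), the Gaussian cdf of the example can be written in closed form via the error function and cross-checked against a direct integration of the pdf; the values $\mu = \sigma = 1$ match the figure:

```python
import math

def gauss_pdf(x, mu=1.0, sigma=1.0):
    """Sampling density f(x | mu sigma I0) of the Gaussian example."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gauss_cdf(x, mu=1.0, sigma=1.0):
    """F(x, mu, sigma, I0), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Cross-check the closed form against a midpoint Riemann sum of the pdf
# from -9 to 1 (the tail below -9 is negligible at this precision).
h = 1e-3
numeric = sum(gauss_pdf(-9.0 + h * (k + 0.5)) * h for k in range(10000))
print(abs(numeric - gauss_cdf(1.0)))  # ~0: both give F(1, 1, 1, I0) = 1/2
```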
Axioms of conditional probability (every probability distribution is conditional upon the available (relevant) information):

$$ 1.\ f(x|\theta I_0) \ge 0 $$
$$ 2.\ f(xy|\theta I_0) = f(x|\theta I_0)\,f(y|x\theta I_0) = f(y|\theta I_0)\,f(x|y\theta I_0) $$
$$ 3.\ \int_X f(x|\theta I_0)\,dx = 1 $$
$$ 4.\ f(y|\theta \tilde I_0) = f(x|\theta I_0)\left|\frac{\partial y}{\partial x}\right|^{-1}; \quad y = y(x)\ \text{one-to-one} $$
Example: scale parameter reducible to a location parameter!

$$ f(x|\sigma I_0) = \frac{1}{\sigma}\exp\!\left(-\frac{x}{\sigma}\right) \equiv \text{exponential distribution} \quad (\sigma:\ \text{scale parameter}); $$

$$ y \equiv \ln x,\ \mu \equiv \ln\sigma \ \Rightarrow\ f(y|\mu\tilde I_0) = f(x|\sigma I_0)\left|\frac{\partial y}{\partial x}\right|^{-1} = \exp\!\left[(y-\mu) - e^{(y-\mu)}\right] \equiv \tilde\phi(y-\mu) $$

$\Rightarrow\ \mu$: location parameter (but $f(y|\mu\tilde I_0)$ is not an exponential distribution).

[Figure: $f(x|\sigma{=}1\,I_0)$ versus $x$.]
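The reduction above can be checked numerically. This is an illustrative sketch (not from the slides): the transformed density must equal the exponential density times the Jacobian, and it must depend on $y$ and $\mu$ only through $y-\mu$, which is what makes $\mu$ a pure location parameter:

```python
import math

def f_x(x, sigma=1.0):
    """Exponential sampling density f(x | sigma I0), x > 0."""
    return math.exp(-x / sigma) / sigma

def f_y(y, mu=0.0):
    """Transformed density after y = ln x, mu = ln sigma."""
    return math.exp((y - mu) - math.exp(y - mu))

# Change of variables: f_y(y) must equal f_x(e^y) * |dx/dy| = f_x(e^y) * e^y.
y = 0.7
assert abs(f_y(y) - f_x(math.exp(y)) * math.exp(y)) < 1e-12

# f_y depends only on y - mu: shifting both y and mu leaves it unchanged.
assert abs(f_y(1.3, mu=2.0) - f_y(-0.7, mu=0.0)) < 1e-12
```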
Parametric inference:

Given measured $x_1 \in (x_1, x_1+dx_1)$ and $f(x|\theta I_0)$: specify the degree of belief in $\theta \in (\theta_1, \theta_1+d\theta)$.

Probabilistic approach (Bayesian school):

$$ x \ \to\ p(\theta|x I_0) = f(\theta|x I_0)\,d\theta $$

N.b.: $f(\theta|x_1 I_0)$ is a distribution of our belief in different values of $\theta$, not(!) a distribution of $\theta$.
Axioms of inverse probability:

$$ 1.\ f(\theta|x I_0) \ge 0 $$
$$ 2.\ f(\theta\nu|x I_0) = f(\theta|x I_0)\,f(\nu|\theta x I_0) = f(\nu|x I_0)\,f(\theta|\nu x I_0) $$
$$ 3.\ \int_\Theta f(\theta|x I_0)\,d\theta = 1 $$
$$ 4.\ f(\nu|x \tilde I_0) = f(\theta|x I_0)\left|\frac{\partial \nu}{\partial \theta}\right|^{-1}; \quad \nu = \nu(\theta)\ \text{one-to-one} $$
$$ 5.\ f(\theta|x_1 x_2 I_0) = f(\theta|x_2 x_1 I_0) $$
Pro's for subjecting degrees of belief to the axioms of probability:

1. "It is not excluded a priori that the same mathematical theory may serve two purposes." (Pólya, 1954, Chapter XV, p. 116)

2. Cox's Theorem: every theory of plausible inference is either isomorphic to probability theory or inconsistent with very general qualitative requirements (e.g., the plausibility of $\theta \in (\theta_1, \theta_1+d\theta)$ given $x_1 I_0$ must determine that of $\theta \notin (\theta_1, \theta_1+d\theta)$). (Cox, 1946)

3. Dutch Book Theorem (de Finetti): a "Dutch Book" can be organized against anyone whose betting coefficients violate the axioms of probability. (Howson and Urbach, 1991)
Pro's (cont'd):

4. Avoiding adhockeries. (O'Hagan, 2000, p. 20)

5. Powerful tools: marginalization and Bayes' Theorem (Bayes, 1763):

$$ f(\theta\nu|x I_0) = f(\theta|x I_0)\,f(\nu|\theta x I_0) = f(\nu|x I_0)\,f(\theta|\nu x I_0) \ \Rightarrow $$

$$ f(\theta|x I_0) = \int_N f(\theta\nu'|x I_0)\,d\nu', \qquad f(\nu|x I_0) = \int_\Theta f(\theta'\nu|x I_0)\,d\theta'; $$

$$ f(\theta|x_1 x_2 I_0) = f(\theta|x_2 x_1 I_0); \qquad f(x_2|\theta x_1 I_0)\,f(\theta|x_1 I_0) = f(x_2|x_1 I_0)\,f(\theta|x_2 x_1 I_0) \ \Rightarrow $$

$$ f(\theta|x_2 x_1 I_0) = \frac{f(x_2|\theta x_1 I_0)\,f(\theta|x_1 I_0)}{\int_\Theta f(x_2|\theta' x_1 I_0)\,f(\theta'|x_1 I_0)\,d\theta'} $$
But… (con's): a) how to assign $f(\theta|x_1 I_0)$? b) what are the verifiable predictions?

If making use of Bayes' Theorem:

$$ f(\theta|x_1 I_0) = \frac{f(x_1|\theta I_0)\,f(\theta|I_0)}{\int_\Theta f(x_1|\theta' I_0)\,f(\theta'|I_0)\,d\theta'}, \qquad f(\theta|I_0):\ \text{(non-informative) prior (distribution)}. $$

"According to Bayesian philosophy it is also possible to make statements concerning the unknown q in the absence of data, and these statements can be summarized in a prior distribution." (Villegas, 1980)
Example: the Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII):

$$ f(\theta|I_0) = C; \qquad C^{-1} = \int_{\theta_a}^{\theta_b} d\theta' = \theta_b - \theta_a. $$

Twofold problem:

a) $(\theta_a, \theta_b)$ infinite (e.g., $\theta_b = \infty$) $\Rightarrow \int_{\theta_a}^{\theta_b} f(\theta'|I_0)\,d\theta' \to \pm\infty$;

b) $f(\theta|I_0)$ not invariant under non-linear transformations:

$$ \nu = \nu(\theta) \ \Rightarrow\ f(\nu|\tilde I_0) = f(\theta|I_0)\left|\frac{\partial \nu}{\partial \theta}\right|^{-1} \propto \left|\frac{\partial \nu}{\partial \theta}\right|^{-1} \ne \text{const.} $$
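Problem (b) is easy to see by simulation. This sketch (an illustration, not from the slides) uses the non-linear map $\nu = \theta^2$: a flat density for $\theta$ on $(0,1)$ induces $f(\nu) = 1/(2\sqrt{\nu})$, which is far from flat:

```python
import random

random.seed(1)

# theta ~ U(0, 1) is "uninformative" about theta, then nu = theta**2.
draws = [random.random() ** 2 for _ in range(200_000)]

# If flatness survived the transformation, P(nu < 0.25) would be 0.25.
# In fact P(nu < 0.25) = P(theta < 0.5) = 0.5: the flat prior in theta
# is quite informative about nu.
frac = sum(nu < 0.25 for nu in draws) / len(draws)
print(round(frac, 2))  # ~0.5, not 0.25
```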
"A succession of authors have said that the prior probability is nonsense and that the principle of inverse probability, which cannot work without it, is nonsense too." (Jeffreys, 1961, p. 120)

"During the rapid development of practical statistics in the past few decades, the theoretical foundations of the subject have been involved in great obscurity. The obscurity is centred in the so-called 'inverse' methods. … The inverse probability is a mistake (perhaps the only mistake to which the mathematical world has so deeply committed itself)." (Fisher, 1922)
Long-lasting and fierce controversy:

"The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency." (Jeffreys, 1961, p. 401)

"Probability is a ratio of frequencies." (Fisher, 1922, p. 326)
Twofold aim of the lecture:

1. Overcome conceptual and practical problems concerning the assignment of probability distributions to inferred parameters;

2. Reconcile the Bayesian and the frequentist schools of parametric inference.
Consistency Theorem: how to assign $f(\theta|x_1 I_0)$?

Assumptions:
a) $x_1$ and $x_2$: two independent measurements from $f(x|\theta I_0)$: $f(x_2|x_1\theta I_0) = f(x_2|\theta I_0)$ and $f(x_1|x_2\theta I_0) = f(x_1|\theta I_0)$;
b) $f(\theta|x_1 I_0)$ and $f(\theta|x_2 I_0)$ can be assigned.

Then (Bayes' Theorem):

$$ f(\theta|x_2 x_1 I_0) = \frac{f(x_2|\theta I_0)\,f(\theta|x_1 I_0)}{f(x_2|x_1 I_0)}, \qquad f(\theta|x_1 x_2 I_0) = \frac{f(x_1|\theta I_0)\,f(\theta|x_2 I_0)}{f(x_1|x_2 I_0)}. $$
Consistency:

$$ f(\theta|x_2 x_1 I_0) = f(\theta|x_1 x_2 I_0) $$

$\Rightarrow$

$$ f(\theta|x I_0) = \frac{\pi(\theta)\,f(x|\theta I_0)}{\eta(x)}, \qquad \eta(x) \equiv \int_\Theta \pi(\theta')\,f(x|\theta' I_0)\,d\theta'. $$

$\pi(\theta)$: consistency factor; not(!!) a probability distribution (e.g., it need not be normalizable); $\eta(x)$: normalization factor.

Strikingly similar to Bayes' Theorem, but…
Properties of $\pi(\theta)$:

1. Determined only up to a multiplicative constant (say $k$);

2. Transformation $\pi(\theta) \to \tilde\pi(\nu)$ under $\theta \to \nu$ (one-to-one):

$$ f(\nu|x \tilde I_0) = f(\theta|x I_0)\left|\frac{\partial \nu}{\partial \theta}\right|^{-1} \ \Rightarrow\ f(\nu|x \tilde I_0) = \frac{\tilde\pi(\nu)\,f(x|\nu \tilde I_0)}{\tilde\eta(x)}, \qquad \tilde\pi(\nu) = k\,\pi(\theta)\left|\frac{\partial \nu}{\partial \theta}\right|^{-1}; $$

3. Depends on $I_0$ (= the only information available before the data $(x_1, x_2, \ldots)$ are collected).
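Property 2 connects directly to the earlier scale-to-location reduction. As an illustrative sketch (not from the slides): the scale-parameter consistency factor $\pi(\sigma) = 1/\sigma$ transforms into a constant factor under $\nu = \ln\sigma$, i.e., into the location-parameter form:

```python
import math

def pi_sigma(sigma):
    """Consistency factor for a scale parameter."""
    return 1.0 / sigma

def pi_nu(nu):
    """Transformed factor under nu = ln(sigma): pi(sigma) * |d nu/d sigma|**-1."""
    sigma = math.exp(nu)        # invert nu = ln(sigma)
    dnu_dsigma = 1.0 / sigma    # Jacobian of the transformation
    return pi_sigma(sigma) / dnu_dsigma

# (1/sigma) * sigma = 1 for every nu: the factor is flat in the location variable.
assert all(abs(pi_nu(nu) - 1.0) < 1e-12 for nu in (-2.0, 0.0, 3.5))
```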
Consistency:

$$ \tilde I_0 = I_0 \ \Rightarrow\ \tilde\pi(\nu) \propto \pi(\nu) $$

(a.k.a. the Principle of Relative Invariance; Hartigan, 1964)

Example (scale parameter):

$$ f(t|\tau I_0) = \frac{1}{\tau}\exp\!\left(-\frac{t}{\tau}\right) = \frac{1}{\tau}\,\phi\!\left(\frac{t}{\tau}\right); $$

group $g_a: t \in X \to at \equiv y \in X$; induced group $\tilde g_a: \tau \in \Theta \to a\tau \equiv \nu \in \Theta$;

$$ f(y|\nu \tilde I_0) = f(t|\tau I_0)\left|\frac{\partial y}{\partial t}\right|^{-1} = \frac{1}{\nu}\exp\!\left(-\frac{y}{\nu}\right) = f(y|\nu I_0) $$

$\Rightarrow f(t|\tau I_0)$ invariant under the group $\Rightarrow$

$$ \pi(a\tau) = k(a)\,\pi(\tau) \equiv \frac{1}{h(a)}\,\pi(\tau). $$
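The invariance claim in the example can be verified at a point. A minimal sketch (not from the slides): scaling the observation and the parameter together maps the exponential density onto itself, up to the Jacobian of the data transformation:

```python
import math

def f(t, tau):
    """Exponential sampling density f(t | tau I0)."""
    return math.exp(-t / tau) / tau

a, t, tau = 3.0, 1.7, 0.8
y, nu = a * t, a * tau  # transformed observation and parameter

# Change of variables: f(y | nu) = f(t | tau) * |dy/dt|**-1 = f(t | tau) / a,
# and the result is again the exponential density -- same functional form.
assert abs(f(t, tau) / a - f(y, nu)) < 1e-12
```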
Example (functional equations implied by relative invariance and their general solutions; $r$, $q$: constants):

- Distribution $f(x|\mu I_0) = \phi(x-\mu)$; inv. transformation $x \to x+b$, $\mu \to \mu+b$; functional equation $\pi(\mu+b) = \pi(\mu)/h(b)$; solution $\pi_L(\mu) \propto e^{-r\mu}$.

Similarly:

- $f(x|\sigma I_0) = \frac{1}{\sigma}\phi\!\left(\frac{x}{\sigma}\right)$; $x \to ax$, $\sigma \to a\sigma$; $\pi(a\sigma) = \pi(\sigma)/h(a)$; $\pi_S(\sigma) \propto \sigma^{-(q+1)}$.

- $f(x|\mu\sigma I_0) = \frac{1}{\sigma}\phi\!\left(\frac{x-\mu}{\sigma}\right)$; $x \to ax+b$, $\mu \to a\mu+b$, $\sigma \to a\sigma$; $\pi(a\mu+b, a\sigma) = \pi(\mu,\sigma)/h(a,b)$; $\pi_{LS}(\mu,\sigma) \propto \sigma^{-(q+1)}$.
Product rule and marginalization:

$$ \pi_{LS}(\mu,\sigma) \propto \pi_L(\mu)\,\pi_S(\sigma) $$

and

$$ \pi_L(\mu) \propto 1, \qquad \pi_S(\sigma) \propto \sigma^{-1} \propto \pi_{LS}(\mu,\sigma) $$

$\Rightarrow$ Consistency factors are determined uniquely (up to an arbitrary multiplicative constant) exclusively by observing the axioms of probability and the Principle of Consistency.
Examples: inferring the parameters of a Gaussian distribution.

$\mathbf{x} \equiv (x_1, \ldots, x_n)$: independent measurements, sampled from $f(x|\mu\sigma I_0)$;

$$ f(\mathbf{x}|\mu\sigma I_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left[-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right], \qquad \bar x \equiv \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 \equiv \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2. $$

a) Both $\mu$ and $\sigma$ unknown:

$$ f(\mu\sigma|\mathbf{x} I_0) = \frac{\pi_{LS}(\mu,\sigma)\prod_{i=1}^{n} f(x_i|\mu\sigma I_0)}{\eta(\mathbf{x})} \propto \sigma^{-1}\prod_{i=1}^{n} f(x_i|\mu\sigma I_0). $$
Marginalization:

$$ f(\mu|\mathbf{x} I_0) = \int_0^\infty f(\mu\sigma'|\mathbf{x} I_0)\,d\sigma' = \frac{\Gamma(n/2)}{\Gamma\!\left(\frac{n-1}{2}\right)\sqrt{\pi(n-1)}\,\left(s/\sqrt{n}\right)} \left[1 + \frac{n(\mu-\bar x)^2}{(n-1)s^2}\right]^{-n/2}, $$

$$ f(\sigma|\mathbf{x} I_0) = \int_{-\infty}^{\infty} f(\mu'\sigma|\mathbf{x} I_0)\,d\mu' = \frac{2}{\Gamma\!\left(\frac{n-1}{2}\right)} \left[\frac{(n-1)s^2}{2}\right]^{\frac{n-1}{2}} \sigma^{-n} \exp\!\left[-\frac{(n-1)s^2}{2\sigma^2}\right]. $$

b) Only $\mu$ unknown:

$$ f(\mu|\mathbf{x}\tilde\sigma\tilde I_0) \propto \pi_L(\mu)\prod_{i=1}^{n} f(x_i|\mu\sigma I_0) \propto \exp\!\left[-\frac{n(\mu-\bar x)^2}{2\sigma^2}\right] $$

(a Gaussian with mean $\bar x$ and variance $\sigma^2/n$);

c) Only $\sigma$ unknown:

$$ f(\sigma|\mathbf{x}\tilde\mu\tilde I_0) \propto \pi_S(\sigma)\prod_{i=1}^{n} f(x_i|\mu\sigma I_0) \propto \frac{2}{\Gamma(n/2)}\left[\frac{n(\bar x-\mu)^2 + (n-1)s^2}{2}\right]^{n/2} \sigma^{-(n+1)} \exp\!\left[-\frac{n(\bar x-\mu)^2 + (n-1)s^2}{2\sigma^2}\right]. $$
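The marginal for $\mu$ can be sanity-checked on a grid. This sketch (with toy data chosen here, not from the slides) integrates the joint posterior with the consistency factor $1/\sigma$ over $\sigma$ numerically and confirms it has the Student-t shape quoted above:

```python
import math

x = [1.0, 2.0, 2.0, 3.0]  # toy data, n = 4 (an assumption for the check)
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

def joint(mu, sigma):
    """f(mu sigma | x I0) up to a constant: sigma**-(n+1) * exp(-Q / (2 sigma^2))."""
    q = n * (mu - xbar) ** 2 + (n - 1) * s2
    return sigma ** -(n + 1) * math.exp(-q / (2 * sigma ** 2))

def marg(mu):
    """Midpoint-rule integral of the joint posterior over sigma in (0, 60)."""
    ds = 0.002
    return sum(joint(mu, (k + 0.5) * ds) * ds for k in range(30000))

def student(mu):
    """Closed-form shape of f(mu | x I0), up to normalization."""
    return (1 + n * (mu - xbar) ** 2 / ((n - 1) * s2)) ** (-n / 2)

# Same shape up to a single constant => the two ratios agree.
r0, r1 = marg(2.0) / student(2.0), marg(3.5) / student(3.5)
assert abs(r0 / r1 - 1) < 1e-3
```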
[Figures: for $n = 2$, $\bar x = 2$ and $s^2 = 1$, the posterior densities $f(\mu|\mathbf{x}\tilde\sigma{=}1\,\tilde I_0)$, $f(\mu|\mathbf{x} I_0)$, $f(\sigma|\mathbf{x}\tilde\mu\tilde I_0)$ and $f(\sigma|\mathbf{x} I_0)$.]
Comments:

a) Consistency factors are not normalizable, e.g. $\int_{-\infty}^{\infty}\pi(\mu')\,d\mu' \to \pm\infty$ $\Rightarrow$ $\pi(\theta)$ is not a probability distribution!

b) Consistency factors exist for the parameters of distributions that are invariant under Lie groups of transformations. Necessary condition: reducibility of $\theta$ to a location parameter (not a disaster; see below) $\Rightarrow$ enough to determine $\pi(\mu)$.
"The most striking achievement of physical sciences is prediction." (Pólya, 1954, p. 64)

Calibration (coverage): $f(\theta|x I_0)$ is calibrated if the coverage of confidence intervals $(\theta_1, \theta_2)$ coincides with the probability

$$ P\bigl(\theta \in (\theta_1,\theta_2)\,\big|\,x I_0\bigr) = \int_{\theta_1}^{\theta_2} f(\theta'|x I_0)\,d\theta'. $$

Fiducial theory (Fisher, 1956, p. 70):

$$ f(\theta|x I_0) = \left|\frac{\partial}{\partial\theta} F(x,\theta,I_0)\right| \qquad (F(x,\theta,I_0)\ \text{monotone in}\ \theta). $$
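Calibration can be demonstrated by simulation. An illustrative sketch (the parameter values are arbitrary, not from the slides): with $\pi_L(\mu)=1$ and known $\sigma$, the posterior for $\mu$ is Gaussian with width $\sigma/\sqrt{n}$, so a central 90% credible interval should cover the true $\mu$ in about 90% of repeated experiments:

```python
import math, random

random.seed(7)

mu_true, sigma, n = 3.0, 2.0, 5
z90 = 1.6449  # two-sided 90% Gaussian quantile
trials = 20000
hits = 0
for _ in range(trials):
    # Simulate one experiment and build the central 90% credible interval
    # xbar +/- z90 * sigma / sqrt(n) from the flat-consistency-factor posterior.
    xbar = sum(random.gauss(mu_true, sigma) for _ in range(n)) / n
    half = z90 * sigma / math.sqrt(n)
    hits += (xbar - half) <= mu_true <= (xbar + half)

print(round(hits / trials, 2))  # ~0.9: credible interval doubles as confidence interval
```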
Important:

1. $\pi_L(\mu) = 1$ and $\pi_S(\sigma) = \pi_{LS}(\mu,\sigma) = \sigma^{-1}$ ensure calibrated inferences;
2. Exact calibration $\Rightarrow$ a "Dutch Book" is impossible;
3. The Consistency Theorem and the fiducial argument combined $\Rightarrow$ $\theta$ necessarily reducible to a location parameter (Lindley, 1958).
Therefore:

The Principle of Consistency and the Operational Principle are equivalent (identical consistency factors, applicable under identical circumstances) $\Rightarrow$ complete reconciliation between the Bayesian and the frequentist schools of parametric inference!
Probabilistic parametric inference is not universal (e.g., pre-constrained parameters, counting experiments). Remedy (under fairly general conditions): "Repetitio est mater studiorum." (Latin proverb)

Example: inferring a pre-constrained $\tau$ of an exponential distribution.

[Figures for a sample $\mathbf{t} = (t_1, \ldots, t_n)$ of $n = 10$ measured decay times (in ps).]
Example: inferring the parameter $\theta$ of a binomial distribution:

$$ p(n|\bar n\,\theta I_0) = \binom{\bar n}{n}\,\theta^{n}(1-\theta)^{\bar n - n}, \qquad n \in \{0, 1, \ldots, \bar n\},\ \theta \in [0,1]; $$

$$ F(n,\bar n,\theta,I_0) = \sum_{i=0}^{n} p(i|\bar n\,\theta I_0) \approx \int_{-\infty}^{n+\frac{1}{2}} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\frac{(x'-\mu)^2}{2\sigma^2}\right] dx', \qquad \mu = \bar n\theta,\ \sigma^2 = \bar n\theta(1-\theta). $$

[Figures: $F(n,\bar n,\theta,I_0)$ compared with the Gaussian approximation $\tilde F(n,\mu,\sigma,\tilde I_0)$ for $\bar n = 3$, $\theta = 0.1$ and for $\bar n = 10$, $\theta = 0.5$.]
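The quality of the Gaussian approximation in the two plotted cases can be checked directly. A numerical sketch (the evaluation points are chosen here for illustration): the binomial cdf versus the continuity-corrected Gaussian, for the small-sample and the symmetric case:

```python
import math

def binom_cdf(n, nbar, theta):
    """Exact binomial cdf: sum of p(i | nbar theta I0) for i = 0..n."""
    return sum(math.comb(nbar, i) * theta**i * (1 - theta)**(nbar - i)
               for i in range(n + 1))

def gauss_approx(n, nbar, theta):
    """Gaussian approximation with continuity correction n + 1/2."""
    mu, sig = nbar * theta, math.sqrt(nbar * theta * (1 - theta))
    return 0.5 * (1.0 + math.erf((n + 0.5 - mu) / (sig * math.sqrt(2))))

# The approximation improves as nbar*theta*(1-theta) grows: compare the
# two cases from the figures.
err_small = abs(binom_cdf(1, 3, 0.1) - gauss_approx(1, 3, 0.1))
err_large = abs(binom_cdf(5, 10, 0.5) - gauss_approx(5, 10, 0.5))
assert err_large < err_small < 0.1
```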
Conclusions:

1. Consistency Theorem (instead of Bayes' Theorem) for assigning $f(\theta|x_1 I_0)$;
2. Equivalence of the Consistency Principle and the Operational Principle for the determination of $\pi(\theta)$;
3. Equivalence of the Bayesian and the frequentist schools of parametric inference.
Applications:

1. Simple parametric inference;
2. Inference about the parameters of linear models (e.g., histogram fitting and partial-wave analyses) (Stuart, Ord and Arnold, 1999);
3. Inference about the parameters of dynamical models: $\theta = \theta(t)$ (e.g., the Kalman filter (Brown and Hwang, 1983));
4. Predictive distributions ($\mathbf{x} = (x_1, x_2, \ldots, x_n)$ from $f(x|\theta I_0)$ → $f(x_{n+1}|\mathbf{x} I_0)$).
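Item 4 can be sketched for the Gaussian case already worked out above (an illustration with toy data, not from the slides): with known $\sigma$ and $\pi_L(\mu) = 1$, integrating $f(x_{n+1}|\mu I_0)$ against the posterior $f(\mu|\mathbf{x} I_0)$ gives a predictive density that is again Gaussian, centered at $\bar x$ with variance $\sigma^2(1 + 1/n)$:

```python
import math

sigma = 1.0
x = [0.9, 2.1, 1.5]  # toy data
n, xbar = len(x), sum(x) / len(x)

def gauss(u, m, s):
    return math.exp(-(u - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)

def predictive(xnext):
    """Grid integral of f(x_{n+1} | mu I0) * f(mu | x I0) over mu."""
    dmu, half = 0.001, 8.0
    steps = int(round(2 * half / dmu))
    return sum(gauss(xnext, xbar - half + dmu * (k + 0.5), sigma) *
               gauss(xbar - half + dmu * (k + 0.5), xbar, sigma / math.sqrt(n)) * dmu
               for k in range(steps))

# Closed form: Gaussian with mean xbar and variance sigma**2 * (1 + 1/n).
closed = gauss(2.0, xbar, sigma * math.sqrt(1 + 1 / n))
assert abs(predictive(2.0) - closed) < 1e-5
```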
Warning:

Several "Principles" exist for the determination of $f(\theta|I_0)$: the Laplace Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII), the Principle of Maximum Entropy (Jaynes, 2003, pp. 343-377), Reference Priors (Bernardo, 1979), the Principle of Group (Form) Invariance (Harney, 2003), the Principle of Reduction (Dawid, 1977):

a) the resulting $f(\theta|I_0)$ is not unique;
b) the "Principles" are inconsistent with the axioms of inverse probability;
c) non-calibrated inferences.
Which kind of approach has been advocated here, frequentist or Bayesian? It depends…
If:

1. Frequentist ≡ axioms of conditional probability only applicable to sampling distributions;
2. Bayesian ≡ (non-informative) prior probability distributions indispensable in the process of inference;

…then neither of the two.
If:

1. Frequentist ≡ observing the Operational Principle;
2. (Objective) Bayesian ≡ observing the Principle of Consistency;

…then both.
Bibliography:

T.P. and Živko, T. (2006). Towards Reconciliation between Bayesian and Frequentist Reasoning. In Lyons, L. and Ünel, M. K. (eds.), Statistical Problems in Particle Physics, Astrophysics and Cosmology (Proceedings of PHYSTAT05). London: Imperial College Press. Erratum: inference about the parameters of the Weibull distribution can be reduced to a location-scale problem.
T.P. and Živko, T. On Probabilistic Inference about the Parameters of Sampling Distributions.
References:

Bayes, Rev. T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philos. Trans. R. Soc. Lond., 53: 370-418.
Bernardo, J. M. (1979). Reference Posterior Distributions for Bayesian Inference. J. R. Statist. Soc., B 41: 113-147.
Brown, R. G. and Hwang, P. Y. C. (1983). Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, Inc.
Cox, R. T. (1946). Probability, Frequency and Reasonable Expectation. Amer. J. Phys., 14: 1-13.
Dawid, A. P. (1977). Conformity of inference patterns. In Barra, J. R., van Cutsen, B., Brodeau, F. and Romier, G. (eds.), Developments in Statistics. Amsterdam: North-Holland.
Fisher, R. A. (1922). On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. Lond., A 222: 309-368.
References (cont'd):

Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver & Boyd.
Harney, H. L. (2003). Bayesian Inference. Springer.
Hartigan, J. A. (1964). Invariant Prior Distributions. Ann. Math. Statist., 35: 836-845.
Howson, C. and Urbach, P. (1991). Bayesian Reasoning in Science. Nature, 350: 371-374.
Jaynes, E. T. (2003). Probability Theory – The Logic of Science. Cambridge University Press.
Jeffreys, H. (1961). Theory of Probability. Oxford: Clarendon Press.
Laplace, P. S. (1886). Œuvres Complètes – Tome Septième: Théorie Analytique des Probabilités. Paris: Gauthier-Villars.
Lindley, D. V. (1958). Fiducial Distribution and Bayes' Theorem. J. R. Statist. Soc., B 20: 102-107.
References (cont'd):

O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, Vol. 2B – Bayesian Inference. London: Arnold.
Pólya, G. (1954). Mathematics and Plausible Reasoning, Vol. 2 – Patterns of Plausible Inference. Princeton: Princeton University Press.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Hutchinson & Co. Publishers.
Stuart, A., Ord, K. and Arnold, S. (1999). Kendall's Advanced Theory of Statistics, Vol. 2A – Classical Inference and the Linear Model. London: Arnold.
Villegas, C. (1980). Inner Statistical Inference II. Ann. Statist., 9: 768-776.