

SLIDE 1

Why Every Physicist Should Be a Bayesian

(Towards a Complete Reconciliation between the Bayesian and the Frequentist Schools of Parametric Inference) Tomaž Podobnik Physics Department, University of Ljubljana Jožef Stefan Institute, Ljubljana, Slovenia

SLIDE 2

YETI’07, 16/01/2007

Recommended reading:

  • R. D. Cousins, “Why Isn’t Every Physicist a Bayesian?”, Amer. J. Phys. 63 (1995) 398.

“Physicists embarking on seemingly routine error analyses are finding themselves grappling with major conceptual issues which have divided the statistical community for years. …The lurking controversy can come as a shock to a graduate student who encounters a statistical problem at some late stage in writing up the Ph.D. dissertation.”

SLIDE 3

Basic principles of scientific reasoning (Popper, 1959, pp. 91-92):

  • 1. Principle of Consistency: Every theory must be internally consistent: if a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result. Also, identical states of knowledge in a problem must always lead to identical solutions of the problem.

  • 2. Operational Principle: Every theory must specify operations that ensure falsifiability of its predictions.
SLIDE 4

Direct probabilities (= long-term relative frequencies):

$p(x|I)\,,\ x \in (x_1, x_1 + dx_1)$ : probability for observing $x$, given information $I$ ;
$f(x|\theta I)\,,\ p(x|\theta I) = f(x|\theta I)\,dx$ : probability density function (pdf) ;
$\{\,f(x|\theta I)\,;\ \theta \in \Theta\,\}$ : family of sampling distributions ;
$F(x,\theta,I) = \int_{x_a}^{x} f(x'|\theta I)\,dx'$ : (cumulative) distribution function (cdf) ;
$\theta$ : parameter.

SLIDE 5

Location and scale parameters:

$f(x|\mu I) = \phi(x-\mu)\,;\quad x \in (-\infty,\infty)\,,\ \mu \in (-\infty,\infty) \ \Rightarrow\ \mu \equiv$ location parameter ;

$f(x|\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)\,;\quad x \in (0,\infty)\,,\ \sigma \in (0,\infty) \ \Rightarrow\ \sigma \equiv$ scale parameter ;

$f(x|\mu\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\,;\quad x \in (-\infty,\infty)\,,\ \mu \in (-\infty,\infty)\,,\ \sigma \in (0,\infty) \ \Rightarrow\ \mu \equiv$ location, $\sigma \equiv$ scale (dispersion) parameter.

SLIDE 6

Examples:

$f(x|\mu\sigma I) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)$ : Gaussian distribution ($\mu \equiv$ location, $\sigma \equiv$ scale (dispersion) parameter) ;

$F(x,\mu,\sigma,I) \equiv \int_{-\infty}^{x} f(x'|\mu\sigma I)\,dx'\,.$

[Figure: plots of $f(x|\mu\sigma I)$ and $F(x,\mu,\sigma,I)$ for $\mu = \sigma = 1$.]
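As a numeric companion to the Gaussian example, here is a minimal stdlib-Python sketch (function names are illustrative, not from the lecture) evaluating the pdf and cdf at the defaults μ = σ = 1 used in the slide's plots:

```python
import math

def gauss_pdf(x, mu=1.0, sigma=1.0):
    """f(x | mu sigma I): Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu=1.0, sigma=1.0):
    """F(x, mu, sigma, I): integral of the pdf up to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# At x = mu the density peaks at 1/(sigma*sqrt(2*pi)) and the cdf is exactly 1/2.
print(round(gauss_pdf(1.0), 4))  # 0.3989
print(gauss_cdf(1.0))            # 0.5
```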

SLIDE 7

Axioms of conditional probability:

  • every probability distribution is conditional upon the available (relevant) information.

1. $f(x|\theta I) \ge 0$ ;
2. $f(xy|\theta I) = f(x|\theta I)\,f(y|x\theta I) = f(y|\theta I)\,f(x|y\theta I)$ ;
3. $\int_X f(x|\theta I)\,dx = 1$ ;
4. $f(y|\theta \tilde{I}) = f(x|\theta I)\,\left|\frac{\partial x}{\partial y}\right|$ ; $y = y(x)$ one-to-one.

SLIDE 8

Example: a scale parameter is reducible to a location parameter!

$f(x|\sigma I) = \frac{1}{\sigma}\,\exp\!\left(-\frac{x}{\sigma}\right) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)\,;\quad \sigma \equiv$ scale parameter (exponential distribution).

Substituting $y \equiv \ln x$, $\mu \equiv \ln \sigma$ ($I \to \tilde{I}$):

$f(y|\mu \tilde{I}) = f(x|\sigma I)\,\left|\frac{\partial x}{\partial y}\right| = \exp\!\left[(y-\mu) - e^{(y-\mu)}\right] \equiv \tilde{\phi}(y-\mu) \ \Rightarrow\ \mu \equiv$ location parameter (not an exponential distribution).

[Figure: $f(x|\sigma{=}1\,I)$.]
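The reduction above can be checked numerically. A small stdlib-Python sketch (names are mine): the density of y = ln x follows from the change-of-variables axiom, and shifting y and μ together leaves it unchanged, which is exactly the location-parameter property:

```python
import math

def f_x(x, sigma):
    """Exponential sampling density: f(x | sigma I) = exp(-x/sigma)/sigma."""
    return math.exp(-x / sigma) / sigma

def f_y(y, mu):
    """Density of y = ln x with mu = ln sigma: depends on (y - mu) only."""
    return math.exp((y - mu) - math.exp(y - mu))

# Change-of-variables check: f_y(y) = f_x(e^y) * e^y   (Jacobian dx/dy = e^y)
sigma, x = 2.0, 0.7
y, mu = math.log(x), math.log(sigma)
print(abs(f_y(y, mu) - f_x(x, sigma) * x) < 1e-12)        # True

# Location property: a common shift of y and mu leaves the density invariant.
print(abs(f_y(y + 3.0, mu + 3.0) - f_y(y, mu)) < 1e-12)   # True
```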

SLIDE 9

Parametric inference:

Given measured $x_1 \in (x_1, x_1 + dx_1)$, specify the degree of belief $\theta \in (\theta_1, \theta_1 + d\theta)$.

Probabilistic approach (Bayesian school):

$x_1 \ \to\ p(\theta|x_1 I) = f(\theta|x_1 I)\,d\theta\,.$

  • N.b.: $f(\theta|x_1 I_0)$ is the distribution of our belief in different values of $\theta$, not (!) a distribution of $\theta$.

SLIDE 10

Axioms of inverse probability:

1. $f(\theta|xI) \ge 0$ ;
2. $f(\theta\nu|xI) = f(\theta|xI)\,f(\nu|\theta xI) = f(\nu|xI)\,f(\theta|\nu xI)$ ;
3. $\int_\Theta f(\theta|xI)\,d\theta = 1$ ;
4. $f(\nu|x\tilde{I}) = f(\theta|xI)\,\left|\frac{\partial \theta}{\partial \nu}\right|$ ; $\nu = \nu(\theta)$ one-to-one;
5. $f(\theta|x_1 x_2 I) = f(\theta|x_2 x_1 I)$ .

SLIDE 11

Pro’s for subjecting degrees of belief to the Axioms of probability:

  • 1. “It is not excluded a priori that the same mathematical theory may serve two purposes.” (Pólya, 1954, Chapter XV, p. 116)

  • 2. Cox’s Theorem: Every theory of plausible inference is either isomorphic to probability theory or inconsistent with very general qualitative requirements, e.g. that the plausibility of $\theta \in (\theta_1, \theta_1 + d\theta)$ given $x_1 I_0$ determines the plausibility of $\theta \notin (\theta_1, \theta_1 + d\theta)$. (Cox, 1946)

  • 3. Dutch Book Theorem (de Finetti): A “Dutch Book” can be organized against anyone whose betting coefficients violate the axioms of probability. (Howson and Urbach, 1991)

SLIDE 12

Pro’s (cont’d):

  • 4. Avoiding adhockeries. (O’Hagan, 2000, p. 20)
  • 5. Powerful tools: marginalization and Bayes’ Theorem (Bayes, 1763).

Product rule:

$f(\theta\nu|xI) = f(\theta|xI)\,f(\nu|\theta xI) = f(\nu|xI)\,f(\theta|\nu xI) \ \Rightarrow$

Marginalization:

$f(\theta|xI) = \int_N f(\theta\nu'|xI)\,d\nu'\,;\qquad f(\nu|xI) = \int_\Theta f(\theta'\nu|xI)\,d\theta'\,.$

Bayes’ Theorem (with $f(x_2|\theta x_1 I) = f(x_2|\theta I)$ for independent measurements):

$f(\theta|x_2 x_1 I) = \frac{f(x_2|\theta x_1 I)\,f(\theta|x_1 I)}{f(x_2|x_1 I)}\,;\qquad f(x_2|x_1 I) = \int_\Theta f(x_2|\theta' x_1 I)\,f(\theta'|x_1 I)\,d\theta'\,.$
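A minimal grid sketch of sequential Bayesian updating (stdlib Python; the Gaussian likelihood, the flat starting density and all names are illustrative assumptions, not the lecture's example). It shows the order-independence of the two updates, the property the Consistency Theorem later builds on:

```python
import math

def like(x, theta, sigma=1.0):
    """Gaussian sampling density f(x | theta I), known sigma (constants dropped)."""
    return math.exp(-0.5 * ((x - theta) / sigma) ** 2)

thetas = [i * 0.01 for i in range(-500, 501)]   # crude grid over theta

def update(post, x):
    """One application of Bayes' Theorem on the grid, then renormalize."""
    new = [p * like(x, t) for p, t in zip(post, thetas)]
    z = sum(new)
    return [p / z for p in new]

flat = [1.0] * len(thetas)
x1, x2 = 0.3, 1.1
a = update(update(flat, x1), x2)   # x1 first, then x2
b = update(update(flat, x2), x1)   # x2 first, then x1
print(max(abs(u - v) for u, v in zip(a, b)) < 1e-12)  # True: order does not matter
```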

SLIDE 13

But… (con’s): a) how to assign $f(\theta|x_1 I_0)$??? b) what are verifiable predictions???

If making use of Bayes’ Theorem,

$f(\theta|x_1 I) = \frac{f(x_1|\theta I)\,f(\theta|I)}{\int_\Theta f(x_1|\theta' I)\,f(\theta'|I)\,d\theta'}\,,$

a non-informative prior (distribution) $f(\theta|I)$ is needed: “According to Bayesian philosophy it is also possible to make statements concerning the unknown θ in the absence of data, and these statements can be summarized in a prior distribution.” (Villegas, 1980)

SLIDE 14

Example: The Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII):

$f(\theta|I) = C = \frac{1}{\theta_b - \theta_a}\,;\qquad \int_{\theta_a}^{\theta_b} f(\theta'|I)\,d\theta' = 1\,.$

Twofold problem:

a) $(\theta_a, \theta_b)$ infinite (e.g., $\theta_b = \infty$) $\Rightarrow \int_{\theta_a}^{\theta_b} f(\theta'|I)\,d\theta' \to \infty$ ;
b) $f(\theta|I_0)$ not invariant under non-linear transformations:

$\nu = \theta^2 \ \Rightarrow\ f(\nu|\tilde{I}) = f(\theta|I)\,\left|\frac{\partial \theta}{\partial \nu}\right| \propto \frac{1}{\sqrt{\nu}} \neq \mathrm{const}\,.$
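Problem (b) is easy to see numerically. A tiny stdlib-Python sketch (illustrative names): a density constant in θ, pushed through ν = θ² with the Jacobian |∂θ/∂ν|, is no longer constant in ν:

```python
import math

def f_theta(theta):
    """Uniform 'prior' in theta (the normalization constant is irrelevant here)."""
    return 1.0

def f_nu(nu):
    """Transformed density: theta = sqrt(nu), |d theta / d nu| = 1/(2 sqrt(nu))."""
    return f_theta(math.sqrt(nu)) / (2.0 * math.sqrt(nu))

# The transformed density depends on nu, so uniformity is not preserved:
print(f_nu(1.0), f_nu(4.0))  # 0.5 0.25
```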

SLIDE 15

“A succession of authors have said that the prior probability is nonsense and that the principle of inverse probability, which cannot work without it, is nonsense too.” (Jeffreys, 1961, p. 120) “During the rapid development of practical statistics in the past few decades, the theoretical foundations of the subject have been involved in great obscurity. The obscurity is centred in the so-called ‘inverse’ methods. … The inverse probability is a mistake (perhaps the only mistake to which the mathematical world has so deeply committed itself).” (Fisher, 1922)

SLIDE 16

Long-lasting and fierce controversy:

“The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency.” (Jeffreys, 1961, p. 401)

“Probability is a ratio of frequencies.” (Fisher, 1922, p. 326)

SLIDE 17

Twofold aim of the lecture:

1. Overcome conceptual and practical problems concerning the assignment of probability distributions to inferred parameters;
2. Reconcile the Bayesian and the frequentist schools of parametric inference.
SLIDE 18

Consistency Theorem: how to assign $f(\theta|x_1 I_0)$?

Assumptions: a) $x_1$ and $x_2$ two independent measurements from $f(x|\theta I_0)$: $f(x_2|x_1 \theta I_0) = f(x_2|\theta I_0)$ and $f(x_1|x_2 \theta I_0) = f(x_1|\theta I_0)$ ; b) $f(\theta|x_1 I_0)$ and $f(\theta|x_2 I_0)$ can be assigned. Then (Bayes’ Theorem):

$f(\theta|x_2 x_1 I) = \frac{f(x_2|\theta I)\,f(\theta|x_1 I)}{f(x_2|x_1 I)}\,;\qquad f(\theta|x_1 x_2 I) = \frac{f(x_1|\theta I)\,f(\theta|x_2 I)}{f(x_1|x_2 I)}\,.$

SLIDE 19

Consistency:

$f(\theta|x_2 x_1 I_0) = f(\theta|x_1 x_2 I_0)$

$\Rightarrow$

$f(\theta|xI) = \frac{\pi(\theta)\,f(x|\theta I)}{\eta(x)}\,;\qquad \eta(x) \equiv \int_\Theta \pi(\theta')\,f(x|\theta' I)\,d\theta'\,,$

where $\pi(\theta)$ is the consistency factor — not (!!) a probability distribution (e.g., it need not be normalizable) — and $\eta(x)$ is the normalization factor.

Strikingly similar to Bayes’ Theorem, but…
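A grid sketch of the Consistency Theorem in action (stdlib Python; the exponential sampling density and all names are illustrative): even though the consistency factor π(θ) = 1/θ is itself not normalizable, the resulting f(θ|xI) is a proper probability density once divided by η(x):

```python
import math

def f_x_given_theta(x, theta):
    """Exponential sampling density with scale theta: f(x | theta I)."""
    return math.exp(-x / theta) / theta

def consistency_factor(theta):
    """pi(theta) = 1/theta for a scale parameter; NOT itself normalizable."""
    return 1.0 / theta

# f(theta | x I) = pi(theta) f(x|theta I) / eta(x), eta(x) the normalization.
x, d = 2.0, 0.01
thetas = [d * i for i in range(1, 5000)]            # crude grid on (0, 50)
w = [consistency_factor(t) * f_x_given_theta(x, t) for t in thetas]
eta = sum(w) * d                                    # eta(x): finite
post = [v / eta for v in w]
print(abs(sum(post) * d - 1.0) < 1e-9)  # True: f(theta|x I) is a proper density
```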

SLIDE 20

Properties of $\pi(\theta)$:

1. Determined only up to a multiplicative constant (say $k$);
2. Transformation $\pi(\theta) \to \tilde{\pi}(\nu)$ under $\theta \to \nu$ (one-to-one):

$f(\nu|x\tilde{I}) = f(\theta|xI)\,\left|\frac{\partial \theta}{\partial \nu}\right| = \frac{\tilde{\pi}(\nu)\,f(x|\nu\tilde{I})}{\tilde{\eta}(x)} \ \Rightarrow\ \tilde{\pi}(\nu) = k\,\pi(\theta)\,\left|\frac{\partial \theta}{\partial \nu}\right|\,;$

3. Depends on $I_0$ (= the only available information before data $(x_1, x_2, \dots)$ are collected).

SLIDE 21

Consistency:

$\tilde{I} = I \ \Rightarrow\ \tilde{\pi}(\nu) \propto \pi(\nu)$

(a.k.a. the Principle of Relative Invariance; Hartigan, 1964).

Example (scale parameter):

$f(t|\tau I) = \frac{1}{\tau}\,\exp\!\left(-\frac{t}{\tau}\right) = \frac{1}{\tau}\,\phi\!\left(\frac{t}{\tau}\right)\,;\qquad g_a: t \in T \to t' = at\,,\quad \bar{g}_a: \tau \in \Theta \to \tau' = a\tau$ (induced group).

$f(t|\tau I_0)$ is invariant under $g_a$ (the transformed sample again follows an exponential law, so $\tilde{I} = I$); the transformation rule then gives

$\tilde{\pi}(a\tau) = \frac{k(a)}{a}\,\pi(\tau)\,,$

and combined with relative invariance $\tilde{\pi} \propto \pi$ this requires $\pi(a\tau) \propto \frac{1}{a}\,\pi(\tau)$, whose solution is $\pi(\tau) \propto \frac{1}{\tau}\,.$
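The functional equation for the scale group can be verified directly. A tiny stdlib-Python sketch (illustrative): π(τ) = 1/τ satisfies π(aτ) = π(τ)/a for every scale factor a, which is the relative-invariance condition:

```python
def pi(tau):
    """Consistency factor for a scale parameter (up to a constant k)."""
    return 1.0 / tau

# Relative invariance under the scale group tau -> a*tau demands
# pi(a*tau) = pi(tau)/a ; check it on a handful of points:
checks = [(a, tau) for a in (0.5, 2.0, 7.3) for tau in (0.1, 1.0, 4.2)]
print(all(abs(pi(a * t) - pi(t) / a) < 1e-12 for a, t in checks))  # True
```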

SLIDE 22

Example (Distribution | Invariant transformation | Functional equation | Solution):

$f(x|\mu\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)$ | $x \to ax+b\,,\ \mu \to a\mu+b\,,\ \sigma \to a\sigma$ | $\pi_{LS}(a\mu+b, a\sigma) = \frac{h(a,b)}{a}\,\pi_{LS}(\mu,\sigma)$ | (similarly) ;

$f(x|\mu I) = \phi(x-\mu)$ | $x \to x+b\,,\ \mu \to \mu+b$ | $\pi_L(\mu+b) = h(b)\,\pi_L(\mu)$ | $\pi_L(\mu) \propto e^{-q\mu}$ ;

$f(x|\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)$ | $x \to ax\,,\ \sigma \to a\sigma$ | $\pi_S(a\sigma) = \frac{h(a)}{a}\,\pi_S(\sigma)$ | $\pi_S(\sigma) \propto \sigma^{-(r+1)}$ ;

$q$, $r$: constants.

SLIDE 23

Product rule and Marginalization:

$\pi_{LS}(\mu,\sigma) = \pi_L(\mu)\,\pi_S(\sigma)$

and

$\pi_L(\mu) \propto 1\,,\qquad \pi_S(\sigma) \propto \frac{1}{\sigma}\,,\qquad \pi_{LS}(\mu,\sigma) \propto \frac{1}{\sigma}\,.$

$\Rightarrow$ Consistency factors are determined uniquely (up to an arbitrary multiplicative constant) exclusively by observing the Axioms of Probability and the Principle of Consistency.

SLIDE 24

Examples: inferring the parameters of a Gaussian distribution.

$\mathbf{x} \equiv (x_1, \dots, x_n)$ : independent measurements, sampled from $f(x|\mu\sigma I)$ ;

$f(\mathbf{x}|\mu\sigma I) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\!n} \exp\!\left[-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right]\,;\qquad \bar{x} \equiv \frac{1}{n}\sum_{i=1}^{n} x_i\,,\quad s^2 \equiv \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\,.$

a) Both $\mu$ and $\sigma$ unknown:

$f(\mu\sigma|\mathbf{x}I) \propto \pi_{LS}(\mu,\sigma)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) \propto \frac{1}{\sigma}\,\prod_{i=1}^{n} f(x_i|\mu\sigma I)\,.$
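A stdlib-Python sketch of the joint posterior for the Gaussian case (names and the sample values are mine, purely illustrative). It checks the algebraic step used above: the unnormalized posterior π_LS · Π f(x_i|μσI) depends on the data only through the sufficient statistics x̄ and s²:

```python
import math

def joint_post_unnorm(mu, sigma, xs):
    """pi_LS(mu,sigma) * prod_i f(x_i | mu sigma I), with pi_LS = 1/sigma."""
    L = 1.0
    for x in xs:
        L *= math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return L / sigma

def via_sufficient_stats(mu, sigma, xs):
    """Same quantity, written through xbar and s^2 (sufficient statistics)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    return (sigma ** -(n + 1) * (2 * math.pi) ** (-n / 2)
            * math.exp(-n * ((xbar - mu) ** 2 + s2) / (2 * sigma ** 2)))

xs = [0.2, 1.5, -0.3, 0.9]
print(abs(joint_post_unnorm(0.5, 1.2, xs) - via_sufficient_stats(0.5, 1.2, xs)) < 1e-12)
# True: sum_i (x_i - mu)^2 = n*(xbar - mu)^2 + n*s^2
```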

SLIDE 25

Marginalization:

$f(\mu|\mathbf{x}I) = \int_0^{\infty} f(\mu\sigma'|\mathbf{x}I)\,d\sigma' = \frac{\Gamma(n/2)}{\Gamma\!\left(\frac{n-1}{2}\right)\sqrt{\pi s^2}}\,\left[1 + \frac{(\bar{x}-\mu)^2}{s^2}\right]^{-n/2}\,;$

$f(\sigma|\mathbf{x}I) = \int_{-\infty}^{\infty} f(\mu'\sigma|\mathbf{x}I)\,d\mu' = \frac{2}{\Gamma\!\left(\frac{n-1}{2}\right)}\left(\frac{n s^2}{2}\right)^{\!(n-1)/2} \sigma^{-n}\,\exp\!\left(-\frac{n s^2}{2\sigma^2}\right)\,.$

b) Only $\mu$ unknown:

$f(\mu|\mathbf{x}\tilde{I}) \propto \pi_L(\mu)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) = \sqrt{\frac{n}{2\pi}}\,\frac{1}{\sigma}\,\exp\!\left[-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right]\,;$

c) Only $\sigma$ unknown:

$f(\sigma|\mathbf{x}\tilde{\tilde{I}}) \propto \pi_S(\sigma)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) = \frac{2}{\Gamma(n/2)}\left(\frac{n[(\bar{x}-\mu)^2+s^2]}{2}\right)^{\!n/2} \sigma^{-(n+1)}\,\exp\!\left[-\frac{n[(\bar{x}-\mu)^2+s^2]}{2\sigma^2}\right]\,.$
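The marginalization for μ can be checked numerically. A stdlib-Python sketch (sample values and names illustrative): integrating the joint posterior over σ on a grid reproduces, up to normalization, the Student-t shape (1 + (x̄−μ)²/s²)^(−n/2):

```python
import math

xs = [0.2, 1.5, -0.3, 0.9]
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n

def joint(mu, sigma):
    """sigma^-(n+1) * exp(-n[(xbar-mu)^2 + s^2]/(2 sigma^2)), i.e. pi_LS = 1/sigma."""
    return sigma ** -(n + 1) * math.exp(-n * ((xbar - mu) ** 2 + s2) / (2 * sigma ** 2))

def marginal_mu(mu, d=0.001):
    """Integrate the joint over sigma on a crude grid (0, 20]."""
    return sum(joint(mu, 0.001 + i * d) * d for i in range(20000))

def student_shape(mu):
    """Closed-form shape of the marginal: Student's t with n-1 degrees of freedom."""
    return (1.0 + (mu - xbar) ** 2 / s2) ** (-n / 2)

# The ratio numeric/closed-form should be (nearly) independent of mu:
r1 = marginal_mu(xbar) / student_shape(xbar)
r2 = marginal_mu(xbar + 1.0) / student_shape(xbar + 1.0)
print(abs(r1 / r2 - 1.0) < 1e-3)  # True
```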

SLIDE 26

[Figure: for $n = \bar{x} = 2$ and $s^2 = 1$: the conditional posteriors $f(\mu|\sigma{=}1\,\mathbf{x}\tilde{I})$ and $f(\sigma|\mu\,\mathbf{x}\tilde{\tilde{I}})$, and the marginal posteriors $f(\mu|\mathbf{x}I)$ and $f(\sigma|\mathbf{x}I)$.]

SLIDE 27

Comments:

a) Consistency factors are not normalizable, e.g. $\int_{-\infty}^{\infty} \pi(\mu')\,d\mu' \to \infty$ $\Rightarrow$ $\pi(\theta)$ is not a probability distribution!!!

b) Consistency factors exist for the parameters of distributions that are invariant under Lie groups of transformations. Necessary condition: reducibility of $\theta$ to a location parameter (not a disaster; see below) $\Rightarrow$ enough to determine $\pi(\mu)$.

SLIDE 28

“The most striking achievement of physical sciences is prediction.” (Pólya, 1954, p. 64)

Calibration (coverage):

  • $f(\theta|xI_0)$ is calibrated if the coverage of the confidence intervals $(\theta_1, \theta_2)$ coincides with the probability

$P\big(\theta \in (\theta_1, \theta_2)\,\big|\,xI\big) = \int_{\theta_1}^{\theta_2} f(\theta'|xI)\,d\theta'\,.$

  • Fiducial theory (Fisher, 1956, p. 70; $F(x,\theta,I)$ monotone in $\theta$):

$f(\theta|xI) = \left|\frac{\partial}{\partial \theta}\,F(x,\theta,I)\right|\,.$
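Calibration can be demonstrated by simulation. A stdlib-Python sketch (my own illustrative setup: Gaussian sampling with known σ = 1, π(μ) = 1, a central 90% credible interval; 1.6449 is the approximate 95th percentile of the standard normal): the long-run frequentist coverage of the Bayesian interval matches its stated probability:

```python
import math, random

random.seed(1)

def credible_interval(x, sigma=1.0):
    """Central 90% interval from f(mu|x I), a Gaussian around x (pi(mu) = 1)."""
    z = 1.6449  # approximate 95th percentile of the standard normal
    return (x - z * sigma, x + z * sigma)

# Frequentist check: how often does the interval cover the true mu?
mu_true, trials, hits = 3.7, 20000, 0
for _ in range(trials):
    x = random.gauss(mu_true, 1.0)
    lo, hi = credible_interval(x)
    hits += lo <= mu_true <= hi
print(abs(hits / trials - 0.90) < 0.01)  # True: coverage ~ 0.90, i.e. calibrated
```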

SLIDE 29

Important:

1. $\pi_L(\mu) = 1$ and $\pi_S(\sigma) = \pi_{LS}(\mu,\sigma) = \sigma^{-1}$ ensure calibrated inferences;
2. Exact calibration $\Rightarrow$ a “Dutch Book” is impossible;
3. Consistency Theorem and fiducial argument combined $\Rightarrow$ $\theta$ necessarily reducible to a location parameter (Lindley, 1958).

SLIDE 30

Therefore:

The Principle of Consistency and the Operational Principle are equivalent (identical consistency factors, applicable under identical circumstances) $\Rightarrow$ complete reconciliation between the Bayesian and the frequentist schools of parametric inference!!!

SLIDE 31

Probabilistic parametric inference is not universal (e.g., pre-constrained parameters, counting experiments). Remedy (under fairly general conditions): “Repetitio est mater studiorum.” (Latin proverb)

Example: inferring a pre-constrained $\tau$ of an exponential distribution.

[Figure: sample $\mathbf{t} = (t_1, \dots, t_n)$, $n = 10$, $\tau = 5\,\mathrm{ps}$.]

SLIDE 32

Example: inferring the parameter $\theta$ of a binomial distribution,

$p(n|\theta \bar{n} I) = \binom{\bar{n}}{n}\,\theta^{n}\,(1-\theta)^{\bar{n}-n}\,;\qquad n \le \bar{n}\,,\quad \theta \in (0,1)\,.$

[Figure: the cumulative distributions $F(n,\theta,\bar{n},I)$ compared with their Gaussian counterparts $F(n,\mu,\sigma,\tilde{I})$, for $\bar{n} = 3,\ \theta = 0.1$ and $\bar{n} = 10,\ \theta = 0.5$.]
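The binomial-vs-Gaussian comparison can be sketched numerically in stdlib Python (names are mine; the Gaussian here is the standard normal approximation with μ = Nθ and σ² = Nθ(1−θ), an assumption of this sketch rather than the lecture's exact construction):

```python
import math

def binom_cdf(k, N, theta):
    """F(k): sum_{n <= k} C(N,n) theta^n (1-theta)^(N-n)."""
    return sum(math.comb(N, n) * theta ** n * (1 - theta) ** (N - n)
               for n in range(k + 1))

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Normal approximation: mu = N*theta, sigma^2 = N*theta*(1-theta).
N, theta = 100, 0.5
mu, sigma = N * theta, math.sqrt(N * theta * (1 - theta))
# Continuity-corrected comparison at k = 55:
print(abs(binom_cdf(55, N, theta) - gauss_cdf(55.5, mu, sigma)) < 0.005)  # True
```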

SLIDE 33

Conclusions:

1. Consistency Theorem (instead of Bayes’ Theorem) for assigning $f(\theta|x_1 I_0)$;
2. Equivalence of the Consistency Principle and the Operational Principle for the determination of $\pi(\theta)$;
3. Equivalence of the Bayesian and the frequentist schools of parametric inference.

SLIDE 34

Applications:

1. Simple parametric inference;
2. Inference about the parameters of linear models (e.g., histogram fitting and partial wave analyses) (Stuart, Ord and Arnold, 1999);
3. Inference about the parameters of dynamical models: $\theta = \theta(t)$ (e.g., Kalman filter (Brown and Hwang, 1983));
4. Predictive distributions ($\mathbf{x} = (x_1, x_2, \dots, x_n)$ from $f(x|\theta I_0) \to f(x_{n+1}|\mathbf{x} I_0)$).
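Item 4 can be sketched concretely. A stdlib-Python example (my own illustrative assumptions: Gaussian sampling with known σ = 1 and π(μ) = 1, so the posterior for μ is Gaussian around x̄ with width 1/√n): the predictive density is the posterior-weighted average of the sampling density, and the grid integral reproduces the known closed form, a Gaussian around x̄ with variance 1 + 1/n:

```python
import math

xs = [0.4, 1.1, 0.7]
n, xbar = len(xs), sum(xs) / len(xs)

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predictive(x_new, d=0.01):
    """f(x_new | x I) = integral of f(x_new|mu I) f(mu|x I) d mu, on a crude grid."""
    return sum(gauss(x_new, mu, 1.0) * gauss(mu, xbar, 1.0 / math.sqrt(n)) * d
               for mu in [xbar - 6.0 + i * d for i in range(1200)])

# Closed form: Gaussian around xbar with variance 1 + 1/n.
closed = gauss(1.0, xbar, math.sqrt(1.0 + 1.0 / n))
print(abs(predictive(1.0) - closed) < 1e-6)  # True
```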

SLIDE 35

Warning:

Several “Principles” exist for the determination of $f(\theta|I_0)$: the Laplace Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII), the Principle of Maximum Entropy (Jaynes, 2003, pp. 343-377), Reference Priors (Bernardo, 1979), the Principle of Group (Form) Invariance (Harney, 2003), the Principle of Reduction (Dawid, 1977):

a) the resulting $f(\theta|I_0)$ is not unique;
b) the “Principles” are inconsistent with the Axioms of inverse probability;
c) non-calibrated inferences.

SLIDE 36

Which kind of approach has been advocated here, frequentist or Bayesian? It depends…

SLIDE 37

If:

1. Frequentist ≡ axioms of conditional probability only applicable to sampling distributions;
2. Bayesian ≡ (non-informative) prior probability distributions indispensable in the process of inference;

…then neither of the two.

SLIDE 38

If:

1. Frequentist ≡ observing the Operational Principle;
2. (Objective) Bayesian ≡ observing the Principle of Consistency;

…then both.

SLIDE 39

Bibliography:

T.P. and Živko, T. (2006). Towards Reconciliation between Bayesian and Frequentist Reasoning. In Lyons, L. and Ünel, M. K. (eds.). Statistical Problems in Particle Physics, Astrophysics and Cosmology (Proceedings of PHYSTAT05). London: Imperial College Press. Erratum: Inference about the parameters of the Weibull distribution can be reduced to a location-scale problem.

T.P. and Živko, T. On Probabilistic Inference about the Parameters of Sampling Distributions.

SLIDE 40

References:

Bayes, Rev. T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philos. Trans. R. Soc. Lond., 53: 370-418.
Bernardo, J. M. (1979). Reference Posterior Distributions for Bayesian Inference. J. R. Statist. Soc., B 41: 113-147.
Brown, R. G. and Hwang, P. Y. C. (1983). Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, Inc.
Cox, R. T. (1946). Probability, Frequency and Reasonable Expectation. Amer. J. Phys., 14: 1-13.
Dawid, A. P. (1977). Conformity of inference patterns. In Barra, J. R., van Cutsen, B., Brodeau, F. and Romier, G. (eds.). Developments in Statistics. Amsterdam: North-Holland.
Fisher, R. A. (1922). On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. Lond., A 222: 309-368.

SLIDE 41

References (cont’d):

Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver & Boyd.
Harney, H. L. (2003). Bayesian Inference. Springer.
Hartigan, J. A. (1964). Invariant Prior Distributions. Ann. Math. Statist., 35: 836-845.
Howson, C. and Urbach, P. (1991). Bayesian Reasoning in Science. Nature, 350: 371-374.
Jaynes, E. T. (2003). Probability Theory – The Logic of Science. Cambridge University Press.
Jeffreys, H. (1961). Theory of Probability. Oxford: Clarendon Press.
Laplace, P. S. (1886). Œuvres Complètes – Tome Septième: Théorie Analytique des Probabilités. Paris: Gauthier-Villars.
Lindley, D. V. (1958). Fiducial Distribution and Bayes’ Theorem. J. R. Statist. Soc., B 20: 102-107.

SLIDE 42

References (cont’d):

O’Hagan, A. (1994). Kendall’s Advanced Theory of Statistics, Vol. 2B – Bayesian Inference. London: Arnold.
Pólya, G. (1954). Mathematics and Plausible Reasoning, Vol. 2 – Patterns of Plausible Inference. Princeton: Princeton University Press.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Hutchinson & Co. Publishers.
Stuart, A., Ord, K. and Arnold, S. (1999). Kendall’s Advanced Theory of Statistics, Vol. 2A – Classical Inference and the Linear Model. London: Arnold.
Villegas, C. (1980). Inner Statistical Inference II. Ann. Statist., 9: 768-776.