 
Why Every Physicist Should Be a Bayesian
(Towards a Complete Reconciliation between the Bayesian and the Frequentist Schools of Parametric Inference)

Tomaž Podobnik
Physics Department, University of Ljubljana
Jožef Stefan Institute, Ljubljana, Slovenia

YETI’07, 16/01/2007

Recommended reading: R. D. Cousins, “Why Isn’t Every Physicist a Bayesian?”, Amer. J. Phys. 63 (1995) 398.

“Physicists embarking on seemingly routine error analyses are finding themselves grappling with major conceptual issues which have divided the statistical community for years. … The lurking controversy can come as a shock to a graduate student who encounters a statistical problem at some late stage in writing up the Ph.D. dissertation.”

Basic principles of scientific reasoning (Popper, 1959, pp. 91-92):

1. Principle of Consistency: Every theory must be internally consistent: if a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result. Also, identical states of knowledge in a problem must always lead to identical solutions of the problem.
2. Operational Principle: Every theory must specify operations that ensure falsifiability of its predictions.

Direct probabilities (= long-term relative frequencies):

$p(x_1\,|\,I)$: probability for observing $x = x_1$ (for observing $x \in (x_1, x_1 + dx)$), given information $I$

$f(x\,|\,I)$: probability density function (pdf); $p(x\,|\,I) = f(x\,|\,I)\,dx$

$I = \theta I_0$: $I_0$ = family of sampling distributions, $\theta$ = parameter

$F(x, \theta, I_0) \equiv \int_{x_a}^{x} f(x'\,|\,\theta I_0)\,dx'$: (cumulative) distribution function (cdf)

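Location and scale parameters:

$$f(x\,|\,\mu I_0) = \phi(x - \mu); \quad x \in (-\infty, \infty)$$
$\mu \in (-\infty, \infty)$ ≡ location parameter

$$f(x\,|\,\sigma I_0) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right); \quad x \in (0, \infty)$$
$\sigma \in (0, \infty)$ ≡ scale parameter

$$f(x\,|\,\mu \sigma I_0) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right); \quad x \in (-\infty, \infty)$$
$\mu \in (-\infty, \infty)$ ≡ location parameter, $\sigma \in (0, \infty)$ ≡ scale (dispersion) parameter
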
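Examples:

$I_0$ ≡ Gaussian distribution, $\mu$ ≡ location parameter, $\sigma$ ≡ scale (dispersion) parameter:

$$f(x\,|\,\mu \sigma I_0) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right)$$

$$F(x, \mu, \sigma, I_0) = \int_{-\infty}^{x} f(x'\,|\,\mu \sigma I_0)\,dx'$$

[Figure: $f(x\,|\,\mu \sigma I_0)$ and $F(x, \mu, \sigma, I_0)$ versus $x$ for $\mu = 0$, $\sigma = 1$.]
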
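The location-scale form of the Gaussian and its cdf can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the choice of the standard normal density as $\phi$, and the lower integration cut-off of 10 standard deviations below the mean are all assumptions made for illustration.

```python
import math

def phi(z):
    """Standard normal density, playing the role of the shape function phi."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def gauss_pdf(x, mu, sigma):
    """Gaussian pdf in location-scale form: (1/sigma) * phi((x - mu) / sigma)."""
    return phi((x - mu) / sigma) / sigma

def gauss_cdf(x, mu, sigma):
    """Gaussian cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_by_integration(x, mu, sigma, n=20000):
    """Trapezoidal approximation of the cdf integral.

    The lower limit -infinity is replaced by mu - 10*sigma (an assumption);
    the neglected tail is far below the accuracy of the trapezoidal rule.
    """
    a = mu - 10.0 * sigma
    h = (x - a) / n
    total = 0.5 * (gauss_pdf(a, mu, sigma) + gauss_pdf(x, mu, sigma))
    for i in range(1, n):
        total += gauss_pdf(a + i * h, mu, sigma)
    return total * h

# Compare the integral of the pdf with the closed-form cdf for mu = 1, sigma = 2.
numeric = cdf_by_integration(1.5, 1.0, 2.0)
closed_form = gauss_cdf(1.5, 1.0, 2.0)
```

The two values agree up to the discretization error of the trapezoidal rule, which illustrates that the cdf is just the running integral of the pdf.
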
Axioms of conditional probability:

• Every probability distribution is conditional upon the available (relevant) information.

1. $f(x\,|\,\theta I_0) \geq 0$
2. $f(x\,y\,|\,\theta I_0) = f(x\,|\,\theta I_0)\, f(y\,|\,x\,\theta I_0) = f(y\,|\,\theta I_0)\, f(x\,|\,y\,\theta I_0)$
3. $\int_X f(x\,|\,\theta I_0)\,dx = 1$
4. $f(y\,|\,\theta I_0) = f(x\,|\,\theta I_0) \left|\frac{\partial y}{\partial x}\right|^{-1}; \quad y = \tilde{y}(x)$ one-to-one

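Example:

$I_0$ ≡ exponential distribution, $\sigma$ ≡ scale parameter:

$$f(x\,|\,\sigma I_0) = \frac{1}{\sigma} \exp\left\{-\frac{x}{\sigma}\right\} = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)$$

[Figure: $f(x\,|\,\sigma{=}1\, I_0)$ versus $x$.]

With $y \equiv \ln x$ and $\mu \equiv \ln \sigma$:

$$f(y\,|\,\mu \tilde{I}_0) = f(x\,|\,\sigma I_0) \left|\frac{\partial y}{\partial x}\right|^{-1} = e^{(y - \mu)} \exp\left\{-e^{(y - \mu)}\right\} \equiv \tilde{\phi}(y - \mu)$$

⇒ $\mu$ ≡ location parameter

$I_0 \rightarrow \tilde{I}_0$: $\tilde{I}_0$ is not the exponential distribution.

Scale parameter reducible to location parameter!
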
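The reduction of a scale parameter to a location parameter can be verified numerically. This is a small sketch under the slide's own substitution $y = \ln x$, $\mu = \ln \sigma$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def exp_pdf(x, sigma):
    """Exponential pdf with scale parameter sigma: (1/sigma) * exp(-x/sigma)."""
    return math.exp(-x / sigma) / sigma

def log_pdf_by_jacobian(y, sigma):
    """Density of y = ln x obtained from f(x|sigma) * |dy/dx|^(-1).

    Since dy/dx = 1/x, the Jacobian factor |dy/dx|^(-1) equals x = exp(y).
    """
    x = math.exp(y)
    return exp_pdf(x, sigma) * x

def phi_tilde(z):
    """Closed form from the slide: exp(z) * exp(-exp(z)), with z = y - mu."""
    return math.exp(z - math.exp(z))

# The transformed density depends on y and sigma only through y - ln(sigma).
y, sigma = 0.3, 2.5
via_jacobian = log_pdf_by_jacobian(y, sigma)
via_closed_form = phi_tilde(y - math.log(sigma))

# Shifting y by d while rescaling sigma by exp(d) leaves the density unchanged,
# which is exactly the defining property of a location parameter.
d = 0.7
shifted = log_pdf_by_jacobian(y + d, sigma * math.exp(d))
```

Here `via_jacobian`, `via_closed_form` and `shifted` coincide up to floating-point rounding.
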
Parametric inference:

Given a measured $x \in (x_1, x_1 + dx)$, specify the degree of belief

$(\theta_1\,|\,x_1 I_0)$: $\theta \in (\theta_1, \theta_1 + d\theta)$

Probabilistic approach (Bayesian school):

$$(\theta\,|\,x I_0) \rightarrow p(\theta\,|\,x I_0) = f(\theta\,|\,x I_0)\,d\theta$$

N.b.: $f(\theta\,|\,x I_0)$ is the distribution of our belief in different values of $\theta$, not (!) the distribution of $\theta$.

Axioms of inverse probability:

1. $f(\theta\,|\,x I_0) \geq 0$
2. $f(\theta\,\nu\,|\,x I_0) = f(\theta\,|\,x I_0)\, f(\nu\,|\,\theta\,x I_0) = f(\nu\,|\,x I_0)\, f(\theta\,|\,\nu\,x I_0)$
3. $\int_\Theta f(\theta\,|\,x I_0)\,d\theta = 1$
4. $f(\nu\,|\,x I_0) = f(\theta\,|\,x I_0) \left|\frac{\partial \nu}{\partial \theta}\right|^{-1}; \quad \nu = \tilde{\nu}(\theta)$ one-to-one
5. $f(\theta\,x_2\,|\,x_1 I_0) = f(\theta\,|\,x_1 I_0)\, f(x_2\,|\,\theta\,x_1 I_0) = f(x_2\,|\,x_1 I_0)\, f(\theta\,|\,x_2\,x_1 I_0)$

Pros for subjecting degrees of belief to the axioms of probability:

1. “It is not excluded a priori that the same mathematical theory may serve two purposes.” (Pólya, 1954, Chapter XV, p. 116)
2. Cox’s Theorem: Every theory of plausible inference is either isomorphic to probability theory or inconsistent with very general qualitative requirements (e.g., $(\theta \in (\theta_1, \theta_1 + d\theta)\,|\,x_1 I_0) \rightarrow (\theta \notin (\theta_1, \theta_1 + d\theta)\,|\,x_1 I_0)$). (Cox, 1946)
3. Dutch Book Theorem (de Finetti): A “Dutch Book” can be organized against anyone whose betting coefficients violate the axioms of probability. (Howson and Urbach, 1991)

Pros (cont’d):

4. Avoiding adhockeries. (O’Hagan, 2000, p. 20)
5. Powerful tools: marginalization and Bayes’ Theorem (Bayes, 1763)

Marginalization:

$$f(\theta\,\nu\,|\,x I_0) = f(\theta\,|\,x I_0)\, f(\nu\,|\,\theta\,x I_0) \;\Rightarrow\; f(\theta\,|\,x I_0) = \int_N f(\theta\,\nu'\,|\,x I_0)\,d\nu'$$

$$f(\theta\,\nu\,|\,x I_0) = f(\nu\,|\,x I_0)\, f(\theta\,|\,\nu\,x I_0) \;\Rightarrow\; f(\nu\,|\,x I_0) = \int_\Theta f(\theta'\,\nu\,|\,x I_0)\,d\theta'$$

Bayes’ Theorem:

$$f(\theta\,|\,x_1 I_0)\, f(x_2\,|\,\theta\,x_1 I_0) = f(x_2\,|\,x_1 I_0)\, f(\theta\,|\,x_2\,x_1 I_0); \quad f(x_2\,|\,\theta\,x_1 I_0) = f(x_2\,|\,\theta I_0)$$

$$\Rightarrow\; f(\theta\,|\,x_2\,x_1 I_0) = \frac{f(\theta\,|\,x_1 I_0)\, f(x_2\,|\,\theta I_0)}{f(x_2\,|\,x_1 I_0)}; \quad f(x_2\,|\,x_1 I_0) = \int_\Theta f(\theta'\,|\,x_1 I_0)\, f(x_2\,|\,\theta' I_0)\,d\theta'$$

But… (cons):

a) How to assign $f(\theta\,|\,x_1 I_0)$?
b) What are verifiable predictions?

If making use of Bayes’ Theorem:

$$f(\theta\,|\,x_1 I_0) = \frac{f(\theta\,|\,I_0)\, f(x_1\,|\,\theta I_0)}{\int_\Theta f(\theta'\,|\,I_0)\, f(x_1\,|\,\theta' I_0)\,d\theta'}$$

$f(\theta\,|\,I_0)$: non-informative prior (distribution)

“According to Bayesian philosophy it is also possible to make statements concerning the unknown $\theta$ in the absence of data, and these statements can be summarized in a prior distribution.” (Villegas, 1980)

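To make the formula concrete, here is a minimal grid-based sketch of Bayes' Theorem with a prior. Everything specific in it is an assumption for illustration and not taken from the slides: a Gaussian sampling distribution with known $\sigma = 1$, a flat prior restricted to the bounded range $(-10, 10)$, the measured value $x_1 = 1.3$, and the helper name `posterior_on_grid`.

```python
import math

def gauss_pdf(x, theta, sigma=1.0):
    """Sampling distribution f(x | theta I0): Gaussian with location theta (assumed)."""
    return math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_on_grid(x1, thetas, prior, sigma=1.0):
    """Posterior density f(theta | x1 I0) on an equally spaced grid of theta values."""
    step = thetas[1] - thetas[0]
    # Numerator of Bayes' Theorem: prior times likelihood at each grid point.
    unnorm = [p * gauss_pdf(x1, t, sigma) for t, p in zip(thetas, prior)]
    # Denominator: Riemann-sum approximation of the integral over Theta.
    norm = sum(unnorm) * step
    return [u / norm for u in unnorm]

# Hypothetical setup: flat prior on (-10, 10), one measurement x1 = 1.3.
thetas = [-10.0 + 0.01 * i for i in range(2001)]
flat_prior = [1.0 / 20.0] * len(thetas)
post = posterior_on_grid(1.3, thetas, flat_prior)

# Grid value of theta at which the posterior density is largest.
theta_map = thetas[post.index(max(post))]
```

With a flat prior the posterior peaks at the measured value. The sketch only works because the range of $\theta$ was made finite by hand, which is one of the difficulties with such priors.
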
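Example: The Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII)

$$f(\theta\,|\,I_0) = C; \quad \int_{\theta_a}^{\theta_b} f(\theta'\,|\,I_0)\,d\theta' = 1 \;\Rightarrow\; C = (\theta_b - \theta_a)^{-1}$$

Twofold problem:

a) $(\theta_a, \theta_b)$ infinite (e.g., $\theta_b = \infty$) ⇒ $\int_{\theta_a}^{\theta_b} f(\theta'\,|\,I_0)\,d\theta' \neq 1$ (the uniform density cannot be normalized)

b) $f(\theta\,|\,I_0)$ is not invariant under non-linear transformations:

$$\nu = \tilde{\nu}(\theta) = \theta^{-1} \;\Rightarrow\; f(\nu\,|\,I_0) = f(\theta\,|\,I_0) \left|\frac{\partial \nu}{\partial \theta}\right|^{-1} \propto \frac{1}{\nu^2} \neq \text{const}$$

(a contradiction, →←: complete ignorance about $\theta$ should also mean complete ignorance about $\nu$)
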
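The lack of invariance in (b) can be illustrated numerically. The sketch below assumes, purely for illustration, a uniform density for $\theta$ on the bounded interval $(1, 2)$ and the transformation $\nu = 1/\theta$; the interval and all function names are hypothetical choices, not taken from the slides.

```python
def uniform_pdf(theta, a=1.0, b=2.0):
    """Uniform density f(theta | I0) = 1/(b - a) on [a, b] (bounds are assumed)."""
    return 1.0 / (b - a) if a <= theta <= b else 0.0

def induced_pdf_nu(nu, a=1.0, b=2.0):
    """Density of nu = 1/theta implied by the uniform density of theta.

    |d nu / d theta| = 1/theta^2, so the Jacobian factor |d nu / d theta|^(-1)
    equals theta^2 = 1/nu^2.
    """
    theta = 1.0 / nu
    return uniform_pdf(theta, a, b) * theta ** 2

def midpoint_integral(func, lo, hi, n=10000):
    """Midpoint-rule approximation of the integral of func over (lo, hi)."""
    h = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * h) for i in range(n)) * h

# nu ranges over (1/2, 1); the induced density is normalized but not flat.
total = midpoint_integral(induced_pdf_nu, 0.5, 1.0)
lower_half = midpoint_integral(induced_pdf_nu, 0.5, 0.75)
```

The induced density falls from 4 at $\nu = 0.5$ to 1 at $\nu = 1$, and the lower half of the $\nu$ range carries probability 2/3 rather than the 1/2 that a uniform density in $\nu$ would give.
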
“A succession of authors have said that the prior probability is nonsense and that the principle of inverse probability, which cannot work without it, is nonsense too.” (Jeffreys, 1961, p. 120)

“During the rapid development of practical statistics in the past few decades, the theoretical foundations of the subject have been involved in great obscurity. The obscurity is centred in the so-called ‘inverse’ methods. … The inverse probability is a mistake (perhaps the only mistake to which the mathematical world has so deeply committed itself).” (Fisher, 1922)

Long-lasting and fierce controversy:

“The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency.” (Jeffreys, 1961, p. 401)

“Probability is a ratio of frequencies.” (Fisher, 1922, p. 326)

Twofold aim of the lecture:

1. Overcome conceptual and practical problems concerning the assignment of probability distributions to inferred parameters;
2. Reconcile the Bayesian and the frequentist schools of parametric inference.

Consistency Theorem: How to assign $f(\theta\,|\,x_1 I_0)$?

Assumptions:

a) $x_1$ and $x_2$ are two independent measurements from $f(x\,|\,\theta I_0)$: $f(x_2\,|\,x_1\,\theta I_0) = f(x_2\,|\,\theta I_0)$ and $f(x_1\,|\,x_2\,\theta I_0) = f(x_1\,|\,\theta I_0)$;
b) $f(\theta\,|\,x_1 I_0)$ and $f(\theta\,|\,x_2 I_0)$ can be assigned.

Then (Bayes’ Theorem):

$$f(\theta\,|\,x_2\,x_1 I_0) = \frac{f(\theta\,|\,x_1 I_0)\, f(x_2\,|\,\theta I_0)}{f(x_2\,|\,x_1 I_0)}; \quad f(\theta\,|\,x_1\,x_2 I_0) = \frac{f(\theta\,|\,x_2 I_0)\, f(x_1\,|\,\theta I_0)}{f(x_1\,|\,x_2 I_0)}$$

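The two expressions describe the same state of knowledge reached in two different orders, so by the Principle of Consistency they should agree. The sketch below checks this on a grid under loudly stated assumptions that are not part of the slides: a Gaussian sampling distribution with $\sigma = 1$, and single-measurement distributions $f(\theta\,|\,x_i I_0)$ obtained by updating a flat density on the bounded range $(-10, 10)$. It illustrates only the order independence, not the theorem's answer to how $f(\theta\,|\,x_1 I_0)$ should be assigned.

```python
import math

def gauss_pdf(x, theta, sigma=1.0):
    """Assumed sampling distribution f(x | theta I0): Gaussian with location theta."""
    return math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def update(density, thetas, x, sigma=1.0):
    """One application of Bayes' Theorem on a grid.

    Multiplies the current density by f(x | theta I0) and renormalizes; the
    normalization plays the role of f(x2 | x1 I0) (or f(x1 | x2 I0)).
    """
    step = thetas[1] - thetas[0]
    unnorm = [d * gauss_pdf(x, t, sigma) for d, t in zip(density, thetas)]
    norm = sum(unnorm) * step
    return [u / norm for u in unnorm]

# Hypothetical starting point: flat density on (-10, 10); measurements x1, x2.
thetas = [-10.0 + 0.01 * i for i in range(2001)]
start = [1.0 / 20.0] * len(thetas)
x1, x2 = 0.4, 1.9

post_21 = update(update(start, thetas, x1), thetas, x2)  # f(theta | x2 x1 I0)
post_12 = update(update(start, thetas, x2), thetas, x1)  # f(theta | x1 x2 I0)

# Largest pointwise difference between the two orderings.
max_diff = max(abs(p - q) for p, q in zip(post_21, post_12))
```

Under these assumptions `max_diff` is at the level of floating-point rounding, so the order in which the two measurements are processed does not matter.
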