

SLIDE 1

Why Every Physicist Should Be a Bayesian

(Towards a Complete Reconciliation between the Bayesian and the Frequentist Schools of Parametric Inference) Tomaž Podobnik Physics Department, University of Ljubljana Jožef Stefan Institute, Ljubljana, Slovenia

SLIDE 2

YETI’07, 16/01/2007

Recommended reading:

  • R. D. Cousins, “Why Isn’t Every Physicist a Bayesian?”, Amer. J. Phys. 63 (1995) 398.

“Physicists embarking on seemingly routine error analyses are finding themselves grappling with major conceptual issues which have divided the statistical community for years. …The lurking controversy can come as a shock to a graduate student who encounters a statistical problem at some late stage in writing up the Ph.D. dissertation.”

SLIDE 3

Basic principles of scientific reasoning (Popper, 1959, pp. 91-92):

  • 1. Principle of Consistency: Every theory must be internally consistent: if a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result. Also, identical states of knowledge in a problem must always lead to identical solutions of the problem.

  • 2. Operational Principle: Every theory must specify operations that ensure falsifiability of its predictions.
SLIDE 4

Direct probabilities (= long-term relative frequencies):

$p(x|I)\,,\ x \in (x_1, x_1 + dx_1)$ : probability for observing $x$, given information $I$ ;
$f(x|\theta I)\,,\ p(x|\theta I) = f(x|\theta I)\,dx$ : probability density function (pdf) ;
$\{\,f(x|\theta I)\,;\ \theta \in \Theta\,\}$ : family of sampling distributions ;
$F(x,\theta,I) = \int_{x_a}^{x} f(x'|\theta I)\,dx'$ : (cumulative) distribution function (cdf) ;
$\theta$ : parameter.

SLIDE 5

Location and scale parameters:

$f(x|\mu I) = \phi(x-\mu)\,;\quad x \in (-\infty,\infty)\,,\ \mu \in (-\infty,\infty) \ \Rightarrow\ \mu \equiv$ location parameter ;

$f(x|\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)\,;\quad x \in (0,\infty)\,,\ \sigma \in (0,\infty) \ \Rightarrow\ \sigma \equiv$ scale parameter ;

$f(x|\mu\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\,;\quad x \in (-\infty,\infty)\,,\ \mu \in (-\infty,\infty)\,,\ \sigma \in (0,\infty) \ \Rightarrow\ \mu \equiv$ location, $\sigma \equiv$ scale (dispersion) parameter.

SLIDE 6

Examples:

$f(x|\mu\sigma I) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)$ : Gaussian distribution ($\mu \equiv$ location, $\sigma \equiv$ scale (dispersion) parameter) ;

$F(x,\mu,\sigma,I) \equiv \int_{-\infty}^{x} f(x'|\mu\sigma I)\,dx'\,.$

[Figure: plots of $f(x|\mu\sigma I)$ and $F(x,\mu,\sigma,I)$ for $\mu = \sigma = 1$.]
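As a numeric companion to the Gaussian example, here is a minimal stdlib-Python sketch (function names are illustrative, not from the lecture) evaluating the pdf and cdf at the defaults μ = σ = 1 used in the slide's plots:

```python
import math

def gauss_pdf(x, mu=1.0, sigma=1.0):
    """f(x | mu sigma I): Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu=1.0, sigma=1.0):
    """F(x, mu, sigma, I): integral of the pdf up to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# At x = mu the density peaks at 1/(sigma*sqrt(2*pi)) and the cdf is exactly 1/2.
print(round(gauss_pdf(1.0), 4))  # 0.3989
print(gauss_cdf(1.0))            # 0.5
```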

SLIDE 7

Axioms of conditional probability:

  • every probability distribution is conditional upon the available (relevant) information.

1. $f(x|\theta I) \ge 0$ ;
2. $f(xy|\theta I) = f(x|\theta I)\,f(y|x\theta I) = f(y|\theta I)\,f(x|y\theta I)$ ;
3. $\int_X f(x|\theta I)\,dx = 1$ ;
4. $f(y|\theta \tilde{I}) = f(x|\theta I)\,\left|\frac{\partial x}{\partial y}\right|$ ; $y = y(x)$ one-to-one.

SLIDE 8

Example: a scale parameter is reducible to a location parameter!

$f(x|\sigma I) = \frac{1}{\sigma}\,\exp\!\left(-\frac{x}{\sigma}\right) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)\,;\quad \sigma \equiv$ scale parameter (exponential distribution).

Substituting $y \equiv \ln x$, $\mu \equiv \ln \sigma$ ($I \to \tilde{I}$):

$f(y|\mu \tilde{I}) = f(x|\sigma I)\,\left|\frac{\partial x}{\partial y}\right| = \exp\!\left[(y-\mu) - e^{(y-\mu)}\right] \equiv \tilde{\phi}(y-\mu) \ \Rightarrow\ \mu \equiv$ location parameter (not an exponential distribution).

[Figure: $f(x|\sigma{=}1\,I)$.]
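The reduction above can be checked numerically. A small stdlib-Python sketch (names are mine): the density of y = ln x follows from the change-of-variables axiom, and shifting y and μ together leaves it unchanged, which is exactly the location-parameter property:

```python
import math

def f_x(x, sigma):
    """Exponential sampling density: f(x | sigma I) = exp(-x/sigma)/sigma."""
    return math.exp(-x / sigma) / sigma

def f_y(y, mu):
    """Density of y = ln x with mu = ln sigma: depends on (y - mu) only."""
    return math.exp((y - mu) - math.exp(y - mu))

# Change-of-variables check: f_y(y) = f_x(e^y) * e^y   (Jacobian dx/dy = e^y)
sigma, x = 2.0, 0.7
y, mu = math.log(x), math.log(sigma)
print(abs(f_y(y, mu) - f_x(x, sigma) * x) < 1e-12)        # True

# Location property: a common shift of y and mu leaves the density invariant.
print(abs(f_y(y + 3.0, mu + 3.0) - f_y(y, mu)) < 1e-12)   # True
```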

SLIDE 9

Parametric inference:

Given measured $x_1 \in (x_1, x_1 + dx_1)$, specify the degree of belief $\theta \in (\theta_1, \theta_1 + d\theta)$.

Probabilistic approach (Bayesian school):

$x_1 \ \to\ p(\theta|x_1 I) = f(\theta|x_1 I)\,d\theta\,.$

  • N.b.: $f(\theta|x_1 I_0)$ is the distribution of our belief in different values of $\theta$, not (!) a distribution of $\theta$.

SLIDE 10

Axioms of inverse probability:

1. $f(\theta|xI) \ge 0$ ;
2. $f(\theta\nu|xI) = f(\theta|xI)\,f(\nu|\theta xI) = f(\nu|xI)\,f(\theta|\nu xI)$ ;
3. $\int_\Theta f(\theta|xI)\,d\theta = 1$ ;
4. $f(\nu|x\tilde{I}) = f(\theta|xI)\,\left|\frac{\partial \theta}{\partial \nu}\right|$ ; $\nu = \nu(\theta)$ one-to-one;
5. $f(\theta|x_1 x_2 I) = f(\theta|x_2 x_1 I)$ .

SLIDE 11

Pro’s for subjecting degrees of belief to the Axioms of probability:

  • 1. “It is not excluded a priori that the same mathematical theory may serve two purposes.” (Pólya, 1954, Chapter XV, p. 116)

  • 2. Cox’s Theorem: Every theory of plausible inference is either isomorphic to probability theory or inconsistent with very general qualitative requirements, e.g. that the plausibility of $\theta \in (\theta_1, \theta_1 + d\theta)$ given $x_1 I_0$ determines the plausibility of $\theta \notin (\theta_1, \theta_1 + d\theta)$. (Cox, 1946)

  • 3. Dutch Book Theorem (de Finetti): A “Dutch Book” can be organized against anyone whose betting coefficients violate the axioms of probability. (Howson and Urbach, 1991)

SLIDE 12

Pro’s (cont’d):

  • 4. Avoiding adhockeries. (O’Hagan, 2000, p. 20)
  • 5. Powerful tools: marginalization and Bayes’ Theorem (Bayes, 1763).

Product rule:

$f(\theta\nu|xI) = f(\theta|xI)\,f(\nu|\theta xI) = f(\nu|xI)\,f(\theta|\nu xI) \ \Rightarrow$

Marginalization:

$f(\theta|xI) = \int_N f(\theta\nu'|xI)\,d\nu'\,;\qquad f(\nu|xI) = \int_\Theta f(\theta'\nu|xI)\,d\theta'\,.$

Bayes’ Theorem (with $f(x_2|\theta x_1 I) = f(x_2|\theta I)$ for independent measurements):

$f(\theta|x_2 x_1 I) = \frac{f(x_2|\theta x_1 I)\,f(\theta|x_1 I)}{f(x_2|x_1 I)}\,;\qquad f(x_2|x_1 I) = \int_\Theta f(x_2|\theta' x_1 I)\,f(\theta'|x_1 I)\,d\theta'\,.$
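A minimal grid sketch of sequential Bayesian updating (stdlib Python; the Gaussian likelihood, the flat starting density and all names are illustrative assumptions, not the lecture's example). It shows the order-independence of the two updates, the property the Consistency Theorem later builds on:

```python
import math

def like(x, theta, sigma=1.0):
    """Gaussian sampling density f(x | theta I), known sigma (constants dropped)."""
    return math.exp(-0.5 * ((x - theta) / sigma) ** 2)

thetas = [i * 0.01 for i in range(-500, 501)]   # crude grid over theta

def update(post, x):
    """One application of Bayes' Theorem on the grid, then renormalize."""
    new = [p * like(x, t) for p, t in zip(post, thetas)]
    z = sum(new)
    return [p / z for p in new]

flat = [1.0] * len(thetas)
x1, x2 = 0.3, 1.1
a = update(update(flat, x1), x2)   # x1 first, then x2
b = update(update(flat, x2), x1)   # x2 first, then x1
print(max(abs(u - v) for u, v in zip(a, b)) < 1e-12)  # True: order does not matter
```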

SLIDE 13

But… (con’s): a) how to assign $f(\theta|x_1 I_0)$??? b) what are verifiable predictions???

If making use of Bayes’ Theorem,

$f(\theta|x_1 I) = \frac{f(x_1|\theta I)\,f(\theta|I)}{\int_\Theta f(x_1|\theta' I)\,f(\theta'|I)\,d\theta'}\,,$

a non-informative prior (distribution) $f(\theta|I)$ is needed: “According to Bayesian philosophy it is also possible to make statements concerning the unknown θ in the absence of data, and these statements can be summarized in a prior distribution.” (Villegas, 1980)

SLIDE 14

Example: The Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII):

$f(\theta|I) = C = \frac{1}{\theta_b - \theta_a}\,;\qquad \int_{\theta_a}^{\theta_b} f(\theta'|I)\,d\theta' = 1\,.$

Twofold problem:

a) $(\theta_a, \theta_b)$ infinite (e.g., $\theta_b = \infty$) $\Rightarrow \int_{\theta_a}^{\theta_b} f(\theta'|I)\,d\theta' \to \infty$ ;
b) $f(\theta|I_0)$ not invariant under non-linear transformations:

$\nu = \theta^2 \ \Rightarrow\ f(\nu|\tilde{I}) = f(\theta|I)\,\left|\frac{\partial \theta}{\partial \nu}\right| \propto \frac{1}{\sqrt{\nu}} \neq \mathrm{const}\,.$
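Problem (b) is easy to see numerically. A tiny stdlib-Python sketch (illustrative names): a density constant in θ, pushed through ν = θ² with the Jacobian |∂θ/∂ν|, is no longer constant in ν:

```python
import math

def f_theta(theta):
    """Uniform 'prior' in theta (the normalization constant is irrelevant here)."""
    return 1.0

def f_nu(nu):
    """Transformed density: theta = sqrt(nu), |d theta / d nu| = 1/(2 sqrt(nu))."""
    return f_theta(math.sqrt(nu)) / (2.0 * math.sqrt(nu))

# The transformed density depends on nu, so uniformity is not preserved:
print(f_nu(1.0), f_nu(4.0))  # 0.5 0.25
```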

SLIDE 15

“A succession of authors have said that the prior probability is nonsense and that the principle of inverse probability, which cannot work without it, is nonsense too.” (Jeffreys, 1961, p. 120) “During the rapid development of practical statistics in the past few decades, the theoretical foundations of the subject have been involved in great obscurity. The obscurity is centred in the so-called ‘inverse’ methods. … The inverse probability is a mistake (perhaps the only mistake to which the mathematical world has so deeply committed itself).” (Fisher, 1922)

SLIDE 16

Long-lasting and fierce controversy:

“The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency.” (Jeffreys, 1961, p. 401)

“Probability is a ratio of frequencies.” (Fisher, 1922, p. 326)

SLIDE 17

Twofold aim of the lecture:

1. Overcome conceptual and practical problems concerning the assignment of probability distributions to inferred parameters;
2. Reconcile the Bayesian and the frequentist schools of parametric inference.
SLIDE 18

Consistency Theorem: how to assign $f(\theta|x_1 I_0)$?

Assumptions: a) $x_1$ and $x_2$ two independent measurements from $f(x|\theta I_0)$: $f(x_2|x_1 \theta I_0) = f(x_2|\theta I_0)$ and $f(x_1|x_2 \theta I_0) = f(x_1|\theta I_0)$ ; b) $f(\theta|x_1 I_0)$ and $f(\theta|x_2 I_0)$ can be assigned. Then (Bayes’ Theorem):

$f(\theta|x_2 x_1 I) = \frac{f(x_2|\theta I)\,f(\theta|x_1 I)}{f(x_2|x_1 I)}\,;\qquad f(\theta|x_1 x_2 I) = \frac{f(x_1|\theta I)\,f(\theta|x_2 I)}{f(x_1|x_2 I)}\,.$

SLIDE 19

Consistency:

$f(\theta|x_2 x_1 I_0) = f(\theta|x_1 x_2 I_0)$

$\Rightarrow$

$f(\theta|xI) = \frac{\pi(\theta)\,f(x|\theta I)}{\eta(x)}\,;\qquad \eta(x) \equiv \int_\Theta \pi(\theta')\,f(x|\theta' I)\,d\theta'\,,$

where $\pi(\theta)$ is the consistency factor — not (!!) a probability distribution (e.g., it need not be normalizable) — and $\eta(x)$ is the normalization factor.

Strikingly similar to Bayes’ Theorem, but…
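A grid sketch of the Consistency Theorem in action (stdlib Python; the exponential sampling density and all names are illustrative): even though the consistency factor π(θ) = 1/θ is itself not normalizable, the resulting f(θ|xI) is a proper probability density once divided by η(x):

```python
import math

def f_x_given_theta(x, theta):
    """Exponential sampling density with scale theta: f(x | theta I)."""
    return math.exp(-x / theta) / theta

def consistency_factor(theta):
    """pi(theta) = 1/theta for a scale parameter; NOT itself normalizable."""
    return 1.0 / theta

# f(theta | x I) = pi(theta) f(x|theta I) / eta(x), eta(x) the normalization.
x, d = 2.0, 0.01
thetas = [d * i for i in range(1, 5000)]            # crude grid on (0, 50)
w = [consistency_factor(t) * f_x_given_theta(x, t) for t in thetas]
eta = sum(w) * d                                    # eta(x): finite
post = [v / eta for v in w]
print(abs(sum(post) * d - 1.0) < 1e-9)  # True: f(theta|x I) is a proper density
```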

SLIDE 20

Properties of $\pi(\theta)$:

1. Determined only up to a multiplicative constant (say $k$);
2. Transformation $\pi(\theta) \to \tilde{\pi}(\nu)$ under $\theta \to \nu$ (one-to-one):

$f(\nu|x\tilde{I}) = f(\theta|xI)\,\left|\frac{\partial \theta}{\partial \nu}\right| = \frac{\tilde{\pi}(\nu)\,f(x|\nu\tilde{I})}{\tilde{\eta}(x)} \ \Rightarrow\ \tilde{\pi}(\nu) = k\,\pi(\theta)\,\left|\frac{\partial \theta}{\partial \nu}\right|\,;$

3. Depends on $I_0$ (= the only available information before data $(x_1, x_2, \dots)$ are collected).

SLIDE 21

Consistency:

$\tilde{I} = I \ \Rightarrow\ \tilde{\pi}(\nu) \propto \pi(\nu)$

(a.k.a. the Principle of Relative Invariance; Hartigan, 1964).

Example (scale parameter):

$f(t|\tau I) = \frac{1}{\tau}\,\exp\!\left(-\frac{t}{\tau}\right) = \frac{1}{\tau}\,\phi\!\left(\frac{t}{\tau}\right)\,;\qquad g_a: t \in T \to t' = at\,,\quad \bar{g}_a: \tau \in \Theta \to \tau' = a\tau$ (induced group).

$f(t|\tau I_0)$ is invariant under $g_a$ (the transformed sample again follows an exponential law, so $\tilde{I} = I$); the transformation rule then gives

$\tilde{\pi}(a\tau) = \frac{k(a)}{a}\,\pi(\tau)\,,$

and combined with relative invariance $\tilde{\pi} \propto \pi$ this requires $\pi(a\tau) \propto \frac{1}{a}\,\pi(\tau)$, whose solution is $\pi(\tau) \propto \frac{1}{\tau}\,.$
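The functional equation for the scale group can be verified directly. A tiny stdlib-Python sketch (illustrative): π(τ) = 1/τ satisfies π(aτ) = π(τ)/a for every scale factor a, which is the relative-invariance condition:

```python
def pi(tau):
    """Consistency factor for a scale parameter (up to a constant k)."""
    return 1.0 / tau

# Relative invariance under the scale group tau -> a*tau demands
# pi(a*tau) = pi(tau)/a ; check it on a handful of points:
checks = [(a, tau) for a in (0.5, 2.0, 7.3) for tau in (0.1, 1.0, 4.2)]
print(all(abs(pi(a * t) - pi(t) / a) < 1e-12 for a, t in checks))  # True
```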

SLIDE 22

Example (Distribution | Invariant transformation | Functional equation | Solution):

$f(x|\mu\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)$ | $x \to ax+b\,,\ \mu \to a\mu+b\,,\ \sigma \to a\sigma$ | $\pi_{LS}(a\mu+b, a\sigma) = \frac{h(a,b)}{a}\,\pi_{LS}(\mu,\sigma)$ | (similarly) ;

$f(x|\mu I) = \phi(x-\mu)$ | $x \to x+b\,,\ \mu \to \mu+b$ | $\pi_L(\mu+b) = h(b)\,\pi_L(\mu)$ | $\pi_L(\mu) \propto e^{-q\mu}$ ;

$f(x|\sigma I) = \frac{1}{\sigma}\,\phi\!\left(\frac{x}{\sigma}\right)$ | $x \to ax\,,\ \sigma \to a\sigma$ | $\pi_S(a\sigma) = \frac{h(a)}{a}\,\pi_S(\sigma)$ | $\pi_S(\sigma) \propto \sigma^{-(r+1)}$ ;

$q$, $r$: constants.

SLIDE 23

Product rule and Marginalization:

$\pi_{LS}(\mu,\sigma) = \pi_L(\mu)\,\pi_S(\sigma)$

and

$\pi_L(\mu) \propto 1\,,\qquad \pi_S(\sigma) \propto \frac{1}{\sigma}\,,\qquad \pi_{LS}(\mu,\sigma) \propto \frac{1}{\sigma}\,.$

$\Rightarrow$ Consistency factors are determined uniquely (up to an arbitrary multiplicative constant) exclusively by observing the Axioms of Probability and the Principle of Consistency.

SLIDE 24

Examples: inferring the parameters of a Gaussian distribution.

$\mathbf{x} \equiv (x_1, \dots, x_n)$ : independent measurements, sampled from $f(x|\mu\sigma I)$ ;

$f(\mathbf{x}|\mu\sigma I) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\!n} \exp\!\left[-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right]\,;\qquad \bar{x} \equiv \frac{1}{n}\sum_{i=1}^{n} x_i\,,\quad s^2 \equiv \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\,.$

a) Both $\mu$ and $\sigma$ unknown:

$f(\mu\sigma|\mathbf{x}I) \propto \pi_{LS}(\mu,\sigma)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) \propto \frac{1}{\sigma}\,\prod_{i=1}^{n} f(x_i|\mu\sigma I)\,.$
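A stdlib-Python sketch of the joint posterior for the Gaussian case (names and the sample values are mine, purely illustrative). It checks the algebraic step used above: the unnormalized posterior π_LS · Π f(x_i|μσI) depends on the data only through the sufficient statistics x̄ and s²:

```python
import math

def joint_post_unnorm(mu, sigma, xs):
    """pi_LS(mu,sigma) * prod_i f(x_i | mu sigma I), with pi_LS = 1/sigma."""
    L = 1.0
    for x in xs:
        L *= math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return L / sigma

def via_sufficient_stats(mu, sigma, xs):
    """Same quantity, written through xbar and s^2 (sufficient statistics)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    return (sigma ** -(n + 1) * (2 * math.pi) ** (-n / 2)
            * math.exp(-n * ((xbar - mu) ** 2 + s2) / (2 * sigma ** 2)))

xs = [0.2, 1.5, -0.3, 0.9]
print(abs(joint_post_unnorm(0.5, 1.2, xs) - via_sufficient_stats(0.5, 1.2, xs)) < 1e-12)
# True: sum_i (x_i - mu)^2 = n*(xbar - mu)^2 + n*s^2
```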

SLIDE 25

Marginalization:

$f(\mu|\mathbf{x}I) = \int_0^{\infty} f(\mu\sigma'|\mathbf{x}I)\,d\sigma' = \frac{\Gamma(n/2)}{\Gamma\!\left(\frac{n-1}{2}\right)\sqrt{\pi s^2}}\,\left[1 + \frac{(\bar{x}-\mu)^2}{s^2}\right]^{-n/2}\,;$

$f(\sigma|\mathbf{x}I) = \int_{-\infty}^{\infty} f(\mu'\sigma|\mathbf{x}I)\,d\mu' = \frac{2}{\Gamma\!\left(\frac{n-1}{2}\right)}\left(\frac{n s^2}{2}\right)^{\!(n-1)/2} \sigma^{-n}\,\exp\!\left(-\frac{n s^2}{2\sigma^2}\right)\,.$

b) Only $\mu$ unknown:

$f(\mu|\mathbf{x}\tilde{I}) \propto \pi_L(\mu)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) = \sqrt{\frac{n}{2\pi}}\,\frac{1}{\sigma}\,\exp\!\left[-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right]\,;$

c) Only $\sigma$ unknown:

$f(\sigma|\mathbf{x}\tilde{\tilde{I}}) \propto \pi_S(\sigma)\,\prod_{i=1}^{n} f(x_i|\mu\sigma I) = \frac{2}{\Gamma(n/2)}\left(\frac{n[(\bar{x}-\mu)^2+s^2]}{2}\right)^{\!n/2} \sigma^{-(n+1)}\,\exp\!\left[-\frac{n[(\bar{x}-\mu)^2+s^2]}{2\sigma^2}\right]\,.$
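The marginalization for μ can be checked numerically. A stdlib-Python sketch (sample values and names illustrative): integrating the joint posterior over σ on a grid reproduces, up to normalization, the Student-t shape (1 + (x̄−μ)²/s²)^(−n/2):

```python
import math

xs = [0.2, 1.5, -0.3, 0.9]
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n

def joint(mu, sigma):
    """sigma^-(n+1) * exp(-n[(xbar-mu)^2 + s^2]/(2 sigma^2)), i.e. pi_LS = 1/sigma."""
    return sigma ** -(n + 1) * math.exp(-n * ((xbar - mu) ** 2 + s2) / (2 * sigma ** 2))

def marginal_mu(mu, d=0.001):
    """Integrate the joint over sigma on a crude grid (0, 20]."""
    return sum(joint(mu, 0.001 + i * d) * d for i in range(20000))

def student_shape(mu):
    """Closed-form shape of the marginal: Student's t with n-1 degrees of freedom."""
    return (1.0 + (mu - xbar) ** 2 / s2) ** (-n / 2)

# The ratio numeric/closed-form should be (nearly) independent of mu:
r1 = marginal_mu(xbar) / student_shape(xbar)
r2 = marginal_mu(xbar + 1.0) / student_shape(xbar + 1.0)
print(abs(r1 / r2 - 1.0) < 1e-3)  # True
```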

SLIDE 26

[Figure: for $n = \bar{x} = 2$ and $s^2 = 1$: the conditional posteriors $f(\mu|\sigma{=}1\,\mathbf{x}\tilde{I})$ and $f(\sigma|\mu\,\mathbf{x}\tilde{\tilde{I}})$, and the marginal posteriors $f(\mu|\mathbf{x}I)$ and $f(\sigma|\mathbf{x}I)$.]

SLIDE 27

Comments:

a) Consistency factors are not normalizable, e.g. $\int_{-\infty}^{\infty} \pi(\mu')\,d\mu' \to \infty$ $\Rightarrow$ $\pi(\theta)$ is not a probability distribution!!!

b) Consistency factors exist for the parameters of distributions that are invariant under Lie groups of transformations. Necessary condition: reducibility of $\theta$ to a location parameter (not a disaster; see below) $\Rightarrow$ enough to determine $\pi(\mu)$.

SLIDE 28

“The most striking achievement of physical sciences is prediction.” (Pólya, 1954, p. 64)

Calibration (coverage):

  • $f(\theta|xI_0)$ is calibrated if the coverage of the confidence intervals $(\theta_1, \theta_2)$ coincides with the probability

$P\big(\theta \in (\theta_1, \theta_2)\,\big|\,xI\big) = \int_{\theta_1}^{\theta_2} f(\theta'|xI)\,d\theta'\,.$

  • Fiducial theory (Fisher, 1956, p. 70; $F(x,\theta,I)$ monotone in $\theta$):

$f(\theta|xI) = \left|\frac{\partial}{\partial \theta}\,F(x,\theta,I)\right|\,.$
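Calibration can be demonstrated by simulation. A stdlib-Python sketch (my own illustrative setup: Gaussian sampling with known σ = 1, π(μ) = 1, a central 90% credible interval; 1.6449 is the approximate 95th percentile of the standard normal): the long-run frequentist coverage of the Bayesian interval matches its stated probability:

```python
import math, random

random.seed(1)

def credible_interval(x, sigma=1.0):
    """Central 90% interval from f(mu|x I), a Gaussian around x (pi(mu) = 1)."""
    z = 1.6449  # approximate 95th percentile of the standard normal
    return (x - z * sigma, x + z * sigma)

# Frequentist check: how often does the interval cover the true mu?
mu_true, trials, hits = 3.7, 20000, 0
for _ in range(trials):
    x = random.gauss(mu_true, 1.0)
    lo, hi = credible_interval(x)
    hits += lo <= mu_true <= hi
print(abs(hits / trials - 0.90) < 0.01)  # True: coverage ~ 0.90, i.e. calibrated
```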

SLIDE 29

Important:

1. $\pi_L(\mu) = 1$ and $\pi_S(\sigma) = \pi_{LS}(\mu,\sigma) = \sigma^{-1}$ ensure calibrated inferences;
2. Exact calibration $\Rightarrow$ a “Dutch Book” is impossible;
3. Consistency Theorem and fiducial argument combined $\Rightarrow$ $\theta$ necessarily reducible to a location parameter (Lindley, 1958).

SLIDE 30

Therefore:

The Principle of Consistency and the Operational Principle are equivalent (identical consistency factors, applicable under identical circumstances) $\Rightarrow$ complete reconciliation between the Bayesian and the frequentist schools of parametric inference!!!

SLIDE 31

Probabilistic parametric inference is not universal (e.g., pre-constrained parameters, counting experiments). Remedy (under fairly general conditions): “Repetitio est mater studiorum.” (Latin proverb)

Example: inferring a pre-constrained $\tau$ of an exponential distribution.

[Figure: sample $\mathbf{t} = (t_1, \dots, t_n)$, $n = 10$, $\tau = 5\,\mathrm{ps}$.]

SLIDE 32

Example: inferring the parameter $\theta$ of a binomial distribution,

$p(n|\theta \bar{n} I) = \binom{\bar{n}}{n}\,\theta^{n}\,(1-\theta)^{\bar{n}-n}\,;\qquad n \le \bar{n}\,,\quad \theta \in (0,1)\,.$

[Figure: the cumulative distributions $F(n,\theta,\bar{n},I)$ compared with their Gaussian counterparts $F(n,\mu,\sigma,\tilde{I})$, for $\bar{n} = 3,\ \theta = 0.1$ and $\bar{n} = 10,\ \theta = 0.5$.]
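The binomial-vs-Gaussian comparison can be sketched numerically in stdlib Python (names are mine; the Gaussian here is the standard normal approximation with μ = Nθ and σ² = Nθ(1−θ), an assumption of this sketch rather than the lecture's exact construction):

```python
import math

def binom_cdf(k, N, theta):
    """F(k): sum_{n <= k} C(N,n) theta^n (1-theta)^(N-n)."""
    return sum(math.comb(N, n) * theta ** n * (1 - theta) ** (N - n)
               for n in range(k + 1))

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Normal approximation: mu = N*theta, sigma^2 = N*theta*(1-theta).
N, theta = 100, 0.5
mu, sigma = N * theta, math.sqrt(N * theta * (1 - theta))
# Continuity-corrected comparison at k = 55:
print(abs(binom_cdf(55, N, theta) - gauss_cdf(55.5, mu, sigma)) < 0.005)  # True
```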

SLIDE 33

Conclusions:

1. Consistency Theorem (instead of Bayes’ Theorem) for assigning $f(\theta|x_1 I_0)$;
2. Equivalence of the Consistency Principle and the Operational Principle for the determination of $\pi(\theta)$;
3. Equivalence of the Bayesian and the frequentist schools of parametric inference.

SLIDE 34

Applications:

1. Simple parametric inference;
2. Inference about the parameters of linear models (e.g., histogram fitting and partial wave analyses) (Stuart, Ord and Arnold, 1999);
3. Inference about the parameters of dynamical models: $\theta = \theta(t)$ (e.g., Kalman filter (Brown and Hwang, 1983));
4. Predictive distributions ($\mathbf{x} = (x_1, x_2, \dots, x_n)$ from $f(x|\theta I_0) \to f(x_{n+1}|\mathbf{x} I_0)$).
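Item 4 can be sketched concretely. A stdlib-Python example (my own illustrative assumptions: Gaussian sampling with known σ = 1 and π(μ) = 1, so the posterior for μ is Gaussian around x̄ with width 1/√n): the predictive density is the posterior-weighted average of the sampling density, and the grid integral reproduces the known closed form, a Gaussian around x̄ with variance 1 + 1/n:

```python
import math

xs = [0.4, 1.1, 0.7]
n, xbar = len(xs), sum(xs) / len(xs)

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predictive(x_new, d=0.01):
    """f(x_new | x I) = integral of f(x_new|mu I) f(mu|x I) d mu, on a crude grid."""
    return sum(gauss(x_new, mu, 1.0) * gauss(mu, xbar, 1.0 / math.sqrt(n)) * d
               for mu in [xbar - 6.0 + i * d for i in range(1200)])

# Closed form: Gaussian around xbar with variance 1 + 1/n.
closed = gauss(1.0, xbar, math.sqrt(1.0 + 1.0 / n))
print(abs(predictive(1.0) - closed) < 1e-6)  # True
```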

SLIDE 35

Warning:

Several “Principles” exist for the determination of $f(\theta|I_0)$: the Laplace Principle of Insufficient Reason (Bayes, 1763; Laplace, 1886, p. XVII), the Principle of Maximum Entropy (Jaynes, 2003, pp. 343-377), Reference Priors (Bernardo, 1979), the Principle of Group (Form) Invariance (Harney, 2003), the Principle of Reduction (Dawid, 1977):

a) the resulting $f(\theta|I_0)$ is not unique;
b) the “Principles” are inconsistent with the Axioms of inverse probability;
c) non-calibrated inferences.

SLIDE 36

Which kind of approach has been advocated here, frequentist or Bayesian? It depends…

SLIDE 37

If:

1. Frequentist ≡ axioms of conditional probability only applicable to sampling distributions;
2. Bayesian ≡ (non-informative) prior probability distributions indispensable in the process of inference;

…then neither of the two.

SLIDE 38

If:

1. Frequentist ≡ observing the Operational Principle;
2. (Objective) Bayesian ≡ observing the Principle of Consistency;

…then both.

SLIDE 39

Bibliography:

T.P. and Živko, T. (2006). Towards Reconciliation between Bayesian and Frequentist Reasoning. In Lyons, L. and Ünel, M. K. (eds.). Statistical Problems in Particle Physics, Astrophysics and Cosmology (Proceedings of PHYSTAT05). London: Imperial College Press. Erratum: Inference about the parameters of the Weibull distribution can be reduced to a location-scale problem.

T.P. and Živko, T. On Probabilistic Inference about the Parameters of Sampling Distributions.

SLIDE 40

References:

Bayes, Rev. T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philos. Trans. R. Soc. Lond., 53: 370-418.
Bernardo, J. M. (1979). Reference Posterior Distributions for Bayesian Inference. J. R. Statist. Soc., B 41: 113-147.
Brown, R. G. and Hwang, P. Y. C. (1983). Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, Inc.
Cox, R. T. (1946). Probability, Frequency and Reasonable Expectation. Amer. J. Phys., 14: 1-13.
Dawid, A. P. (1977). Conformity of inference patterns. In Barra, J. R., van Cutsen, B., Brodeau, F. and Romier, G. (eds.). Developments in Statistics. Amsterdam: North-Holland.
Fisher, R. A. (1922). On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. Lond., A 222: 309-368.

SLIDE 41

References (cont’d):

Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Edinburgh: Oliver & Boyd.
Harney, H. L. (2003). Bayesian Inference. Springer.
Hartigan, J. A. (1964). Invariant Prior Distributions. Ann. Math. Statist., 35: 836-845.
Howson, C. and Urbach, P. (1991). Bayesian Reasoning in Science. Nature, 350: 371-374.
Jaynes, E. T. (2003). Probability Theory – The Logic of Science. Cambridge University Press.
Jeffreys, H. (1961). Theory of Probability. Oxford: Clarendon Press.
Laplace, P. S. (1886). Œuvres Complètes – Tome Septième: Théorie Analytique des Probabilités. Paris: Gauthier-Villars.
Lindley, D. V. (1958). Fiducial Distribution and Bayes’ Theorem. J. R. Statist. Soc., B 20: 102-107.

SLIDE 42

References (cont’d):

O’Hagan, A. (1994). Kendall’s Advanced Theory of Statistics, Vol. 2B – Bayesian Inference. London: Arnold.
Pólya, G. (1954). Mathematics and Plausible Reasoning, Vol. 2 – Patterns of Plausible Inference. Princeton: Princeton University Press.
Popper, K. R. (1959). The Logic of Scientific Discovery. London: Hutchinson & Co. Publishers.
Stuart, A., Ord, K. and Arnold, S. (1999). Kendall’s Advanced Theory of Statistics, Vol. 2A – Classical Inference and the Linear Model. London: Arnold.
Villegas, C. (1980). Inner Statistical Inference II. Ann. Statist., 9: 768-776.