SLIDE 1

Bayesian model comparison with applications

Johannes Bergström

Department of Theoretical Physics, KTH Royal Institute of Technology

July 16, 2013

SLIDE 2

Outline

1. Foundations
2. Bayesian inference
3. Examples and applications

SLIDE 3

Physics – how to do it?

- Experiment and observe – compare with the predictions of models
- No perfect experiments – always noise/uncertainties, limited resources/sensitivity/range
- Logically deducing the true model doesn't work
- All we can say is whether a model is a plausible description of the data or not
- But how to determine this?

SLIDE 4

Important information

If you really don't like statistics... you can stop listening now.

SLIDE 5

Principle of Bayesian inference

Bayesian inference in a nutshell: assess hypotheses/models by calculating their plausibilities, conditioned on some known and/or presumed information.

Cox's theorem (1946): the unique calculus of plausibility is probability theory (given some requirements, incl. comparability and consistency).

The unique extension of deductive logic incorporating uncertainty: truth → 1, falsehood → 0.

SLIDE 6

Probability interpretations: what is distributed in Pr(X)?

Bayesian probability:
- Describes uncertainty
- Defined as plausibility
- Probability is distributed over different propositions X
- X itself is not distributed nor random

Frequentist probability:
- Describes "randomness"
- Defined as the long-run relative frequency of an event
- X is distributed – a random variable

SLIDE 7

1. Foundations
2. Bayesian inference
3. Examples and applications

SLIDE 8

Bayesian inference – updating probabilities

Updating probabilities: models H_1, ..., H_r and data D. Bayes' theorem:

    Pr(H_i|D) = Pr(D|H_i) Pr(H_i) / Pr(D)

- Pr(H_i) – prior probability
- Pr(H_i|D) – posterior probability
- Pr(D|H_i) = L(H_i) – likelihood of H_i

Taking ratios,

    Pr(H_i|D) / Pr(H_j|D) = [L(H_i) / L(H_j)] · [Pr(H_i) / Pr(H_j)]

i.e., posterior odds = Bayes factor · prior odds. Usually the prior odds are 1, so calculate either the Bayes factor or the posterior odds. Assuming in addition that precisely one of the H_i is correct gives finite Pr(H_i|D).
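As a minimal sketch (not code from the talk), the odds update can be written directly: posterior odds = Bayes factor × prior odds, then converted to a posterior probability. The numbers below use the Bayes factor of 3 and even prior odds from the θ13 example later in the talk.

```python
import math

def posterior_prob(ln_bayes_factor, prior_odds=1.0):
    """Posterior probability of H1 over H0 from the natural-log Bayes factor:
    posterior odds = Bayes factor * prior odds; probability = odds / (1 + odds)."""
    posterior_odds = math.exp(ln_bayes_factor) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Bayes factor of 3 with even prior odds -> Pr(H1|D) = 0.75, Pr(H0|D) = 0.25
p1 = posterior_prob(math.log(3.0))
```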

SLIDE 9

Model likelihood or evidence

Models usually have free parameters Θ. The likelihood of a model – the evidence – is

    L(H) = Pr(D|H) = ∫ Pr(D|Θ, H) Pr(Θ|H) d^N Θ = ∫ L(Θ) π(Θ) d^N Θ

- Model likelihood = average likelihood of the model's parameters
- π(Θ) – prior distribution – the plausibility of the parameters assuming the model is correct
- The evidence balances quality of fit against model complexity – it can favour the simpler model
- All probabilities are conditioned on the relevant background information (models, experimental setup, ...)
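The evidence as an average likelihood can be illustrated with a toy Monte Carlo estimate (an assumed one-parameter Gaussian example, not from the talk): draw Θ from the prior and average the likelihood.

```python
import math
import random

def evidence_mc(log_likelihood, prior_sample, n=100_000):
    """Monte Carlo estimate of L(H) = ∫ L(Θ) π(Θ) dΘ as the average
    likelihood over draws Θ ~ π(Θ)."""
    return sum(math.exp(log_likelihood(prior_sample())) for _ in range(n)) / n

# Toy: Gaussian likelihood for one datum x = 0 with sigma = 1,
# and a uniform prior mu ~ U(-5, 5); the exact evidence is close to 1/10
loglike = lambda mu: -0.5 * mu**2 - 0.5 * math.log(2 * math.pi)
random.seed(0)
Z = evidence_mc(loglike, lambda: random.uniform(-5, 5))
```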

SLIDE 10

Occam’s razor

- Evidence = probability with which the model predicted the observed data
- Occam's razor – "simple" ≡ predictive
- Complex models are compatible with a large variety of data – they predict less

[Figure: Pr(D|H) over the space of possible observations D, for a simpler and a more complex model]
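The Occam effect can be made concrete with an assumed toy (not from the talk): the same Gaussian likelihood, but the "complex" model spreads its uniform prior over a far wider range, diluting its evidence.

```python
from statistics import NormalDist

def evidence_uniform_prior(x_obs, half_width):
    """Evidence of a model with likelihood N(x_obs | mu, 1) and a uniform
    prior mu ~ U(-half_width, half_width), computed analytically."""
    nd = NormalDist(mu=x_obs, sigma=1.0)
    mass = nd.cdf(half_width) - nd.cdf(-half_width)  # ∫ L(mu) dmu over the range
    return mass / (2.0 * half_width)                 # times the prior density

Z_simple = evidence_uniform_prior(0.0, 1.0)    # narrow, predictive model
Z_complex = evidence_uniform_prior(0.0, 50.0)  # wide, unpredictive model
# The simpler model is favoured when the data fall inside its prior range
```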

SLIDE 11

Jeffreys scale

The scale of interpretation is easily calibrated: the Jeffreys scale.

    |log(odds)|    Odds       Pr(H1|D)    Interpretation
    < 1.0          < 3:1      < 0.75      Inconclusive
    1.0            ≃ 3:1      ≃ 0.75      Weak evidence
    2.5            ≃ 12:1     ≃ 0.92      Moderate evidence
    5.0            ≃ 150:1    ≃ 0.993     Strong evidence
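The table translates into a small helper (a sketch using the thresholds on this slide, with natural logarithms assumed):

```python
import math

def jeffreys(ln_odds):
    """Interpretation of |ln(odds)| on the Jeffreys scale of this slide."""
    x = abs(ln_odds)
    if x < 1.0:
        return "inconclusive"
    if x < 2.5:
        return "weak evidence"
    if x < 5.0:
        return "moderate evidence"
    return "strong evidence"

# Odds of 150:1 -> |ln(odds)| ≈ 5.0 -> strong evidence
verdict = jeffreys(math.log(150))
```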

SLIDE 12

Priors

- Priors must be specified on all model parameters – and are not invariant under general reparametrizations
- An important part of any Bayesian analysis – consider them carefully
- A uniform prior in whatever variable you happen to be writing your equations in (signal rate, cross section) is often a bad choice
- An improper prior is always a bad choice
- Evaluate the sensitivity of the results to the choice of prior

SLIDE 13

Parameter inference

Assuming model H is correct, infer its parameters from the posterior distribution:

    Pr(Θ|D, H) = Pr(D|Θ, H) Pr(Θ|H) / Pr(D|H) = L(Θ) π(Θ) / L(H)

- Posteriors of subsets of the parameters are obtained by integrating over the other parameters
- The posterior is, by definition, not enough to test or compare models, or to claim discoveries

Comparing models using the posterior: compare the nested model with η = η_0 against the extended model using

    L(η = η_0) / L(η ≠ η_0) = Pr(η_0|D, H) / π(η_0|H) = posterior at η_0 / prior at η_0

(the Savage-Dickey density ratio).
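A hypothetical numerical sketch of the Savage-Dickey ratio (the Gaussian prior and posterior below are assumed purely for illustration): evaluate the posterior and prior densities at the nested value η_0.

```python
from statistics import NormalDist

def savage_dickey(posterior_pdf, prior_pdf, eta0):
    """Savage-Dickey density ratio: the Bayes factor of the nested model
    (eta = eta0) vs. the extended model."""
    return posterior_pdf(eta0) / prior_pdf(eta0)

# Assumed toy: prior eta ~ N(0, 2); the data pull the posterior to N(1, 0.5)
prior = NormalDist(mu=0.0, sigma=2.0)
posterior = NormalDist(mu=1.0, sigma=0.5)
B = savage_dickey(posterior.pdf, prior.pdf, 0.0)
# B < 1 here: the data disfavour the nested value eta = 0
```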

SLIDE 14

Frequentist model evaluation: P-values

- P-value ≡ the probability of obtaining data equal to or more extreme than those observed, assuming H0
- "Extreme" ≡ a large value of a test statistic (χ², profile likelihood, ...)
- Converted into a "number of σ's" using the Gaussian CDF: S = Φ⁻¹(1 − p)

P-values are not:
- the probability that H0 is correct
- the probability that the data are "just a fluctuation"
- the probability of incorrectly rejecting H0 – that is the type-1 error rate α (0.05, 0.01, ...)

Their interpretation needs a uniform scale – not really possible.

See also D'Agostini, 1112.3620.
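The conversion S = Φ⁻¹(1 − p) is one line with the standard library; the check uses the p ≃ 1.5 · 10⁻³ from the θ13 example, which corresponds to about 3σ.

```python
from statistics import NormalDist

def p_to_sigma(p):
    """Convert a one-sided p-value to a 'number of sigmas':
    S = Phi^{-1}(1 - p), with Phi the standard normal CDF."""
    return NormalDist().inv_cdf(1.0 - p)

S = p_to_sigma(1.5e-3)  # roughly 3 sigma
```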

SLIDE 15

Model comparison in particle physics

In particle physics:
- Use model comparison to compare ("test") different models
- Testing the existence of "new physics"
- Discovery is primary – the precise parameter values describing the new physics are often secondary

Possible applications:
- θ13 = 0 vs. θ13 > 0
- CP violation vs. CP conservation
- Normal vs. inverted ordering
- Maximal vs. non-maximal θ23
- Evidence of the effects of neutrino mass: 0νββ, β-decay, cosmology
- Theoretical models of lepton mass, flavour, DM, ...

SLIDE 16

1. Foundations
2. Bayesian inference
3. Examples and applications

SLIDE 17

Leptonic mixing angle θ13 – flashback to fall 2011

Question: is θ13 = 0 or not?

Profile likelihood ratio (Schwetz, Tórtola, Valle, 1108.1376):

    L(θ13^max) / L(θ13 = 0) ≃ 150  (Δχ² ≃ 10)  ⇒  p ≃ 1.5 · 10⁻³

Model comparison (Bergström, 1205.4404):
- Compare the model θ13 > 0 (θ13 ∈ [0, π/2]) with the model θ13 = 0
- Compact parameter space ⇒ robust results
- Approximating L(θ13) ∝~ L_profile(θ13) gives L(θ13 > 0) / L(θ13 = 0) ≃ 3
- Barely weak preference for θ13 > 0
- Assigning a 0.5 prior ⇒ Pr(θ13 = 0|D) ≃ 0.25

SLIDE 18

Leptonic mixing angle θ23 – today

Question: θ23 is large, but is it maximal (π/4) or not?

Profile likelihood, for normal ordering (NuFit v1.1: www.nu-fit.org; 1209.3023, Gonzalez-Garcia, Maltoni, Salvado, Schwetz):

    L(θ23^max) / L(θ23 = π/4) ≃ 2.5  (Δχ² ≃ 1.8)  ⇒  p ≃ 0.18

[Figure: profile likelihood as a function of sin²θ23 over roughly 0.3–0.7]

SLIDE 19

Leptonic mixing angle θ23 – today

Model comparison: use L(s²23) ∝~ L_profile(s²23) and π(s²23) = 1. Comparing the model likelihoods,

    L(θ23 ≠ π/4) / L(θ23 = π/4) ≃ 0.3

- Maximal mixing is (weakly) preferred by the data: the model with maximal θ23 is slightly better than the non-maximal model
- Assigning a 0.5 prior ⇒ Pr(θ23 = π/4|D) ≃ 0.75

Octant comparison:

    L(θ23 < π/4) / L(θ23 > π/4) ≃ 2

Future prospects: strong evidence for maximal mixing would require an uncertainty on s²23 of roughly 0.002 (0.02 for moderate evidence).

SLIDE 20

Neutrino parameters and cosmology

Cosmological data are sensitive to Neff.

[Figure: Planck collaboration, 1303.5076 – marginalized posteriors P/Pmax of Neff (roughly 2.4–4.2) for Planck+WP+highL combined with BAO, H0, and BAO+H0]

How much evidence is there against Neff = 3.046?

SLIDE 21

Neutrino parameters and cosmology

Cosmological data are sensitive to Neff (same Planck figure as the previous slide).

How much evidence is there against Neff = 3.046? Answer: cannot say – information is missing. The posterior was obtained within the model where Neff is free; the model comparison is

    L(Neff = 3.046) / L(Neff ≠ 3.046) = posterior at 3.046 / prior at 3.046

so the prior density at 3.046 is also needed.

SLIDE 22

Results, Neff < 10

Verde, Feeney, Mortlock, Peiris, 1307.2904

Taking Neff < 10:
- With H0 – no evidence of additional Neff
- Without H0 – weak evidence against additional Neff
- No evidence of additional Neff pre-Planck either

SLIDE 23

Signal discovery in spectra

Bergström, 1212.4484; Caldwell, Kröninger, physics/0608249

Question: is there a signal?

[Figure: counts/keV vs. E0/keV over 2000–2060, showing the data, the original fit, the maximum-likelihood fit, and the posterior medians for uniform and logarithmic signal priors]

SLIDE 24

Estimate signal strength

[Figure: marginalized posteriors of the signal strength s, background b, line position E0/keV, and width lg(σ/keV)]

SLIDE 25

Signal discovery

- Compare the evidences of the s + b model and the b-only model
- No need for distributions of a test statistic
- Do need a prior on the signal rate
- Automatic compensation for the look-elsewhere effect, ∝ the signal/spectrum widths
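A minimal sketch of such an evidence comparison (an assumed one-bin Poisson toy, far simpler than the spectral analysis in the talk): the s + b evidence integrates the likelihood over a uniform prior on the signal rate.

```python
import math

def poisson_loglike(n, mu):
    """log Pr(n | mu) for a Poisson count."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def evidence_s_plus_b(n, b, s_max, steps=10_000):
    """Evidence of the s+b model with a uniform prior s ~ U(0, s_max),
    by trapezoidal integration of L(s) times the prior density 1/s_max."""
    ds = s_max / steps
    total = 0.0
    for i in range(steps + 1):
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(poisson_loglike(n, b + i * ds))
    return total * ds / s_max

n_obs, b = 12, 5.0
Z_b = math.exp(poisson_loglike(n_obs, b))        # background-only evidence
Z_sb = evidence_s_plus_b(n_obs, b, s_max=20.0)   # signal+background evidence
bayes_factor = Z_sb / Z_b                        # > 1 favours a signal here
```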

SLIDE 26

Summary, conclusions

- Bayesian inference rocks!
- Consider your priors carefully
- Don't just estimate the parameters of a fixed model – compare models too

SLIDE 27

Thanks for listening!

http://www.xkcd.com/1132/

SLIDE 28

Extra slides

SLIDE 29

Analysing Beyond the Standard Model models

BSM models:
- Many BSM models have large – unconstrained – parameter spaces
- Theorists' favourite method – random scans: generate many points in parameter space, accept those which pass "cuts" (e.g., at 2σ), and draw conclusions from the distribution of points and/or the fraction of accepted points

Warning:
- No statistical/probabilistic measure is attached to the density of points
- No statistical/probabilistic interpretation of the results is possible
- But sometimes a rough approximation of a Bayesian analysis (reinvented?)
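The warning can be seen in an assumed toy scan (not from the talk): the accepted fraction changes with the arbitrary scan range, so it carries no probabilistic meaning on its own.

```python
import random

def random_scan(chi2, lo, hi, cut=4.0, n=50_000):
    """Toy random scan: sample a 1-D parameter uniformly on [lo, hi] and
    return the fraction of points passing a delta-chi^2 cut (~2 sigma for 1 dof)."""
    accepted = [x for x in (random.uniform(lo, hi) for _ in range(n))
                if chi2(x) < cut]
    return len(accepted) / n

random.seed(1)
frac_narrow = random_scan(lambda x: x**2, -5.0, 5.0)    # accepts |x| < 2
frac_wide = random_scan(lambda x: x**2, -50.0, 50.0)    # same model, wider scan
# The accepted fraction depends on the scan range, not on any posterior mass
```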

SLIDE 30

U(1) flavour models – lepton sector

Work with L. Merlo and D. Meloni

The models:
- Charged lepton masses (as for quarks) are hierarchical
- Mixing seems less so – but is hierarchy or anarchy preferred?
- A U(1) symmetry ⇒ lepton masses and mixing obtained "naturally" by suppressing the charged lepton and neutrino mass matrix elements by ε^{n_i}

Parameters:
- ε < 1 – flavon VEV / cutoff scale
- n_i – 4 integer charges of the lepton doublets/singlets
- 30 additional "order one" parameters and phases in the Yukawa/mass matrices

Data: me/mμ, me/mτ, the leptonic mixing parameters, Δm²21/Δm²31

SLIDE 31

Analysing U(1) models

χ²-analysis:
- Δχ²(ε, charges) = 0 – all charges and ε can fit the data equally well
- Theorists' response: So what?!? Most of these parameter values are unnatural – they require large cancellations – and are hence implausible

Bayesian analysis:
- This is consistently incorporated through priors on the O(1) parameters
- Fixing the charges ⇒ nice Gaussian posteriors for ε
- Compare charge assignments using model comparison, or fit the charges as free parameters simultaneously
- Comparing "anarchy" in the neutrino sector (doublet charges = 0) with "hierarchy" probabilistically ⇒ some preference for hierarchy

SLIDE 32

Neutrinoless double beta decay

Bergström, 1212.4484

Neutrinoless double beta decay:
- Majorana neutrinos can mediate 0νββ
- Signal strength s ∝ |nuclear matrix element|² |mee|², where mee = Σ_i m_i U²_ei

Fitting the data:
- Requires a prior on mee – not a uniform one
- NME calculations are uncertain – and unconstrained by data
- NME uncertainties cannot be included in the likelihood – but can be in the prior
- The compatibility of the parameter constraints of ≥ 2 data sets is a model comparison question – compare "data compatible" with "data incompatible"

SLIDE 33

Prior on mee – posterior using oscillation + β-decay

[Figure: posteriors of lg(mee/eV) over roughly −3.5 to 0.5, for normal and inverted ordering with two priors (m0 ~ A, m0 ~ B) on the lightest mass]
