Testing for Tensions Between Datasets
David Parkinson, University of Queensland
In collaboration with Shahab Joudaki (Oxford)

Outline
• Introduction
• Statistical Inference
• Methods
• Linear models
• Example using WL and CMB data
• Conclusions

What is Probability?
• In 1812 Laplace published Analytic Theory of Probabilities
• He suggested the computation of "the probability of causes and future events, derived from past events"
• "Every event being determined by the general laws of the universe, there is only probability relative to us."
• "Probability is relative, in part to [our] ignorance, in part to our knowledge."
• So to Laplace, probability theory is applied to our level of knowledge
[Image: Pierre-Simon Laplace]

Comparing datasets
• As there is only one Universe (setting aside the Multiverse), we make observations of unrepeatable 'experiments'
• Therefore we have to proceed by inference
• Furthermore we cannot check or probe for biases by repeating the experiment - we cannot 'restart the Universe' (however much we may want to)
• If there is a tension (i.e. if two data sets don't agree), we can't take the data again. We need instead to make inferences with the data we have
[Figure: constraints in the σ8 vs Ωm plane from KiDS-450, CFHTLenS (MID J16), WMAP9+ACT+SPT and Planck15]
[Figure: fσ8(z) vs z for DR12 final consensus, GAMA, 6dFGS, WiggleZ, SDSS MGS and Vipers, compared with the Planck ΛCDM prediction (assuming Planck ΛCDM cosmology); Alam et al 2016]

Rules of Probability
• We define Probability to have numerical value
• We define the lower bound, of logical absurdities, to be zero, $P(\emptyset) = 0$
• We normalize it so the sum of the probabilities over all options is unity, $\sum_i P(A_i) \equiv 1$
• Sum Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• Product Rule: $P(A \cap B) = P(A)\,P(B|A) = P(B)\,P(A|B)$
[Figure: Venn diagram of two overlapping sets A and B]

Bayes Theorem
• Bayes theorem is easily derived from the product rule
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
• We have some model M, with some unknown parameters θ, and want to test it with some data D
$P(\theta|D,M) = \frac{P(D|\theta,M)\,P(\theta|M)}{P(D|M)}$
(posterior = likelihood × prior / evidence)
• Here we apply probability to models and parameters, as well as data
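The derivation mentioned on this slide takes only two steps. The lines below are a sketch of it, using only the product rule from the previous slide and the substitution A → θ, B → D (all conditioned on the model M):

```latex
\begin{align}
P(A \cap B) &= P(A)\,P(B|A) = P(B)\,P(A|B) \\
\Rightarrow\quad P(A|B) &= \frac{P(B|A)\,P(A)}{P(B)}, \qquad P(B) \neq 0 \\
A \to \theta,\; B \to D:\quad
P(\theta|D,M) &= \frac{P(D|\theta,M)\,P(\theta|M)}{P(D|M)}
\end{align}
```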

Model Selection
• If we marginalize over the parameter uncertainties, we are left with the marginal likelihood, or evidence
$E = P(D|M) = \int P(D|\theta,M)\,P(\theta|M)\,d\theta$
(evidence = integral of likelihood × prior)
• If we compare the evidences of two different models, we find the Bayes factor
$\frac{P(M_1|D)}{P(M_2|D)} = \frac{P(D|M_1)\,P(M_1)}{P(D|M_2)\,P(M_2)}$
(model posterior ratio = evidence ratio × model prior ratio)
• Bayes theorem provides a consistent framework for choosing between different models

Occam's Razor
$E = \int d\theta\, P(D|\theta,M)\,P(\theta|M) \approx P(D|\hat\theta,M) \times \frac{\delta\theta}{\Delta\theta}$
(best fit likelihood × Occam factor)
• Occam factor rewards the model with the least amount of wasted parameter space ("most predictive")
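To make the Occam factor concrete, here is a minimal numerical sketch. Nothing in it comes from the talk: the toy data, the flat prior of width Δθ = 10, the grid, and the identification δθ = √(2π) σ_post are all assumptions chosen so that the approximation can be checked against a direct integration.

```python
import numpy as np

# Toy data (assumed): 50 draws with unit Gaussian noise around a mean of 0.3.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.3, scale=1.0, size=50)
n = data.size

def likelihood(mu):
    """Gaussian likelihood P(D | mu, M) for unit-variance noise; mu may be an array."""
    mu = np.atleast_1d(mu)
    chi2 = np.sum((data[:, None] - mu[None, :]) ** 2, axis=0)
    return np.exp(-0.5 * chi2) / (2.0 * np.pi) ** (n / 2.0)

# Flat prior of total width Delta_theta = 10, i.e. P(mu | M) = 1/10 on [-5, 5].
prior_width = 10.0
mu_grid = np.linspace(-5.0, 5.0, 20001)

# Evidence E = integral of likelihood x prior (trapezoidal rule on the grid).
integrand = likelihood(mu_grid) / prior_width
evidence = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(mu_grid))

# Occam approximation: best-fit likelihood times delta_theta / Delta_theta,
# with delta_theta = sqrt(2 pi) * sigma_post and sigma_post = 1 / sqrt(n).
best_fit_likelihood = likelihood(data.mean())[0]
occam_factor = np.sqrt(2.0 * np.pi / n) / prior_width
evidence_occam = best_fit_likelihood * occam_factor

print(f"ln E (numerical) = {np.log(evidence):.4f}")
print(f"ln E (Occam)     = {np.log(evidence_occam):.4f}")
print(f"Occam factor     = {occam_factor:.4f}")
```

For this Gaussian toy case the two values of ln E agree closely, because the likelihood is exactly Gaussian in the parameter and the prior is much wider than the posterior. Widening the prior lowers the Occam factor, and hence the evidence, without changing the best-fit likelihood.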

Bayesian Model Comparison
• Jeffreys' (1961) scale:

Difference          Jeffreys (1961)   Trotta (2006)   Odds
Δln(E) < 1          No evidence       No evidence     3:1
1 < Δln(E) < 2.5    substantial       weak            12:1
2.5 < Δln(E) < 5    strong            moderate        150:1
Δln(E) > 5          decisive          strong          >150:1

• If model priors are equal, the posterior odds ratio and the Bayes factor (evidence ratio) are the same
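The scale can be written as a small lookup, sketched below. The function names and the treatment of the boundary values (a value exactly on a threshold goes to the stronger category) are assumptions made for illustration; the odds are exp(Δ ln E), which holds when the model priors are equal.

```python
import math

def evidence_strength(delta_ln_e):
    """Return (Jeffreys 1961 label, Trotta 2006 label) for |Delta ln E|."""
    x = abs(delta_ln_e)
    if x < 1.0:
        return ("no evidence", "no evidence")
    if x < 2.5:
        return ("substantial", "weak")
    if x < 5.0:
        return ("strong", "moderate")
    return ("decisive", "strong")

def odds(delta_ln_e):
    """Odds in favour of the preferred model, assuming equal model priors."""
    return math.exp(abs(delta_ln_e))

for value in (0.5, 1.0, 2.5, 5.0):
    jeffreys, trotta = evidence_strength(value)
    print(f"Delta ln E = {value:3.1f}: odds ~ {odds(value):6.1f}:1, "
          f"Jeffreys: {jeffreys}, Trotta: {trotta}")
```

The thresholds 1, 2.5 and 5 correspond to odds of roughly 3:1, 12:1 and 150:1, which is where the numbers in the odds column come from.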

Information Criteria
• Instead of using the Evidence (which is difficult to calculate accurately) we can approximate it using an Information Criterion statistic
• Ability to fit the data (chi-squared) penalised by (lack of) predictivity
• The smaller the value of the IC, the better the model
• Bayesian Information Criterion: $\mathrm{BIC} = \chi^2(\hat\theta) + k \ln N$
• k is the number of free parameters and N is the number of data points
• Deviance Information Criterion (Spiegelhalter et al. 2002): $\mathrm{DIC} = \chi^2(\hat\theta) + 2c$
• Here c is the complexity, which is equal to the number of well-measured parameters
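Both criteria are one-line formulas once the best-fit χ² is known. The sketch below simply transcribes them; the function names and the two sets of example numbers (models "A" and "B") are hypothetical and chosen only to show how the penalty can outweigh a better fit.

```python
import math

def bic(chi2_best, n_params, n_data):
    """Bayesian Information Criterion: chi^2 at best fit plus k ln N."""
    return chi2_best + n_params * math.log(n_data)

def dic(chi2_best, complexity):
    """Deviance Information Criterion: chi^2 at best fit plus 2 c."""
    return chi2_best + 2.0 * complexity

# Hypothetical example: both models are fit to N = 100 data points.
# Model A: 2 parameters, chi^2 = 105.  Model B: 5 parameters, chi^2 = 101.
bic_a = bic(105.0, 2, 100)
bic_b = bic(101.0, 5, 100)
print(f"BIC(A) = {bic_a:.2f}, BIC(B) = {bic_b:.2f}")

# If all parameters are well measured, c equals the parameter count.
print(f"DIC(A) = {dic(105.0, 2.0):.2f}, DIC(B) = {dic(101.0, 5.0):.2f}")
```

In this made-up case model B fits slightly better but is penalised more heavily, so model A has the smaller value of both criteria.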

Complexity
• The DIC penalises models based on the Bayesian complexity, the number of well-measured parameters
• This can be computed through the information gain (KL divergence) between the prior and posterior, minus a point estimate
$C_b = -2\left( D_{KL}\left[P(\theta|D,M),\, P(\theta|M)\right] - \widehat{D_{KL}} \right)$
• For the simple Gaussian likelihood, this is given by
$C_b = \overline{\chi^2(\theta)} - \chi^2(\bar\theta)$
• Average is over posterior
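The Gaussian-likelihood expression for C_b can be evaluated directly from posterior samples. The sketch below assumes a toy straight-line model with unit noise and a flat prior, and draws samples from the analytic Gaussian posterior as a stand-in for an MCMC chain; none of these choices come from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy straight-line data y = a + b x with known unit noise (assumed setup).
x = np.linspace(0.0, 1.0, 40)
design = np.column_stack([np.ones_like(x), x])   # columns: intercept, slope
y = design @ np.array([1.0, 2.0]) + rng.normal(size=x.size)

def chi2(theta):
    """Chi-squared for one parameter vector, or for each row of an array of them."""
    theta = np.atleast_2d(theta)
    resid = y[None, :] - theta @ design.T
    return np.sum(resid ** 2, axis=1)

# With a flat prior the posterior is Gaussian: mean = least-squares solution,
# covariance = inverse Fisher matrix (A^T A)^-1 for unit noise.
fisher = design.T @ design
cov = np.linalg.inv(fisher)
theta_hat = cov @ design.T @ y

# Stand-in for an MCMC chain: draw directly from the Gaussian posterior.
samples = rng.multivariate_normal(theta_hat, cov, size=100_000)

# C_b = (posterior average of chi^2) - chi^2(posterior mean of theta)
complexity = chi2(samples).mean() - chi2(samples.mean(axis=0))[0]
dic = chi2(theta_hat)[0] + 2.0 * complexity

print(f"Bayesian complexity C_b = {complexity:.3f} (2 free parameters)")
print(f"DIC = {dic:.2f}")
```

Because both parameters are well measured relative to the (flat) prior, the estimated complexity comes out close to 2, up to Monte Carlo noise.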

Tensions
• Tensions occur when two datasets have different preferred values (posterior distributions) for some common parameters
• This can arise due to
  • random chance
  • systematic errors
  • undiscovered physics
[Figure: constraints in the σ8 vs Ωm plane from KiDS-450, CFHTLenS (MID J16), WMAP9+ACT+SPT and Planck15]

Diagnostic statistics
• Need to diagnose not if the model is correct, but if the tension is significant
• Simple test: χ² per degree of freedom
  • Equivalent to p-value test on data
  • Only a point estimate though
• Raveri (2015): the evidence ratio
$C(D_1, D_2, M) = \frac{P(D_1 \cup D_2|M)}{P(D_1|M)\,P(D_2|M)}$
• Joudaki et al (2016): change in DIC
$\Delta\mathrm{DIC} = \mathrm{DIC}(D_1 \cup D_2) - \mathrm{DIC}(D_1) - \mathrm{DIC}(D_2)$

Linear evidence
$P(D|M) = \underbrace{L_0}_{1}\; \underbrace{\frac{|F|^{-1/2}}{|\Pi|^{-1/2}}}_{2}\; \underbrace{\exp\left[-\frac{1}{2}\left(\theta_L^T L\,\theta_L + \theta_\pi^T \Pi\,\theta_\pi - \bar\theta^T F\,\bar\theta\right)\right]}_{3}$
• Evidence in linear case dependent on
  1. likelihood normalisation
  2. Occam factor (compression of prior into posterior)
  3. Displacement between prior and posterior
• In linear case, final Fisher information matrix is sum of prior and likelihood ($F = L + \Pi$)
• If prior is wide, Π is small (so displacement minimised), but Occam factor larger
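The three terms can be evaluated with a few lines of linear algebra. The sketch below assumes a Gaussian likelihood peaked at θ_L with Fisher matrix L, a normalised Gaussian prior centred on θ_π with Fisher matrix Π, and a posterior mean θ̄ = F⁻¹(L θ_L + Π θ_π); the function name and the one-dimensional test numbers are illustrative only.

```python
import numpy as np

def linear_log_evidence(log_l0, like_fisher, theta_like, prior_fisher, theta_prior):
    """ln P(D|M) for a Gaussian likelihood and Gaussian prior (linear model)."""
    L = np.atleast_2d(like_fisher).astype(float)
    Pi = np.atleast_2d(prior_fisher).astype(float)
    t_l = np.atleast_1d(theta_like).astype(float)
    t_p = np.atleast_1d(theta_prior).astype(float)

    F = L + Pi                                        # posterior Fisher matrix
    t_bar = np.linalg.solve(F, L @ t_l + Pi @ t_p)    # posterior mean

    _, logdet_f = np.linalg.slogdet(F)
    _, logdet_pi = np.linalg.slogdet(Pi)
    occam = -0.5 * logdet_f + 0.5 * logdet_pi         # term 2
    displacement = -0.5 * (t_l @ L @ t_l + t_p @ Pi @ t_p - t_bar @ F @ t_bar)  # term 3
    return log_l0 + occam + displacement              # term 1 + term 2 + term 3

# One-dimensional check against direct numerical integration (assumed numbers).
L1d, theta_l, Pi1d, theta_p = 4.0, 1.0, 0.25, 0.0
grid = np.linspace(-20.0, 20.0, 400001)
like = np.exp(-0.5 * L1d * (grid - theta_l) ** 2)     # L_0 = 1
prior = np.sqrt(Pi1d / (2.0 * np.pi)) * np.exp(-0.5 * Pi1d * (grid - theta_p) ** 2)
integrand = like * prior
numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))

analytic = np.exp(linear_log_evidence(0.0, L1d, theta_l, Pi1d, theta_p))
print(f"analytic = {analytic:.6f}, numerical = {numeric:.6f}")
```

Making the prior wider (smaller Π) shrinks the displacement term towards zero but makes the Occam term more negative, which is the trade-off described in the last bullet.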

Simple linear model Image credit: Tamara Davis

Diagnostics II: The Surprise
• Seehars et al (2016): the 'Surprise' statistic, based on the relative entropy of two distributions
• Relative entropy given by KL divergence between original ($D_1$) and updated dataset ($D_2$)
$D_{KL}\left(P(\theta|D_2)\,||\,P(\theta|D_1)\right) = \int P(\theta|D_2)\, \log\left[\frac{P(\theta|D_2)}{P(\theta|D_1)}\right] d\theta$
• Surprise is difference of observed KL divergence relative to expected
  • where expected assumes consistency
$S \equiv D_{KL}\left(P(\theta|D_2)\,||\,P(\theta|D_1)\right) - \langle D \rangle$
• One data set is assumed to be 'ground-truth', and information gain is considered in light of updating, or additional, data
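When both posteriors are Gaussian, the KL divergence has a standard closed form. The sketch below implements that closed form and checks it against a direct one-dimensional integration; it covers only the observed D_KL, not the expected value ⟨D⟩, and the means and variances used are made-up numbers.

```python
import numpy as np

def gaussian_kl(mean2, cov2, mean1, cov1):
    """D_KL(P2 || P1) in nats for Gaussians P_i = N(mean_i, cov_i)."""
    m1 = np.atleast_1d(mean1).astype(float)
    m2 = np.atleast_1d(mean2).astype(float)
    c1 = np.atleast_2d(cov1).astype(float)
    c2 = np.atleast_2d(cov2).astype(float)
    dim = m1.size
    c1_inv = np.linalg.inv(c1)
    diff = m1 - m2
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    return 0.5 * (np.trace(c1_inv @ c2) + diff @ c1_inv @ diff - dim + logdet1 - logdet2)

# Assumed 1D example: original posterior P1 = N(0, 1), updated P2 = N(0.5, 0.25).
kl_21 = gaussian_kl(0.5, 0.25, 0.0, 1.0)
kl_12 = gaussian_kl(0.0, 1.0, 0.5, 0.25)

# Direct numerical integration of P2 * log(P2 / P1) as a cross-check.
theta = np.linspace(-10.0, 10.0, 200001)
log_p1 = -0.5 * theta ** 2 - 0.5 * np.log(2.0 * np.pi * 1.0)
log_p2 = -0.5 * (theta - 0.5) ** 2 / 0.25 - 0.5 * np.log(2.0 * np.pi * 0.25)
integrand = np.exp(log_p2) * (log_p2 - log_p1)
kl_numeric = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(theta))

print(f"D_KL(P2||P1) = {kl_21:.4f} (closed form), {kl_numeric:.4f} (numerical)")
print(f"D_KL(P1||P2) = {kl_12:.4f}")
```

The two directions give different values, which illustrates why a statistic built on the KL divergence depends on which data set is treated as the starting point.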

Linear tension
$\frac{P(D_{1+2}|M)}{P(D_1|M)\,P(D_2|M)} = \frac{L_0^{1+2}}{L_0^{1}\,L_0^{2}} \times \frac{|F_{1+2}|^{-1/2}}{|F_1|^{-1/2}\,|F_2|^{-1/2}} \times \text{displacement terms}$
• Displacement terms equivalent to 'Surprise' - relative entropy between two distributions
• Occam factor independent of tensions
• Tensions manifest in first and third terms - best fit likelihood and displacement

Linear DIC
• ΔDIC statistic has two components
• Difference in mean parameter (best fit) likelihood
$\Delta\chi^2 = \chi^2_{1+2} - \chi^2_{1} - \chi^2_{2}$
• Difference in penalty term (complexity)
$\Delta C_b = C_b^{1+2} - C_b^{1} - C_b^{2}$
• In linear case, final Fisher matrix is the sum of individual matrices, so complexity doesn't change
• Tension statistic (in linear case) driven entirely by difference in best likelihood

Linear Surprise
• Surprise is difference between information gain (going from data set $D_1$ to $D_2$) and expected information gain
• In the linear case, KL divergence can be written as
$D_{KL} = -\frac{1}{2}\left[\overline{\chi^2_{1+2}(\theta)} - \overline{\chi^2_{1}(\theta)}\right]$
• For the expectation of the information gain, need to average over possible outcomes for the combined data set
• But in the linear case, this corresponds to the maximum likelihood, where the information gain is evaluated at the posterior maximum
$\langle D \rangle = -\frac{1}{2}\left[\chi^2_{1+2}(\bar\theta) - \chi^2_{1}(\bar\theta)\right]$
• This is not the same as the complexity change, even though it looks similar, as the averaging process happens over the final posterior, not individual ones
$S = D_{KL} - \langle D \rangle = \frac{1}{2}\left[\chi^2_{1+2}(\bar\theta) - \chi^2_{1}(\bar\theta) - \left(\overline{\chi^2_{1+2}(\theta)} - \overline{\chi^2_{1}(\theta)}\right)\right]$
(overlines denote the average over the final posterior)

Pros and Cons

Approach                   Likelihood ratio   Evidence   DIC   Surprise
Average over parameters    No                 Yes        Yes   Yes
From MCMC chain            Yes                No         Yes   Yes
Probabilistic              Yes                Yes        Yes   No
Symmetric                  Yes                Yes        Yes   No

DIC
• Simple 5th order polynomial model, with second data set offset from the first
• Complexity of each individual data, and also combined data, is the same
  • Both measure the 5 free parameters well
• DIC only changes due to worsening of χ²
• The ΔDIC goes from negative (agreement) to positive (tension) as the offset increases
• Odds ratio of agreement
$I(D_1, D_2) \equiv \exp\{-\Delta\mathrm{DIC}(D_1, D_2)/2\}$
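A toy version of this experiment can be sketched as follows. It is not the calculation behind the slide: the polynomial is taken to have 5 coefficients (1, x, ..., x⁴), the noise level, true coefficients and offsets are made up, and the complexity is set equal to the number of parameters, which assumes a wide flat prior so that every parameter is well measured.

```python
import numpy as np

rng = np.random.default_rng(2)
n_points, n_params, sigma = 50, 5, 1.0

x = np.linspace(-1.0, 1.0, n_points)
design = np.vander(x, n_params, increasing=True)        # columns 1, x, ..., x^4
true_coeffs = np.array([1.0, -0.5, 2.0, 0.3, -1.0])     # hypothetical values

y1 = design @ true_coeffs + rng.normal(scale=sigma, size=n_points)
noise2 = rng.normal(scale=sigma, size=n_points)          # fixed noise for data set 2

def best_chi2(a, y):
    """Minimum chi-squared of a linear least-squares fit with known noise sigma."""
    theta, *_ = np.linalg.lstsq(a, y, rcond=None)
    return np.sum((y - a @ theta) ** 2) / sigma ** 2

def dic(a, y):
    """DIC = best-fit chi^2 + 2 * complexity, with complexity = number of
    parameters (assumption: wide flat prior, all parameters well measured)."""
    return best_chi2(a, y) + 2.0 * a.shape[1]

def delta_dic(offset):
    """Delta DIC between data set 1 and data set 2 shifted by a constant offset."""
    y2 = design @ true_coeffs + offset + noise2
    joint_design = np.vstack([design, design])
    joint_y = np.concatenate([y1, y2])
    return dic(joint_design, joint_y) - dic(design, y1) - dic(design, y2)

for offset in (0.0, 1.0, 2.0, 5.0):
    d = delta_dic(offset)
    print(f"offset = {offset:3.1f}  Delta DIC = {d:8.2f}  I = {np.exp(-d / 2.0):.3g}")
```

Because the complexity is the same for the individual and joint fits, the only term that responds to the offset is the joint best-fit χ², so ΔDIC grows (and the odds ratio I shrinks) as the second data set is pulled away from the first.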

KiDS vs Planck • All tensions considered here are in light of a particular model • If the model is changed, the tension may be alleviated • This is not the same as model selection

Application to lensing data
• In Joudaki et al (2016) they compared the cosmological constraints from Planck CMB data with KiDS-450 weak lensing data
• Including curvature worsened tension, but allowing for dynamical dark energy improved agreement

Model                          T(S8)    ΔDIC     Interpretation
ΛCDM, fiducial systematics     2.1σ     1.26     Small tension
ΛCDM, extended systematics     1.8σ     1.4      Small tension
ΛCDM, large scales             1.9σ     1.24     Small tension
Neutrino mass                  2.4σ     0.022    Marginal case
Curvature                      3.5σ     3.4      Large tension
Dark Energy (constant w)       0.89σ    -1.98    Agreement
Curvature + dark energy        2.1σ     -1.18    Agreement
