Testing for Tensions Between Datasets
David Parkinson University of Queensland
In collaboration with Shahab Joudaki (Oxford)
Outline
- Introduction
- Statistical Inference Methods
- Linear models
- Example using WL and CMB data
Theory of Probabilities
Laplace studied "the probability of causes and future events, derived from past events". Although events follow the general laws of the universe, there is uncertainty in our description of them: probability is relative, in part to this ignorance, in part to our knowledge. Probability is therefore applied to our level of knowledge.
Pierre-Simon Laplace
In cosmology (setting aside the Multiverse), we make observations of un-repeatable 'experiments', and must draw conclusions by statistical inference. We cannot probe for biases by repeating the experiment: we cannot 'restart the Universe' (however much we may want to). If something goes wrong (e.g. two data sets don't agree), we can't take the data again. We need instead to make inferences with the data we have.
[Figure: σ8 vs Ωm constraints from KiDS-450, CFHTLenS (MID J16), WMAP9+ACT+SPT and Planck15]
[Figure: growth rate fσ8(z) measurements from 6dFGS, SDSS MGS, GAMA, WiggleZ and VIPERS, with the DR12 final consensus, compared to the prediction assuming a Planck ΛCDM cosmology; Alam et al. 2016]
Probability assigns a numerical value to a degree of belief. We require logical absurdities to have probability zero, P(∅)=0, and the sum of probabilities over all options to be unity, ∑P(Ai)≡1.
Sum Rule: P(A∪B) = P(A) + P(B) − P(A∩B)
Product Rule: P(A∩B) = P(A)P(B|A) = P(B)P(A|B)
Bayes' theorem follows by rearranging the product rule.
Suppose we have a model M with parameters θ, and want to test it with some data D. Bayes' theorem applies to parameters as well as data:

P(A|B) = P(B|A)P(A) / P(B)

P(θ|D,M) = P(D|θ,M) P(θ|M) / P(D|M)

where P(θ|M) is the prior, P(D|θ,M) the likelihood, P(θ|D,M) the posterior, and P(D|M) the evidence.
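As a minimal sketch of this update (the data values, noise level and grid below are invented for illustration, not from the talk), the posterior for a single parameter can be evaluated on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.3, 0.1, size=20)       # D: 20 noisy measurements
mu_grid = np.linspace(-1.0, 1.0, 2001)     # θ: parameter grid
prior = np.ones_like(mu_grid) / 2.0        # flat prior P(θ|M) on [-1, 1]

# log-likelihood ln P(D|θ, M) for Gaussian noise with σ = 0.1
loglike = np.array([np.sum(-0.5 * ((data - m) / 0.1) ** 2) for m in mu_grid])

post = np.exp(loglike - loglike.max()) * prior
evidence = np.trapz(post, mu_grid)         # P(D|M), up to the subtracted constant
post /= evidence                           # normalised posterior P(θ|D, M)

print(mu_grid[np.argmax(post)])            # posterior peak, near the sample mean
```

With a flat prior the posterior peak sits at the maximum-likelihood value, here the sample mean of the data.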
Integrating the likelihood over the prior, we are left with the marginal likelihood, or evidence:

E = P(D|M) = ∫ P(D|θ,M) P(θ|M) dθ

Taking the ratio of posteriors for two models, we find the Bayes factor, used for choosing between different models:

P(M1|D) / P(M2|D) = [ P(D|M1) / P(D|M2) ] × [ P(M1) / P(M2) ]

i.e. the ratio of model posteriors is the evidence ratio times the ratio of model priors.
The evidence rewards the model with the least amount of wasted parameter space ("most predictive"):

E = ∫ dθ P(D|θ,M) P(θ|M) ≈ P(D|θ̂,M) × (δθ / Δθ)

the best-fit likelihood times the Occam factor, the ratio of the posterior width δθ to the prior width Δθ.
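A quick numerical check of this approximation in 1D (toy numbers, and δθ taken as √(2π) times the posterior standard deviation, which is an interpretive choice, not from the talk):

```python
import numpy as np

sigma, n = 0.1, 20
data = np.concatenate([np.full(10, 0.35), np.full(10, 0.25)])  # mean exactly 0.3

def like(mu):
    # un-normalised Gaussian likelihood of the data given mean mu
    return np.exp(np.sum(-0.5 * ((data - mu) / sigma) ** 2))

# Exact evidence: flat prior of width Δθ = 1 on [0, 1]
mu_grid = np.linspace(0.0, 1.0, 4001)
prior_width = 1.0
E_exact = np.trapz([like(m) / prior_width for m in mu_grid], mu_grid)

# Occam-factor approximation: L(θ̂) × δθ / Δθ
sigma_post = sigma / np.sqrt(n)              # posterior width
delta_theta = np.sqrt(2 * np.pi) * sigma_post
E_approx = like(0.3) * delta_theta / prior_width

print(E_exact, E_approx)                     # nearly equal
```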
If the model priors are equal, the posterior odds and the Bayes factor are the same. The difference in log-evidence can be interpreted on the following scale:

Difference          Jeffreys (1961)   Trotta (2006)   Odds
Δln(E) < 1          No evidence       No evidence     3:1
1 < Δln(E) < 2.5    Substantial       Weak            12:1
2.5 < Δln(E) < 5    Strong            Moderate        150:1
Δln(E) > 5          Decisive          Strong          > 150:1
If the evidence is too difficult to calculate accurately, we can approximate it using an information criterion statistic, which penalises models for lack of predictivity:

BIC = χ²(θ̂) + k ln N
DIC = χ²(θ̂) + 2 Cb

where k is the number of parameters, N the number of data points, and Cb the Bayesian complexity, the number of well-measured parameters.
The Bayesian complexity measures the information gain (KL divergence) between the prior and posterior, minus a point estimate:

Cb = −2 ( ⟨DKL[P(θ|D,M) ‖ P(θ|M)]⟩ − D̂KL )

For a Gaussian likelihood, this is given by

Cb = ⟨χ²(θ)⟩ − χ²(θ̄)

the mean of χ² over the posterior, minus χ² at the posterior mean.
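The Gaussian form of Cb is easy to estimate from samples. A sketch with a toy 2-parameter Gaussian posterior standing in for an MCMC chain (all values invented); for a Gaussian posterior Cb should come out close to the number of parameters, d = 2:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2
theta_true = np.zeros(d)
cov = np.eye(d) * 0.04                        # posterior covariance (σ = 0.2)

def chi2(theta):
    # toy chi-squared: quadratic distance from the true parameters
    return float(np.sum((theta - theta_true) ** 2) / 0.04)

# Stand-in for an MCMC chain: draws from the posterior
samples = rng.multivariate_normal(theta_true, cov, size=200_000)
chi2_samples = np.sum((samples - theta_true) ** 2, axis=1) / 0.04
theta_bar = samples.mean(axis=0)

C_b = chi2_samples.mean() - chi2(theta_bar)   # ⟨χ²(θ)⟩ − χ²(θ̄) ≈ d
DIC = chi2(theta_bar) + 2 * C_b               # DIC with θ̄ as the point estimate
print(C_b)                                    # close to 2
```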
Tension: two datasets have different preferred values (posterior distributions) for some common parameters.
[Figure: σ8 vs Ωm posteriors for KiDS-450, CFHTLenS (MID J16), WMAP9+ACT+SPT and Planck15, showing the tension between weak lensing and CMB constraints]
If the data sets agree, combining them is correct, but if the tension is significant the combined constraints may be misleading.
C(D1, D2, M) = P(D1 ∪ D2 | M) / [ P(D1|M) P(D2|M) ]

ΔDIC = DIC(D1 ∪ D2) − DIC(D1) − DIC(D2)
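To see how ΔDIC behaves, consider a toy case (not the talk's actual pipeline): two Gaussian measurements of one parameter, each with unit noise, so each individual fit is perfect and the Bayesian complexity is 1 per fit:

```python
import numpy as np

def delta_dic(mu1, mu2, sigma=1.0):
    """Toy ΔDIC = DIC(D1∪D2) − DIC(D1) − DIC(D2) for two Gaussian
    measurements mu1, mu2 of a single parameter (complexity Cb = 1 each)."""
    # Individual fits: best-fit χ² = 0, so DIC_i = 0 + 2*1
    dic1 = 0.0 + 2 * 1.0
    dic2 = 0.0 + 2 * 1.0
    # Combined fit: best fit at the midpoint, χ² = (mu1 − mu2)² / (2 σ²)
    chi2_comb = (mu1 - mu2) ** 2 / (2 * sigma ** 2)
    dic_comb = chi2_comb + 2 * 1.0
    return dic_comb - dic1 - dic2

print(delta_dic(0.0, 0.0))   # -2.0: agreement is rewarded
print(delta_dic(0.0, 4.0))   # +6.0: tension is penalised
```

The combined complexity is still 1 (one parameter), so ΔDIC is driven entirely by the worsening of the combined best-fit χ² as the two measurements are pulled apart.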
The evidence has three ingredients:
1. likelihood normalisation
2. Occam factor (compression of prior into posterior)
3. displacement between prior and posterior

For a linear model, the posterior Fisher matrix is the sum of the likelihood and prior Fisher matrices, F = L + Π, and the evidence is

P(D|M) = L0 |Π|^1/2 |F|^−1/2 exp[ −(1/2)( θL^T L θL + θπ^T Π θπ − θ̄^T F θ̄ ) ]

where θL is the likelihood peak, θπ the prior centre, and θ̄ = F^−1(L θL + Π θπ) the posterior mean. The more the prior is compressed into the posterior, the larger the Occam penalty.
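A 1D numerical check of this formula (all precisions and locations below are invented; in 1D the Fisher "matrices" are just scalar precisions):

```python
import numpy as np

L0, theta_L, Lm = 1.0, 0.7, 25.0           # likelihood peak value, location, precision
theta_pi, Pi = 0.0, 4.0                    # prior centre and precision

def like(t):
    return L0 * np.exp(-0.5 * Lm * (t - theta_L) ** 2)

def prior(t):
    # normalised Gaussian prior
    return np.sqrt(Pi / (2 * np.pi)) * np.exp(-0.5 * Pi * (t - theta_pi) ** 2)

# Brute-force evidence: E = ∫ L(θ) P(θ) dθ
t = np.linspace(-5, 5, 200_001)
E_numeric = np.trapz(like(t) * prior(t), t)

# Analytic linear-model formula with F = L + Π
F = Lm + Pi
theta_bar = (Lm * theta_L + Pi * theta_pi) / F
E_analytic = (L0 * np.sqrt(Pi / F)
              * np.exp(-0.5 * (Lm * theta_L ** 2 + Pi * theta_pi ** 2
                               - F * theta_bar ** 2)))

print(E_numeric, E_analytic)               # should agree closely
```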
Image credit: Tamara Davis
The Surprise is based on the relative entropy of two distributions: the posteriors from the original dataset (D1) and updated dataset (D2). The actual information gain is compared with the expected gain ⟨D⟩, so the gain is considered in light of the updating, or additional, data:

S ≡ DKL( P(θ|D2) ‖ P(θ|D1) ) − ⟨D⟩

DKL( P(θ|D2) ‖ P(θ|D1) ) = ∫ P(θ|D2) log[ P(θ|D2) / P(θ|D1) ] dθ
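For Gaussian posteriors the relative entropy has a closed form, which a grid integral reproduces (the means and widths below are illustrative):

```python
import numpy as np

mu1, s1 = 0.0, 0.5       # posterior from D1
mu2, s2 = 0.3, 0.2       # updated posterior from D2

# Numeric: ∫ P2 log(P2/P1) dθ on a grid
t = np.linspace(-4, 4, 400_001)
p1 = np.exp(-0.5 * ((t - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
p2 = np.exp(-0.5 * ((t - mu2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))
kl_numeric = np.trapz(p2 * np.log(p2 / p1), t)

# Closed form for two Gaussians
kl_analytic = np.log(s1 / s2) + (s2 ** 2 + (mu2 - mu1) ** 2) / (2 * s1 ** 2) - 0.5

print(kl_numeric, kl_analytic)
```

The gain grows both when the posterior shrinks (s2 < s1) and when its centre is displaced (mu2 ≠ mu1); the Surprise subtracts off the part expected from shrinking alone.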
For the linear (Gaussian) model, the evidence ratio factorises into the ratio of best-fit likelihoods, the Occam factors, and displacement terms:

P(D1+2|M) / [ P(D1|M) P(D2|M) ] = [ L0,1+2 / (L0,1 L0,2) ] × [ |F1+2|^−1/2 / ( |F1|^−1/2 |F2|^−1/2 ) ] × displacement terms
For the combined data, the Fisher matrix is the sum of the individual matrices, so the complexity doesn't change; the statistic is driven by the difference in best-fit likelihood:

Δχ² = χ²_1+2 − χ²_1 − χ²_2

ΔCb = Cb,1+2 − Cb,1 − Cb,2
The Surprise compares the actual information gain (from D1 to D2) with the expected information gain, averaged over the possible outcomes for the combined data set. For Gaussian distributions the information gain is

DKL = −(1/2) ⟨ χ²_1+2(θ) − χ²_1(θ) ⟩

and the expected gain, where the information gain is evaluated at the posterior maximum θ̄, is

⟨D⟩ = −(1/2) [ χ²_1+2(θ̄) − χ²_1(θ̄) ]

The two are similar, as the averaging process happens over the final posterior, not the individual ones. The Surprise is then

S = DKL − ⟨D⟩ = (1/2) [ χ²_1+2(θ̄) − χ²_1(θ̄) − ⟨ χ²_1+2(θ) − χ²_1(θ) ⟩ ]
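A Monte-Carlo sketch of the idea behind the Surprise (a toy 1D setup with invented numbers, not the talk's implementation): compare the information gain actually observed with the gain expected over the outcomes the first posterior predicts for the new measurement.

```python
import numpy as np

def kl_gauss(mu_a, s_a, mu_b, s_b):
    """D_KL( N(mu_a, s_a²) || N(mu_b, s_b²) ) for 1D Gaussians."""
    return np.log(s_b / s_a) + (s_a ** 2 + (mu_a - mu_b) ** 2) / (2 * s_b ** 2) - 0.5

mu1, s1 = 0.0, 0.5        # posterior from D1
s2 = 0.2                  # noise of the new measurement D2

def surprise(x2, n=200_000, seed=3):
    # Information gain actually seen, D2 posterior centred on the observed x2
    kl_obs = kl_gauss(x2, s2, mu1, s1)
    # D2 outcomes predicted by D1: x2 ~ N(mu1, s1² + s2²)
    rng = np.random.default_rng(seed)
    x2_pred = rng.normal(mu1, np.sqrt(s1 ** 2 + s2 ** 2), n)
    kl_exp = kl_gauss(x2_pred, s2, mu1, s1).mean()   # expected gain
    return kl_obs - kl_exp

print(surprise(0.0))   # negative: D2 agrees better than typically expected
print(surprise(2.0))   # large positive: tension
```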
Property                  Likelihood ratio   Evidence   DIC   Surprise
Average over parameters   No                 Yes        Yes   Yes
From MCMC chain           Yes                No         Yes   Yes
Probabilistic             Yes                Yes        Yes   No
Symmetric                 Yes                Yes        Yes   No
Toy example: a linear model, with a second data set offset from the first. The uncertainty on each data set, and also on the combined data, is the same, and each measures the parameters well. As the offset increases, the worsening of the combined χ² drives the statistics from negative (agreement) to positive (tension).
I(D1, D2) ≡ exp{−∆DIC(D1, D2)/2}
All the tension tests considered here are in light of a particular model. If the model is changed, the tension may be alleviated; testing for tension thus becomes a question of model selection.
In Joudaki et al. (2016), they compared the cosmological constraints from Planck CMB data with KiDS-450 weak lensing data. Some model extensions (such as curvature) worsened the tension, but allowing for dynamical dark energy improved the agreement.
Model                          T(S8)    ΔDIC     Verdict
ΛCDM (fiducial systematics)    2.1σ     1.26     Small tension
ΛCDM (extended systematics)    1.8σ     1.4      Small tension
ΛCDM (large scales)            1.9σ     1.24     Small tension
Neutrino mass                  2.4σ     0.022    Marginal case
Curvature                      3.5σ     3.4      Large tension
Dark energy (constant w)       0.89σ             Agreement
Curvature + dark energy        2.1σ              Agreement
[Figure: constraints on σ8 vs Ωm, and on ΩK vs σ8(Ωm/0.3)^0.5, for KiDS-450 and Planck 2015 in ΛCDM and ΛCDM+ΩK]
Conclusions
- We can test for tensions between data sets using ratios of model likelihoods (evidences).
- The DIC provides an approximate method, symmetric to evaluate tensions, being sensitive to the likelihood ratio but calibrated against parameter confidence regions.
- Applying this to Planck CMB data and KiDS-450 weak lensing tomography, we find these data sets give better agreement when dynamical dark energy is included in the model.