X-RAY SPECTRAL WORKSHOP 2019 POISSON STATISTICS WITH BACKGROUNDS - - PowerPoint PPT Presentation

x ray spectral workshop 2019 poisson statistics with
SMART_READER_LITE
LIVE PREVIEW

X-RAY SPECTRAL WORKSHOP 2019 POISSON STATISTICS WITH BACKGROUNDS - - PowerPoint PPT Presentation

J. MICHAEL BURGESS - MPE X-RAY SPECTRAL WORKSHOP 2019 POISSON STATISTICS WITH BACKGROUNDS Background measurement Image via Vianello (2018) POISSON STATISTICS WITH BACKGROUNDS POISSON OBSERVATION + POISSON BACKGROUND Background measurement


slide-1
SLIDE 1

X-RAY SPECTRAL WORKSHOP 2019

  • J. MICHAEL BURGESS - MPE
slide-2
SLIDE 2

POISSON STATISTICS WITH BACKGROUNDS

Background measurement

Image via Vianello (2018)
slide-3
SLIDE 3

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

Background measurement

Image via Vianello (2018)
slide-4
SLIDE 4

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

Background measurement Observation

Image via Vianello (2018)
slide-5
SLIDE 5

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

Background measurement Observation

π(Si, Bi|mi, bi ; ts, tb) =

N

i=1

(ts(mi + bi))Sie−ts(mi+bi) Si! × (tbbi)Bie−tbbi Bi!

slide-6
SLIDE 6

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

Background measurement Observation

π(Si, Bi|mi, bi ; ts, tb) =

N

i=1

(ts(mi + bi))Sie−ts(mi+bi) Si! × (tbbi)Bie−tbbi Bi!

slide-7
SLIDE 7

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

Background measurement Observation

π(Si, Bi|mi, bi ; ts, tb) =

N

i=1

(ts(mi + bi))Sie−ts(mi+bi) Si! × (tbbi)Bie−tbbi Bi!

total rate model total background counts total counts source rate model background rate model

slide-8
SLIDE 8

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

∂ ∂bi π(Si, Bi|mi, bi ; ts, tb) = 0

If we do not have a model, we can still proceed! Assume each bin contains a piecewise background model with a parameter, bi=fi, and maximize the likelihood for these

  • parameters. We then solve this equation for the fi in

terms of the other parameters.

logℒ = W =

N

i=1

tsmi + (ts + tb)fi − Si log(tsmi + tsfi) − Bi log(tbfi) − Si(1 − log Si) − Bi(1 − logBi)

This is a profile-likelihood where we have pre-maximized

  • r “profiled out” the dependence on the unknown

background model.

slide-9
SLIDE 9

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

∂ ∂bi π(Si, Bi|mi, bi ; ts, tb) = 0

If we do not have a model, we can still proceed! Assume each bin contains a piecewise background model with a parameter, bi=fi, and maximize the likelihood for these

  • parameters. We then solve this equation for the fi in

terms of the other parameters.

logℒ = W =

N

i=1

tsmi + (ts + tb)fi − Si log(tsmi + tsfi) − Bi log(tbfi) − Si(1 − log Si) − Bi(1 − logBi)

This is a profile-likelihood where we have pre-maximized

  • r “profiled out” the dependence on the unknown

background model.

slide-10
SLIDE 10

WE’VE ESSENTIALLY ADDED A PARAMETER FOR EVERY BIN!

CAUTION!!!!!!!!!!

POISSON STATISTICS WITH BACKGROUNDS

slide-11
SLIDE 11

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + POISSON BACKGROUND

One must rebin the spectrum to have at least ONE background count per bin. For a very insightful demonstration with code, see https://giacomov.github.io/Bias-in-profile-poisson-likelihood/

slide-12
SLIDE 12

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + GAUSSIAN BACKGROUND

Observation

Image via Vianello (2018)
slide-13
SLIDE 13

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + GAUSSIAN BACKGROUND

Observation

Image via Vianello (2018)
slide-14
SLIDE 14

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + GAUSSIAN BACKGROUND

Background model Observation

Image via Vianello (2018)
slide-15
SLIDE 15

π(Si, Bi|mi, bi ; σBi, ts, tb) =

N

i=1

(ts(mi + bi))Sie−ts(mi+bi) Si! × e

− 1

2( tbbi − Bi σBi

)

2

2πσBi

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + GAUSSIAN BACKGROUND

Background model Observation

total rate model background uncertainty total counts source rate model background rate model

slide-16
SLIDE 16

π(Si, Bi|mi, bi ; σBi, ts, tb) =

N

i=1

(ts(mi + bi))Sie−ts(mi+bi) Si! × e

− 1

2( tbbi − Bi σBi

)

2

2πσBi

POISSON STATISTICS WITH BACKGROUNDS

POISSON OBSERVATION + GAUSSIAN BACKGROUND

Background model Observation

total rate model background uncertainty total counts source rate model background rate model

slide-17
SLIDE 17

POISSON STATISTICS WITH BACKGROUNDS

SUBTRACTION??

In none of these cases did we try to subtract the background measurement/model from the data! We fully modeled the data generating process.

WHY?

The difference of two Poisson distributions is not Poisson (in fact it is called a Skellam distribution). Thus, if we subtract a background and then apply the Poisson distribution, our inferences and distributional properties will be destroyed. This also applies to the “choice” of using a Gaussian likelihood with background subtracted data.

slide-18
SLIDE 18

POISSON STATISTICS WITH BACKGROUNDS

SIGNIFICANCE

S = NS ̂ σ (NS) = Non − αNoff Non + α2Noff S = NS ̂ σ (NS) = Non − αNoff α (Non + Noff) S = NS ̂ NB = NS αNoff S = NS σ ( ̂ NB) = NS α Noff S = NS Non S = NS NS

?

slide-19
SLIDE 19

POISSON STATISTICS WITH BACKGROUNDS

SIGNIFICANCE

slide-20
SLIDE 20

POISSON STATISTICS WITH BACKGROUNDS

SIGNIFICANCE

slide-21
SLIDE 21

POISSON STATISTICS WITH BACKGROUNDS

SIGNIFICANCE

λ = L (X|E0, ̂ Tc) L(X| ̂ E, ̂ T) = Pr (X|E0, ̂ Tc) Pr(X| ̂ E, ̂ T) λ = L (X|E0, ̂ Tc) L(X| ̂ E, ̂ T) = α 1 + α ( Non + Noff Non )

Non

1 1 + α ( Non + Noff Noff )

Nor

The proper significance for a Poisson

  • bservation on a Poisson background

S = −2 ln λ = 2 Non ln 1 + α α ( Non Non + Noff) + Noff ln (1 + α)( Noff Non + Noff)

1/2

slide-22
SLIDE 22

SED ANLYSIS

THE NORM

slide-23
SLIDE 23

SED ANLYSIS

THE NORM

Same data… but different data points?

slide-24
SLIDE 24

SED ANLYSIS

HOW WE SHOULD PROCEED

❖ The raw count spectrum is indexed in

channel energy and has units of electronic count per second.

❖ How can we use this to understand GRB

emission physics?

slide-25
SLIDE 25

SED ANLYSIS

HOW WE SHOULD PROCEED

slide-26
SLIDE 26

SED ANLYSIS

HOW WE SHOULD PROCEED

slide-27
SLIDE 27

SED ANLYSIS

HOW WE SHOULD PROCEED

Spectra are fit via a forward-folding analysis. You get back what you put in.

slide-28
SLIDE 28

SED ANLYSIS

HOW WE SHOULD PROCEED

Spectra are fit via a forward-folding analysis. You get back what you put in.

slide-29
SLIDE 29

SED ANLYSIS

HOW WE SHOULD PROCEED

Model A Model B Model B Models that appear different in vFv can be very similar in data space due to the effect of the response. This is why we must pay attention to the statistical procedures we use to fit data. For a beautiful example, see Vianello+ 2017.

slide-30
SLIDE 30

∂ ∂t ne (γ, t) = ∂ ∂γ C (γ) ne (γ, t) + Q (γ) Q(γ) ∝ γ−p ∀γ ≥ γinj C (γ) = − σT 6πmec B2γ2 Φ (w) = w∫

∞ w

K5/3(x) dx nν(ε; t) = ∫

γmax 1

dγ ne(γ, t)Φ ( 2εbcrit 3Bγ2 )

Fokker-Planck equation power law injection synchrotron cooling synchrotron emission

Standard synchrotron emission model. No bells or whistles (or SSC/IC). The model allows us to test all synchrotron cooling regimes as a parameter! Injection electron energy

γinj γcool p B

Cooling electron energy Injection spectral index Magnetic field strength The same number of parameters as the Band function

slide-31
SLIDE 31

Photosphere (thermal emission) Shocks (optically-thin emission)

α

β

Ep

In the past (and currently), we want to use the low- energy spectral index of the Band function to infer physics. The typical association is the steep alpha implies non- thermal emission and hard alpha implies

The Band function

(created by electrons undergoing shock-subphotospheric- reconncetion-cannonball processes)

slide-32
SLIDE 32

Photosphere (thermal emission) Shocks (optically-thin emission)

α

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

α

5 10 15 20 25 SCS -2/3 FCS -3/2

In the past (and currently), we want to use the low- energy spectral index of the Band function to infer physics. The typical association is the steep alpha implies non- thermal emission and hard alpha implies

The Band function

(created by electrons undergoing shock-subphotospheric- reconncetion-cannonball processes)

slide-33
SLIDE 33

Photosphere (thermal emission) Shocks (optically-thin emission)

α

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

α

5 10 15 20 25 SCS -2/3 FCS -3/2

In the past (and currently), we want to use the low- energy spectral index of the Band function to infer physics. The typical association is the steep alpha implies non- thermal emission and hard alpha implies

The Band function

(created by electrons undergoing shock-subphotospheric- reconncetion-cannonball processes)

slide-34
SLIDE 34

Photosphere (thermal emission) Shocks (optically-thin emission)

α

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

α

5 10 15 20 25 SCS -2/3 FCS -3/2

In the past (and currently), we want to use the low- energy spectral index of the Band function to infer physics. The typical association is the steep alpha implies non- thermal emission and hard alpha implies

The Band function

(created by electrons undergoing shock-subphotospheric- reconncetion-cannonball processes)

slide-35
SLIDE 35

However, more recent studies have shown it is possible to fit synchrotron emission directly to count data. Moreover, the predictions from photospheric models encompass a wide variety of alphas (Pe’er et al 2005 etc.). We need another way to infer models from the data.

−2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

α

5 10 15 20 25 SCS -2/3 FCS -3/2

α

Zhang et al. (2015) Burgess et al. (2014)

slide-36
SLIDE 36
slide-37
SLIDE 37

Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than alpha. If one can distinguish between emission models via the width parameter, then we have a model comparison tool.

slide-38
SLIDE 38

W Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than alpha. If one can distinguish between emission models via the width parameter, then we have a model comparison tool.

slide-39
SLIDE 39

W

θ

Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than alpha. If one can distinguish between emission models via the width parameter, then we have a model comparison tool.

slide-40
SLIDE 40

Photosphere (thermal emission) Shocks (optically-thin emission)

W The hypothesis is that thermal spectra are narrower and synchrotron spectra are very

  • broad. Thus, if one can measure the width of the

Band function, one can infer physics. Synchrotron Blackbody

slide-41
SLIDE 41

Photosphere (thermal emission) Shocks (optically-thin emission)

Axelsson & Borgonovo (2015) Yu+ (2015)

slide-42
SLIDE 42

Photosphere (thermal emission) Shocks (optically-thin emission)

Synchrotron Thermal Axelsson & Borgonovo (2015) Yu+ (2015)

slide-43
SLIDE 43

Photosphere (thermal emission) Shocks (optically-thin emission)

Synchrotron is once again strongly ruled out!

Synchrotron Thermal Axelsson & Borgonovo (2015) Yu+ (2015)

slide-44
SLIDE 44

10−6 10−4 10−2 100 102

Net rate (counts s−1 keV−1)

BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9

101 102 103 104

Energy (keV)

−4 −2 2 4

Residuals (σ)

10−6 10−4 10−2 100 102

Net rate (counts s−1 keV−1)

BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5

101 102 103 104

Energy (keV)

−4 −2 2

Residuals (σ)

Synchrotron fits to GRB data: too wide?

slide-45
SLIDE 45 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9 101 102 103 104 Energy (keV) −4 −2 2 4 Residuals (σ) 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5 101 102 103 104 Energy (keV) −4 −2 2 Residuals (σ)

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

Synchrotron Rejected Synchrotron fits to GRB data: too wide?

slide-46
SLIDE 46 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9 101 102 103 104 Energy (keV) −4 −2 2 4 Residuals (σ) 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5 101 102 103 104 Energy (keV) −4 −2 2 Residuals (σ)

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

Synchrotron Rejected Synchrotron fits to GRB data: too wide?

slide-47
SLIDE 47 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9 101 102 103 104 Energy (keV) −4 −2 2 4 Residuals (σ) 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5 101 102 103 104 Energy (keV) −4 −2 2 Residuals (σ)

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

Synchrotron Rejected Synchrotron fits to GRB data: too wide?

slide-48
SLIDE 48 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9 101 102 103 104 Energy (keV) −4 −2 2 4 Residuals (σ) 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5 101 102 103 104 Energy (keV) −4 −2 2 Residuals (σ)

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

Synchrotron Rejected Synchrotron fits to GRB data: too wide?

slide-49
SLIDE 49 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO1 Model NaI6 Model NaI7 Model NaI9 Model BGO1 NaI6 NaI7 NaI9 101 102 103 104 Energy (keV) −4 −2 2 4 Residuals (σ) 10−6 10−4 10−2 100 102 Net rate (counts s−1 keV−1) BGO0 Model NaI1 Model NaI2 Model NaI5 Model BGO0 NaI1 NaI2 NaI5 101 102 103 104 Energy (keV) −4 −2 2 Residuals (σ)

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

GRB100131730 GRB160101030

0.1 0.2 0.3 0.4 0.5

|PPC − 0.5|

Synchrotron fits to GRB data: too wide?

Better Fit Worse Fit

slide-50
SLIDE 50

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

GRB100131730 GRB160101030

0.1 0.2 0.3 0.4 0.5

|PPC − 0.5|

Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data.

slide-51
SLIDE 51

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

GRB100131730 GRB160101030

0.1 0.2 0.3 0.4 0.5

|PPC − 0.5|

Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data.

slide-52
SLIDE 52

1.0 1.5 2.0 2.5 3.0

W (dex)

80 100 120 140 160 180

θ (deg)

GRB100131730 GRB160101030

0.1 0.2 0.3 0.4 0.5

|PPC − 0.5|

Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data.

Models that look very different in vFv space can be very similar in count space.

slide-53
SLIDE 53

The Band function is not a proxy for synchrotron!

Band function predicting narrower curvature of the data Synchrotron also a good fit to the data

slide-54
SLIDE 54

SED ANALYSIS

SUMMARY

SEDs must be fit in their native data space! When combining measurements from different instruments, we must fold the model through each instrument’s response, and compute the likelihood appropriate for those instruments.

ℒtotal =

N

i=1

ℒi

slide-55
SLIDE 55

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

A simple hypothesis is one where specific values of are assumed. We commonly refer to this as a nested model

  • f a more complex or composite

hypothesis

θ H(x; θ1, θ2) = θ1 + θ2x

G(x; θ1, θ2, θ3) = θ1 + θ2x + θ3x2 composite: simple:

slide-56
SLIDE 56

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

A simple hypothesis is one where specific values of are assumed. We commonly refer to this as a nested model

  • f a more complex or composite

hypothesis

θ H(x; θ1, θ2) = θ1 + θ2x

G(x; θ1, θ2, θ3) = θ1 + θ2x + θ3x2 composite: simple:

slide-57
SLIDE 57

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

Assume with a distribution function f which forms a distance measure between data x for a set

  • f parameters

θ

f (x, θ1, θ2, ⋯θh)

P =

n

α=1

f (xα, θ1, θ2, …θh)

λ = Pω (On) PΩ (On)

H is said to be true if it generates On. Let be the set of all simple hypotheses and be a specific subset of these simple

  • hypotheses. For a set of data On

we can write the likelihood ratio

  • f a composite hypothesis H to a

simple hypothesis.

Ω ω

slide-58
SLIDE 58

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

slide-59
SLIDE 59

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

slide-60
SLIDE 60

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

Assumptions The parameter values maximize the likeihood The distribution of the likelihood (the covariance matrix) is symmetric λ = Pω (On) PΩ (On) = e− 1

2 χ2 0(1 + O(1/

n))

−2 log λ = χ2

slide-61
SLIDE 61

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

slide-62
SLIDE 62

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

Why do we want to do this? We would like to be able to establish the “significance” of adding complexity to our model to avoid over-fitting. If we can read this probability from a chi2 table, the work is simple. Let’s try it out.

slide-63
SLIDE 63

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

Let’s simulate some data from a second order polynomial with heteroskedastic, Gaussian error. data model

slide-64
SLIDE 64

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

We can fit the data via MLE to a first order polynomial (or a line for the layman) and a second

  • rder polynomial.

We can compute the likelihood ratio between the two fits. In this case, we get a value of

  • . This

corresponds to .

−2 log λ ≃ 13.7 χ2 ≃ 10−4

slide-65
SLIDE 65

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

To test this the theorem, we can: 1) generate new datasets from our best fit simple model (the line) 2) fit each data set with both models 3) compute the LRT of each fit 4) see if the LRT is distributed like a 5) Compare with or reference LRT

χ2

We can see that for such an idealistic case, Wilks’ theorem holds! This will not always be true!

slide-66
SLIDE 66

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

A power law with an exponential cutoff, and a power law background. Can we measure the cutoff?

slide-67
SLIDE 67

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

Wilks’ Theorem breaks down!

slide-68
SLIDE 68

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

WILKS’ THEOREM

slide-69
SLIDE 69

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

COMPONENT DETECTION

slide-70
SLIDE 70

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

COMPONENT DETECTION

slide-71
SLIDE 71

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

COMPONENT DETECTION

slide-72
SLIDE 72

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

COMPONENT DETECTION

slide-73
SLIDE 73

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

COMPONENT DETECTION

“In practice, this may mean that in cases where the continuum is extremely well constrained by the data and the width and position of the possible line are known, the LRT or F-test could underestimate the true significance by about a factor of 2, but there is no guarantee that this will occur in real data; particularly when the continuum is not well constrained, the true significance can be underestimated or overestimated.”

slide-74
SLIDE 74

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS

SUMMARY

CALIBRATE!

slide-75
SLIDE 75

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

LET’S TALK ABOUT REDUCED

χ2

slide-76
SLIDE 76

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM

We typically think of DOF K = N - P for N data points and P parameters. However, this is only true for linear models.

f( ⃗ x , ⃗ θ ) = θ1B1( ⃗ x ) + θ2B2( ⃗ x ) + … + θPBP( ⃗ x ) =

P

p=1

θpBp( ⃗ x )

If we define our measurements as

⃗ y = (y1, y2, …, yN)

T

χ2 = ( ⃗ y − X ⋅ ⃗ θ )T ⋅ Σ−1 ⋅ ( ⃗ y − X ⋅ ⃗ θ )

Then we have our normal distance measure

slide-77
SLIDE 77

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM

∂χ2 ∂θp = 0 ∀p = 1,2,…, P

̂ ⃗ θ = (XT ⋅ Σ−1 ⋅ X)

−1 ⋅ XT ⋅ Σ−1 ⋅

⃗ y

̂ ⃗ y = X ⋅ ̂ ⃗ θ = X ⋅ (XT ⋅ Σ−1 ⋅ X)

−1 ⋅ XT ⋅ Σ−1 ⋅

⃗ y = H ⋅ ⃗ y Next we maximize giving us our best parameters which leads us to our latent true data

slide-78
SLIDE 78

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM

Peff = tr(H) =

N

n=1

Hnn = rank(X)

The number of degrees of freedom is not simply the number of free parameters!

slide-79
SLIDE 79

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM f(x) = A cos(Bx + C) + D cos(Ex + F)

How many free parameters are there?

slide-80
SLIDE 80

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM f(x) = A cos(Bx + C) + D cos(Ex + F)

How many free parameters are there?

slide-81
SLIDE 81

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM f(x) = A cos(Bx + C) + D cos(Ex + F)

How many free parameters are there? The number of DOF can change during the fit. Thus, if in some region

  • f the posterior / likelihood profile, A is close to zero, the DOF is not a

fixed quantity!

slide-82
SLIDE 82

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

DEGREES OF FREEDOM

For even seemingly simple functions, reduced can lead to big problems in inferring if a model is correct In x-ray spectra, we deal with complicated non-linear functions. Thus, we should never try to utilize this measure as indicator of fit quality. Moreover, are data are Poisson distributed! We can always perform parametric bootstraps as we did the the LRT to examine the distribution of our statistics, compare it to the value achieved in our observed data, and determine if it is an extreme value.

χ2

slide-83
SLIDE 83

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

CAUTION

Even with parametric bootstraps, the distribution of the statistic is not always a good indicator of fit quality!

slide-84
SLIDE 84

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

Latent value: The true value of an

  • bserved datum

x latent x observed π(xobserved|xlatent)

slide-85
SLIDE 85

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

RESIDUALS

Poisson distributed data should have Poisson residuals! Calculating Poisson residuals is no straight forward. This is implanted in the code linked here.

slide-86
SLIDE 86

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

RESIDUALS

Poisson distributed data should have Poisson residuals! Calculating Poisson residuals is no straight forward. This is implanted in the code linked here.

slide-87
SLIDE 87

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

π (˜ y|y) = ∫ dθ π (˜ y|θ) π (θ|y)

slide-88
SLIDE 88

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

π (˜ y|y) = ∫ dθ π (˜ y|θ) π (θ|y)

posterior

slide-89
SLIDE 89

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

π (˜ y|y) = ∫ dθ π (˜ y|θ) π (θ|y)

posterior likelihood

slide-90
SLIDE 90

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

π (˜ y|y) = ∫ dθ π (˜ y|θ) π (θ|y)

posterior likelihood replicated data

slide-91
SLIDE 91

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

π (˜ y|y) = ∫ dθ π (˜ y|θ) π (θ|y)

posterior likelihood measured data replicated data

slide-92
SLIDE 92

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

101 102 103

Energy [keV]

10−1 100 101

Rate [cnt s−1 keV−1]

na 101 102 103

Energy [keV]

10−1 100 101 n2 103 104

Energy [keV]

10−3 10−2 10−1 100 b1

Replicated data percentiles Observed data PPCs express the volume in the posterior and the likelihood. Residuals only contain the information about the distance from data to model at one (non-unique) location on a surface.

slide-93
SLIDE 93

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

Let’s examine fitting a line that has Poisson counts.

slide-94
SLIDE 94

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

We will fit the data with the appropriate Poisson likelihood using HMC.

slide-95
SLIDE 95

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

PPCs are richer than residuals!

slide-96
SLIDE 96

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

slide-97
SLIDE 97

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS

PPCS

In general, fit quality is an area of active research in statistics. There is no “cookbook” that can be generically applied. Each analysis problem presents a different challenge. Consult the statistical literature, state your assumptions, and make your analysis reproducible!

slide-98
SLIDE 98

STACKING

COMPRESSING DATA

slide-99
SLIDE 99

STACKING

COMPRESSING DATA

slide-100
SLIDE 100

STACKING

COMPRESSING DATA

slide-101
SLIDE 101

STACKING

COMPRESSING DATA

slide-102
SLIDE 102

STACKING

COMPRESSING DATA

N

i

slide-103
SLIDE 103

STACKING

COMPRESSING DATA

N

i

Complete pooling Pros: data are powerful Cons: loss of information

slide-104
SLIDE 104

STACKING

COMPRESSING DATA

Partial pooling Pros: fit the full model Cons: requires specialized algorithms

slide-105
SLIDE 105

STACKING

WHERE DO WE ENCOUNTER HIERARCHICAL MODELING?

EVERYWHERE!

slide-106
SLIDE 106

STACKING

WHERE DO WE ENCOUNTER HIERARCHICAL MODELING?