X-RAY SPECTRAL WORKSHOP 2019
- J. MICHAEL BURGESS - MPE
POISSON STATISTICS WITH BACKGROUNDS
POISSON OBSERVATION + POISSON BACKGROUND

Background measurement + Observation
Image via Vianello (2018)

$$\pi(S_i, B_i \mid m_i, b_i; t_s, t_b) = \prod_{i=1}^{N} \frac{\left(t_s(m_i + b_i)\right)^{S_i} e^{-t_s(m_i + b_i)}}{S_i!} \times \frac{(t_b b_i)^{B_i} e^{-t_b b_i}}{B_i!}$$
where S_i are the total counts, B_i the total background counts, m_i the source rate model, b_i the background rate model, and m_i + b_i the total rate model.
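As a concrete sketch, the joint likelihood above can be evaluated directly in log form. The function name and array layout below are my own; only the formula comes from the slide:

```python
import numpy as np
from scipy.special import gammaln

def poisson_poisson_loglike(S, B, m, b, ts, tb):
    """Log of the joint Poisson likelihood above.

    S, B : observed total and background counts per channel
    m, b : source and background rate models per channel
    ts, tb : source and background exposures
    """
    S, B, m, b = map(np.asarray, (S, B, m, b))
    lam_s = ts * (m + b)  # expected total counts
    lam_b = tb * b        # expected background counts
    return np.sum(S * np.log(lam_s) - lam_s - gammaln(S + 1)
                  + B * np.log(lam_b) - lam_b - gammaln(B + 1))
```

Maximizing this over (m_i, b_i) fits source and background simultaneously, with no subtraction anywhere.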
POISSON STATISTICS WITH BACKGROUNDS
POISSON OBSERVATION + POISSON BACKGROUND
$$\frac{\partial}{\partial b_i}\, \pi(S_i, B_i \mid m_i, b_i; t_s, t_b) = 0$$

If we do not have a background model, we can still proceed! Assume each bin contains a piecewise background model with one parameter, b_i = f_i, and maximize the likelihood for these nuisance parameters in terms of the other parameters:

$$\log\mathcal{L} = W = \sum_{i=1}^{N} \left[ t_s m_i + (t_s + t_b) f_i - S_i \log\left(t_s m_i + t_s f_i\right) - B_i \log(t_b f_i) - S_i(1 - \log S_i) - B_i(1 - \log B_i) \right]$$

This is a profile likelihood, where we have pre-maximized over the per-bin background model.
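The profiled f_i has a closed-form solution (this is XSPEC's wstat); as a sketch, we can instead profile each bin numerically. The function name and the numerical profiling are my own construction, not the talk's:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def wstat(S, B, m, ts, tb):
    """Evaluate W by numerically profiling each per-bin background
    parameter f_i (a sketch; XSPEC's wstat uses a closed-form f_i)."""
    W = 0.0
    for Si, Bi, mi in zip(S, B, m):
        def nll(f):
            # the f-dependent terms of the summand above
            return (ts * mi + (ts + tb) * f
                    - Si * np.log(ts * (mi + f))
                    - Bi * np.log(tb * f))
        f_hat = minimize_scalar(nll, bounds=(1e-10, 1e6), method="bounded").x
        # add the data-only terms (note: these require Si, Bi > 0,
        # which is exactly the caution about binning below)
        W += nll(f_hat) - Si * (1 - np.log(Si)) - Bi * (1 - np.log(Bi))
    return W
```

The data-only terms blow up for empty bins, which previews the caution on the next slide.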
CAUTION!
POISSON STATISTICS WITH BACKGROUNDS
POISSON OBSERVATION + POISSON BACKGROUND
One must rebin the spectrum so that every bin contains at least ONE background count. For a very insightful demonstration with code, see https://giacomov.github.io/Bias-in-profile-poisson-likelihood/
POISSON STATISTICS WITH BACKGROUNDS
POISSON OBSERVATION + GAUSSIAN BACKGROUND

Background model + Observation
Image via Vianello (2018)

$$\pi(S_i, B_i \mid m_i, b_i; \sigma_{B_i}, t_s, t_b) = \prod_{i=1}^{N} \frac{\left(t_s(m_i + b_i)\right)^{S_i} e^{-t_s(m_i + b_i)}}{S_i!} \times \frac{e^{-\frac{1}{2}\left(\frac{t_b b_i - B_i}{\sigma_{B_i}}\right)^2}}{\sqrt{2\pi}\,\sigma_{B_i}}$$
where σ_{B_i} is the background uncertainty, S_i the total counts, m_i the source rate model, b_i the background rate model, and m_i + b_i the total rate model.
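This case can be evaluated the same way as the Poisson + Poisson one; again the function name and layout are my own sketch of the likelihood written above:

```python
import numpy as np
from scipy.special import gammaln

def poisson_gaussian_loglike(S, B, m, b, sigma_B, ts, tb):
    """Log-likelihood for Poisson total counts with a Gaussian
    background measurement, per the likelihood above."""
    S, B, m, b, sigma_B = map(np.asarray, (S, B, m, b, sigma_B))
    lam_s = ts * (m + b)
    poisson_term = S * np.log(lam_s) - lam_s - gammaln(S + 1)
    gauss_term = (-0.5 * ((tb * b - B) / sigma_B) ** 2
                  - np.log(np.sqrt(2.0 * np.pi) * sigma_B))
    return np.sum(poisson_term + gauss_term)
```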
POISSON STATISTICS WITH BACKGROUNDS
SUBTRACTION??
In none of these cases did we try to subtract the background measurement/model from the data! We fully modeled the data generating process.
WHY?
The difference of two Poisson random variables is not Poisson (it follows a Skellam distribution). Thus, if we subtract a background and then apply the Poisson likelihood, our inferences and distributional properties are destroyed. The same applies to the “choice” of using a Gaussian likelihood with background-subtracted data.
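A quick numerical check of this point (the rates here are arbitrary): a Poisson variable has variance equal to its mean, while the Skellam difference does not.

```python
import numpy as np

rng = np.random.default_rng(42)
mu_tot, mu_bkg = 20.0, 15.0
n = 100_000

# "Background subtraction": the difference of two Poisson draws
diff = rng.poisson(mu_tot, n) - rng.poisson(mu_bkg, n)

# A Poisson variable has variance equal to its mean; the Skellam
# difference has mean mu_tot - mu_bkg but variance mu_tot + mu_bkg.
print(diff.mean())  # close to 5
print(diff.var())   # close to 35, not 5
```

Any Poisson likelihood applied to `diff` would badly misstate the uncertainty.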
POISSON STATISTICS WITH BACKGROUNDS
SIGNIFICANCE
Commonly used (approximate) significance measures:

$$S = \frac{\hat{N}_S}{\sigma(N_S)} = \frac{N_{\mathrm{on}} - \alpha N_{\mathrm{off}}}{\sqrt{N_{\mathrm{on}} + \alpha^2 N_{\mathrm{off}}}} \qquad S = \frac{\hat{N}_S}{\sigma(N_S)} = \frac{N_{\mathrm{on}} - \alpha N_{\mathrm{off}}}{\sqrt{\alpha (N_{\mathrm{on}} + N_{\mathrm{off}})}}$$

$$S = \frac{N_S}{\hat{N}_B} = \frac{N_S}{\alpha N_{\mathrm{off}}} \qquad S = \frac{N_S}{\sigma(\hat{N}_B)} = \frac{N_S}{\alpha \sqrt{N_{\mathrm{off}}}} \qquad S = \frac{N_S}{\sqrt{N_{\mathrm{on}}}} \qquad S = \frac{N_S}{\sqrt{N_S}}$$
$$\lambda = \frac{L(X \mid E_0, \hat{T}_c)}{L(X \mid \hat{E}, \hat{T})} = \left[ \frac{\alpha}{1 + \alpha} \left( \frac{N_{\mathrm{on}} + N_{\mathrm{off}}}{N_{\mathrm{on}}} \right) \right]^{N_{\mathrm{on}}} \left[ \frac{1}{1 + \alpha} \left( \frac{N_{\mathrm{on}} + N_{\mathrm{off}}}{N_{\mathrm{off}}} \right) \right]^{N_{\mathrm{off}}}$$

The proper significance for a Poisson on/off measurement comes from this likelihood ratio:

$$S = \sqrt{-2 \ln \lambda} = \sqrt{2} \left\{ N_{\mathrm{on}} \ln\left[ \frac{1 + \alpha}{\alpha} \left( \frac{N_{\mathrm{on}}}{N_{\mathrm{on}} + N_{\mathrm{off}}} \right) \right] + N_{\mathrm{off}} \ln\left[ (1 + \alpha) \left( \frac{N_{\mathrm{off}}}{N_{\mathrm{on}} + N_{\mathrm{off}}} \right) \right] \right\}^{1/2}$$
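This is the Li & Ma (1983) eq. 17 significance; it is a one-liner to implement (the function name is mine):

```python
import numpy as np

def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma (1983) eq. 17: significance from the likelihood ratio
    for an on/off Poisson measurement (alpha = t_on / t_off)."""
    n_tot = n_on + n_off
    term_on = n_on * np.log((1 + alpha) / alpha * (n_on / n_tot))
    term_off = n_off * np.log((1 + alpha) * (n_off / n_tot))
    return np.sqrt(2.0 * (term_on + term_off))
```

For example, n_on = 10, n_off = 20, alpha = 0.1 gives S ≈ 3.7, noticeably different from the naive N_S/√N_on estimates above.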
SED ANALYSIS
THE NORM
Same data… but different data points?
SED ANALYSIS
HOW WE SHOULD PROCEED

❖ The raw count spectrum is indexed in channel energy and has units of electronic counts per second.
❖ How can we use this to understand GRB emission physics?
Spectra are fit via a forward-folding analysis. You get back what you put in.

Model A vs. Model B: models that appear different in νFν space can be very similar in data space due to the effect of the response. This is why we must pay attention to the statistical procedures we use to fit data. For a beautiful example, see Vianello+ 2017.
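A toy forward fold makes the procedure concrete. The energy grids, response matrix, and power-law normalization below are invented purely for illustration; a real analysis uses the instrument's RMF/ARF:

```python
import numpy as np

rng = np.random.default_rng(0)

# A photon model on a fine energy grid is folded through a response
# matrix into detector channels; Poisson counts are then compared to
# the folded prediction.
edges = np.logspace(1, 3, 51)                 # 50 photon-energy bins, 10-1000 keV
e_lo, e_hi = edges[:-1], edges[1:]
e_mid = np.sqrt(e_lo * e_hi)

def power_law_photons(K, index):
    """Photon flux integrated over each bin (midpoint rule)."""
    return K * (e_mid / 100.0) ** index * (e_hi - e_lo)

response = np.abs(rng.normal(size=(50, 30))) * 0.01   # photons -> 30 channels
exposure = 100.0

expected_counts = exposure * (power_law_photons(0.1, -2.0) @ response)
observed_counts = rng.poisson(expected_counts)
# Fitting varies (K, index) to maximize the Poisson likelihood of
# observed_counts given expected_counts; the data are never "unfolded."
```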
$$\frac{\partial}{\partial t} n_e(\gamma, t) = \frac{\partial}{\partial \gamma} \left[ C(\gamma)\, n_e(\gamma, t) \right] + Q(\gamma) \quad \text{(Fokker-Planck equation)}$$

$$Q(\gamma) \propto \gamma^{-p} \;\; \forall \gamma \geq \gamma_{\mathrm{inj}} \quad \text{(power-law injection)}$$

$$C(\gamma) = -\frac{\sigma_T}{6\pi m_e c} B^2 \gamma^2 \quad \text{(synchrotron cooling)}$$

$$\Phi(w) = w \int_w^{\infty} K_{5/3}(x)\, dx \qquad n_\nu(\varepsilon; t) = \int_1^{\gamma_{\mathrm{max}}} d\gamma\, n_e(\gamma, t)\, \Phi\!\left( \frac{2 \varepsilon\, b_{\mathrm{crit}}}{3 B \gamma^2} \right) \quad \text{(synchrotron emission)}$$

Standard synchrotron emission model. No bells or whistles (or SSC/IC). The model allows us to test all synchrotron cooling regimes as a parameter!

γ_inj: injection electron energy; γ_cool: cooling electron energy; p: injection spectral index; B: magnetic field strength. The same number of parameters as the Band function.
Photosphere (thermal emission) vs. shocks (optically-thin emission): α, β, E_p.

In the past (and currently), we want to use the low-energy spectral index α of the Band function to infer physics. The typical association is that a steep α implies non-thermal emission and a hard α implies thermal (photospheric) emission.

The Band function (created by electrons undergoing shock-subphotospheric-reconnection-cannonball processes)

[Histogram of fitted α values, with the slow-cooling synchrotron (α = −2/3) and fast-cooling synchrotron (α = −3/2) limits marked]
However, more recent studies have shown it is possible to fit synchrotron emission directly to count data. Moreover, the predictions from photospheric models encompass a wide variety of alphas (Pe’er et al 2005 etc.). We need another way to infer models from the data.
Zhang et al. (2015) Burgess et al. (2014)
W, θ

Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than α alone. If one can distinguish between emission models via the width parameter, then we have a model comparison tool.
Photosphere (thermal emission) vs. shocks (optically-thin emission)

W: the hypothesis is that thermal spectra are narrow and synchrotron spectra are very broad. Thus, by measuring the width of the Band function, one can infer physics. (Synchrotron vs. blackbody)
Synchrotron is once again strongly ruled out!
(Synchrotron vs. thermal; Axelsson & Borgonovo (2015), Yu+ (2015))
[Count spectra with model overlays and residuals (σ) for GBM detectors (NaI and BGO), net rate vs. energy over 10–10⁴ keV]
Synchrotron fits to GRB data: too wide?

[Scatter of width W (dex) vs. curvature angle θ (deg) for synchrotron fits, with the "Synchrotron Rejected" region marked; points for GRB100131730 and GRB160101030 colored by |PPC − 0.5| (better vs. worse fit)]
Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data.
Models that look very different in vFv space can be very similar in count space.
The Band function is not a proxy for synchrotron!
The Band function predicts narrower curvature than the data; synchrotron is also a good fit to the data.
SED ANALYSIS
SUMMARY
SEDs must be fit in their native data space! When combining measurements from different instruments, we must fold the model through each instrument’s response, and compute the likelihood appropriate for those instruments.
$$\mathcal{L}_{\mathrm{total}} = \prod_{i=1}^{N} \mathcal{L}_i$$
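In log space this product is just a sum, so combining instruments with different noise properties is mechanical. The toy numbers and function names below are hypothetical:

```python
import numpy as np
from scipy.stats import norm, poisson

# Hypothetical two-instrument joint fit: log L_total is the sum of
# each instrument's log-likelihood (the log of the product of the L_i).
def poisson_loglike(counts, model_counts):
    return poisson.logpmf(counts, model_counts).sum()

def gaussian_loglike(flux, model_flux, sigma):
    return norm.logpdf(flux, loc=model_flux, scale=sigma).sum()

log_L_total = (
    poisson_loglike(np.array([3, 7, 2]), np.array([4.0, 6.0, 2.5]))      # counting detector
    + gaussian_loglike(np.array([1.2, 0.8]), np.array([1.0, 1.0]), 0.2)  # Gaussian-error instrument
)
```

Each instrument's model counts must come from folding the same photon model through that instrument's own response.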
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM

A simple hypothesis is one where specific values of θ are assumed. We commonly refer to this as a nested model hypothesis.

$$\text{simple:} \quad H(x; \theta_1, \theta_2) = \theta_1 + \theta_2 x$$
$$\text{composite:} \quad G(x; \theta_1, \theta_2, \theta_3) = \theta_1 + \theta_2 x + \theta_3 x^2$$
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM

Assume θ = (θ_1, θ_2, …, θ_h) with a distribution function f(x, θ_1, θ_2, ⋯, θ_h), which forms a distance measure between the data x and the parameters:

$$P = \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \theta_2, \ldots, \theta_h)$$

A hypothesis H is said to be true if it generates the observations O_n. Let Ω be the set of all simple hypotheses and ω a specific subset of these simple hypotheses; then we can write the likelihood ratio for a simple hypothesis:

$$\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)}$$
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
Assumptions: the parameter values maximize the likelihood, and the distribution of the likelihood (the covariance matrix) is symmetric. Then

$$\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)} = e^{-\frac{1}{2} \chi_0^2} \left( 1 + O(1/n) \right)$$

$$-2 \log \lambda = \chi^2$$
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
Why do we want to do this? We would like to be able to establish the “significance” of adding complexity to our model, to avoid over-fitting. If we can read this probability from a χ² table, the work is simple. Let’s try it out.
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
Let’s simulate some data from a second-order polynomial with heteroskedastic Gaussian errors.
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
We can fit the data via MLE to a first-order polynomial (or a line, for the layman) and a second-order polynomial.

We can compute the likelihood ratio between the two fits. In this case, we get −2 log λ ≃ 13.7, which corresponds to a χ² tail probability of ≃ 10⁻⁴.
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
To test the theorem, we can: 1) generate new datasets from our best-fit simple model (the line); 2) fit each dataset with both models; 3) compute the LRT of each fit; 4) see if the LRT is distributed like a χ²; 5) compare with our reference LRT.

We can see that for such an idealistic case, Wilks’ theorem holds! This will not always be true!
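The calibration steps above can be sketched end to end. The grid, error model, and line coefficients are stand-ins, not the talk's actual simulation:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
sigma = 0.1 + 0.1 * x            # heteroskedastic Gaussian errors
true_line = 1.0 + 2.0 * x        # best-fit "simple" model (a line)

def lrt(y):
    """-2 log lambda between weighted line and quadratic fits."""
    chisq = []
    for deg in (1, 2):
        coef = np.polyfit(x, y, deg, w=1.0 / sigma)
        resid = (y - np.polyval(coef, x)) / sigma
        chisq.append(np.sum(resid ** 2))
    return chisq[0] - chisq[1]

# Steps 1-4: simulate from the simple model, fit both models, and
# check the LRT against a chi^2 with one degree of freedom.
stats = np.array([lrt(true_line + rng.normal(0.0, sigma))
                  for _ in range(2000)])
frac_above = np.mean(stats > chi2.ppf(0.95, df=1))
print(frac_above)   # near 0.05 when Wilks' theorem holds
```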
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
A power law with an exponential cutoff, and a power law background. Can we measure the cutoff?
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
WILKS’ THEOREM
Wilks’ Theorem breaks down!
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
COMPONENT DETECTION
“In practice, this may mean that in cases where the continuum is extremely well constrained by the data and the width and position of the possible line are known, the LRT or F-test could underestimate the true significance by about a factor of 2, but there is no guarantee that this will occur in real data; particularly when the continuum is not well constrained, the true significance can be underestimated or overestimated.”
WILKS’ THEOREM & LIKELIHOOD RATIO TESTS
SUMMARY
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
LET’S TALK ABOUT REDUCED χ²
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
DEGREES OF FREEDOM
We typically think of the degrees of freedom as K = N − P for N data points and P parameters. However, this is only true for linear models:

$$f(\vec{x}, \vec{\theta}) = \theta_1 B_1(\vec{x}) + \theta_2 B_2(\vec{x}) + \ldots + \theta_P B_P(\vec{x}) = \sum_{p=1}^{P} \theta_p B_p(\vec{x})$$

If we define our measurements as $\vec{y} = (y_1, y_2, \ldots, y_N)^T$, then we have our normal distance measure:

$$\chi^2 = (\vec{y} - X \cdot \vec{\theta})^T \cdot \Sigma^{-1} \cdot (\vec{y} - X \cdot \vec{\theta})$$
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
DEGREES OF FREEDOM

Next we maximize,

$$\frac{\partial \chi^2}{\partial \theta_p} = 0 \;\; \forall p = 1, 2, \ldots, P,$$

giving us our best parameters,

$$\hat{\vec{\theta}} = (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y},$$

which leads us to our latent true data:

$$\hat{\vec{y}} = X \cdot \hat{\vec{\theta}} = X \cdot (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y} = H \cdot \vec{y}$$
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
DEGREES OF FREEDOM
$$P_{\mathrm{eff}} = \mathrm{tr}(H) = \sum_{n=1}^{N} H_{nn} = \mathrm{rank}(X)$$

The number of degrees of freedom is not simply the number of free parameters!
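The hat-matrix trace can be checked numerically for a toy quadratic design (the grid and unit errors are my choices):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)

# Design matrix for a quadratic: one column per basis function B_p(x)
X = np.vstack([np.ones_like(x), x, x ** 2]).T
Sigma_inv = np.eye(len(x))   # unit Gaussian errors, for simplicity

# Hat matrix H from the slide: y_hat = H . y
H = X @ np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv
print(np.trace(H))           # 3.0 = rank(X): the effective parameter count
```

For non-linear models there is no fixed H, which is exactly why P_eff stops being a constant.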
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
DEGREES OF FREEDOM

$$f(x) = A \cos(Bx + C) + D \cos(Ex + F)$$

How many free parameters are there? The number of DOF can change during the fit. Thus, if in some region of parameter space the two terms become degenerate, the DOF is reduced: it is not a fixed quantity!
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
DEGREES OF FREEDOM
For even seemingly simple functions, reduced χ² can lead to big problems in inferring whether a model is correct. In X-ray spectra we deal with complicated non-linear functions, so we should never use this measure as an indicator of fit quality. Moreover, our data are Poisson distributed! We can always perform parametric bootstraps, as we did for the LRT, to examine the distribution of our statistics, compare it to the value achieved in our observed data, and determine if it is an extreme value.
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
CAUTION
Even with parametric bootstraps, the distribution of the statistic is not always a good indicator of fit quality!
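A minimal parametric-bootstrap sketch, with an invented flat Poisson "fitted model" and the Poisson deviance as the statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Parametric bootstrap of a fit statistic: simulate datasets from the
# fitted model, recompute the statistic, and ask where the observed
# value falls in that distribution.
rate = np.full(30, 8.0)          # the "fitted model": 8 counts/bin expected
observed = rng.poisson(rate)

def deviance(counts, mu):
    """Poisson deviance, 2*sum[c*ln(c/mu) - (c - mu)], safe at c = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(counts > 0, counts * np.log(counts / mu), 0.0)
    return 2.0 * np.sum(term - (counts - mu))

obs_stat = deviance(observed, rate)
boot = np.array([deviance(rng.poisson(rate), rate) for _ in range(2000)])
p_value = np.mean(boot >= obs_stat)   # a very small value => a poor fit
```

As the slide warns, even this distributional comparison is not a guaranteed indicator of fit quality.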
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
PPCS
Latent value: the true value of an observable before measurement. A latent x produces an observed x through the measurement distribution π(x_observed | x_latent).
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
RESIDUALS

Poisson-distributed data should have Poisson residuals! Calculating Poisson residuals is not straightforward; this is implemented in the code linked here.
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
PPCS
$$\pi(\tilde{y} \mid y) = \int d\theta\, \pi(\tilde{y} \mid \theta)\, \pi(\theta \mid y)$$

where ỹ is the replicated data, y the measured data, π(ỹ | θ) the likelihood, and π(θ | y) the posterior.
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
PPCS

[Replicated-data percentile bands overlaid on the observed count rates vs. energy for detectors n_a, n_2, and b_1]

PPCs express the volume in the posterior and the likelihood. Residuals only contain information about the distance from data to model at one (non-unique) location on a surface.
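The replication step behind those percentile bands can be sketched directly from the equation above. Here the posterior samples of the expected counts are faked (lognormal scatter around an invented "true" rate); in a real fit they come from the sampler:

```python
import numpy as np

rng = np.random.default_rng(7)

# Minimal posterior predictive check for Poisson spectral counts.
n_channels, n_posterior = 16, 500
true_rate = 10.0 * np.exp(-np.arange(n_channels) / 6.0)
posterior_rates = true_rate * rng.lognormal(0.0, 0.05,
                                            size=(n_posterior, n_channels))

# One replicated Poisson dataset per posterior sample: pi(y_rep | theta)
replicates = rng.poisson(posterior_rates)

# Percentile bands of the replicates, to overlay on the observation
lo, hi = np.percentile(replicates, [2.5, 97.5], axis=0)
observed = rng.poisson(true_rate)
coverage = np.mean((observed >= lo) & (observed <= hi))
```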
GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS
PPCS

Let’s examine fitting a line to data with Poisson counts.
We will fit the data with the appropriate Poisson likelihood using HMC.
PPCs are richer than residuals!
In general, fit quality is an area of active research in statistics. There is no “cookbook” that can be generically applied. Each analysis problem presents a different challenge. Consult the statistical literature, state your assumptions, and make your analysis reproducible!
STACKING
COMPRESSING DATA

Complete pooling (summing the N datasets). Pros: the pooled data are powerful. Cons: loss of information.

Partial pooling. Pros: fit the full model. Cons: requires specialized algorithms.
STACKING
WHERE DO WE ENCOUNTER HIERARCHICAL MODELING?

EVERYWHERE!