X-RAY SPECTRAL WORKSHOP 2019: POISSON STATISTICS WITH BACKGROUNDS

J. MICHAEL BURGESS - MPE. POISSON STATISTICS WITH BACKGROUNDS: POISSON OBSERVATION + POISSON BACKGROUND. Background measurement; image via Vianello (2018).


  1. However, more recent studies (Burgess et al. 2014; Zhang et al. 2015) have shown it is possible to fit synchrotron emission directly to count data. Moreover, the predictions from photospheric models encompass a wide variety of alphas (Pe’er et al. 2005, etc.). We need another way to infer models from the data. [Figure: histogram of α from -2.0 to 1.0, with the SCS (-2/3) and FCS (-3/2) values marked.]

  2. Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than alpha. If one can distinguish between emission models via the width parameter, then we have a model comparison tool. [Figure labels: W, θ.]

  5. The hypothesis is that thermal spectra are narrower and synchrotron spectra are very broad. Thus, if one can measure the width of the Band function, one can infer physics. [Figure labels: W; Blackbody, photosphere (thermal emission); Synchrotron, shocks (optically-thin emission).]

  6. [Figure: Thermal, photosphere (thermal emission) versus Synchrotron, shocks (optically-thin emission). Axelsson & Borgonovo (2015); Yu+ (2015).]

  8. Synchrotron is once again strongly ruled out! Axelsson & Borgonovo (2015); Yu+ (2015).

  9. Synchrotron fits to GRB data: too wide? [Figure: two count spectra with synchrotron model fits. Net rate (counts s^-1 keV^-1) versus energy (keV), with residuals (σ), for detectors BGO1, NaI6, NaI7, NaI9 and BGO0, NaI1, NaI2, NaI5.]

  10. Synchrotron fits to GRB data: too wide? [Figure: the same two count spectra, plus θ (deg) versus W (dex) with the "Synchrotron Rejected" region marked.]

  14. Synchrotron fits to GRB data: too wide? [Figure: the same two count spectra, plus θ (deg) versus W (dex) for GRB100131730 and GRB160101030, with a |PPC - 0.5| scale (0.1 to 0.5) running from better fit to worse fit.]

  15. Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data. [Figure: θ (deg) versus W (dex) for GRB100131730 and GRB160101030, with a |PPC - 0.5| scale from 0.1 to 0.5.]

  17. Models that look very different in νFν space can be very similar in count space.

  18. Synchrotron is also a good fit to the data, while the Band function predicts a narrower curvature of the data. The Band function is not a proxy for synchrotron!

  19. SED ANALYSIS SUMMARY SEDs must be fit in their native data space! When combining measurements from different instruments, we must fold the model through each instrument’s response, and compute the likelihood appropriate for those instruments: $\mathcal{L}_{\rm total} = \prod_{i=1}^{N} \mathcal{L}_i$
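The product of per-instrument likelihoods can be sketched as a sum of log-likelihoods. This is a minimal illustration, not the workshop's code: the function names (`fold`, `poisson_loglike`, `total_loglike`), the toy response matrices, and all numbers are assumptions, and the background is ignored so that each instrument has a plain Poisson likelihood.

```python
import math
import numpy as np

def poisson_loglike(counts, expected):
    """Poisson log-likelihood summed over detector channels."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    log_factorial = np.array([math.lgamma(c + 1.0) for c in counts])
    return float(np.sum(counts * np.log(expected) - expected - log_factorial))

def fold(photon_flux, response, exposure):
    """Expected counts per channel: exposure * (response matrix @ photon flux per bin)."""
    return exposure * (response @ photon_flux)

def total_loglike(photon_flux, instruments):
    """log L_total = sum_i log L_i, which is equivalent to L_total = prod_i L_i."""
    return sum(
        poisson_loglike(inst["counts"], fold(photon_flux, inst["response"], inst["exposure"]))
        for inst in instruments
    )

# Toy setup: every number below is invented for illustration.
rng = np.random.default_rng(0)
energies = np.geomspace(10.0, 1000.0, 40)   # keV, photon-space bin centers
photon_flux = 50.0 * energies ** -1.5        # toy power law, photons per bin

instruments = []
for n_channels, exposure in [(16, 10.0), (8, 10.0)]:
    # Hypothetical response: each instrument has its own channel count.
    response = rng.uniform(0.0, 0.2, size=(n_channels, energies.size))
    counts = rng.poisson(fold(photon_flux, response, exposure))
    instruments.append({"counts": counts, "response": response, "exposure": exposure})

per_instrument = [
    poisson_loglike(inst["counts"], fold(photon_flux, inst["response"], inst["exposure"]))
    for inst in instruments
]
log_l_total = total_loglike(photon_flux, instruments)
print(per_instrument, log_l_total)
```

The same photon model is folded separately through each response, so the data are never unfolded into a common flux space before being compared with the model.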

  20. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM A simple hypothesis is one where specific values of $\theta$ are assumed. We commonly refer to this as a nested model of a more complex or composite hypothesis. Composite: $G(x; \theta_1, \theta_2, \theta_3) = \theta_1 + \theta_2 x + \theta_3 x^2$. Simple: $H(x; \theta_1, \theta_2) = \theta_1 + \theta_2 x$.

  22. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Assume a distribution function $f(x, \theta_1, \theta_2, \cdots, \theta_h)$ which forms a distance measure between the data $x$ for a set of parameters $\theta$. For $n$ data points, $P = \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \theta_2, \ldots, \theta_h)$. Let $\Omega$ be the set of all simple hypotheses and $\omega$ be a specific subset of these simple hypotheses. For a set of data $O_n$ we can write the likelihood ratio of a composite hypothesis $H$ to a simple hypothesis: $\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)}$. $H$ is said to be true if it generates $O_n$.

  23. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM

  25. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Assumptions: the parameter values maximize the likelihood, and the distribution of the likelihood (the covariance matrix) is symmetric. Then $\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)} = e^{-\frac{1}{2}\chi_0^2}\,(1 + O(1/\sqrt{n}))$, so that $-2 \log \lambda = \chi_0^2$.

  26. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM

  27. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Why do we want to do this? We would like to be able to establish the “significance” of adding complexity to our model to avoid over-fitting. If we can read this probability from a $\chi^2$ table, the work is simple. Let’s try it out.

  28. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Let’s simulate some data from a second order polynomial with heteroskedastic, Gaussian error. [Figure: simulated data and model.]

  29. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM We can fit the data via MLE to a first order polynomial (a line) and to a second order polynomial, and then compute the likelihood ratio between the two fits. In this case, we get a value of $-2 \log \lambda \simeq 13.7$. This corresponds to a $\chi^2$ probability of $\simeq 10^{-4}$.
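A minimal sketch of such a likelihood ratio test, assuming Gaussian errors with known per-point sigma so that $-2 \log \lambda$ reduces to the difference in minimum $\chi^2$ between the two fits. The simulated data, the seed, and the helper names (`fit_chi2`, `chi2_sf_1dof`) are illustrative assumptions and will not reproduce the slide's value of 13.7.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def fit_chi2(x, y, sigma, order):
    """Weighted least-squares polynomial fit; returns the minimum chi^2."""
    X = np.vander(x, order + 1, increasing=True)   # columns: 1, x, x^2, ...
    A = X / sigma[:, None]                          # whiten by the known errors
    b = y / sigma
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.sum((b - A @ theta) ** 2))

def chi2_sf_1dof(stat):
    """Tail probability of a chi^2 distribution with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Simulated second order polynomial with heteroskedastic Gaussian errors.
x = np.linspace(0.0, 10.0, 30)
sigma = rng.uniform(0.5, 2.0, size=x.size)
y = 1.0 + 0.5 * x + 0.05 * x**2 + rng.normal(0.0, sigma)

chi2_line = fit_chi2(x, y, sigma, 1)
chi2_quad = fit_chi2(x, y, sigma, 2)

# For Gaussian errors with known sigma, -2 log(lambda) is the chi^2 difference.
lrt = max(chi2_line - chi2_quad, 0.0)
p_value = chi2_sf_1dof(lrt)   # one extra parameter, so one degree of freedom
print(lrt, p_value)
```

Evaluating `chi2_sf_1dof(13.7)` gives roughly 2e-4, the same order of magnitude as the probability quoted on the slide, under the assumption that Wilks' theorem holds with one degree of freedom.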

  30. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM To test the theorem, we can:
      1) generate new datasets from our best fit simple model (the line)
      2) fit each dataset with both models
      3) compute the LRT of each fit
      4) see if the LRT is distributed like a $\chi^2$
      5) compare with our reference LRT
      We can see that for such an idealistic case, Wilks’ theorem holds! This will not always be true!
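The calibration steps listed on this slide can be sketched as a parametric bootstrap. This is an illustrative reconstruction under the same assumptions as the idealized case (Gaussian errors with known sigma, polynomial models); the seed, the number of simulations, and the helper names `wls_fit` and `lrt` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def wls_fit(x, y, sigma, order):
    """Weighted least-squares polynomial fit; returns (parameters, minimum chi^2)."""
    X = np.vander(x, order + 1, increasing=True)
    A = X / sigma[:, None]
    b = y / sigma
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta, float(np.sum((b - A @ theta) ** 2))

def lrt(x, y, sigma):
    """-2 log(lambda) between the line (simple) and the quadratic (composite)."""
    _, chi2_line = wls_fit(x, y, sigma, 1)
    _, chi2_quad = wls_fit(x, y, sigma, 2)
    return chi2_line - chi2_quad

# "Observed" data: simulated here, since the slide's dataset is not available.
x = np.linspace(0.0, 10.0, 30)
sigma = rng.uniform(0.5, 2.0, size=x.size)
y_obs = 1.0 + 0.5 * x + 0.05 * x**2 + rng.normal(0.0, sigma)
lrt_obs = lrt(x, y_obs, sigma)

# 1) generate new datasets from the best fit simple model (the line)
theta_line, _ = wls_fit(x, y_obs, sigma, 1)
y_null = np.vander(x, 2, increasing=True) @ theta_line

# 2) and 3) fit each dataset with both models and compute the LRT
n_sims = 2000
lrt_sims = np.array(
    [lrt(x, y_null + rng.normal(0.0, sigma), sigma) for _ in range(n_sims)]
)

# 4) compare with chi^2 (1 dof): about 5% of simulations should exceed 3.841
frac_above = float(np.mean(lrt_sims > 3.841))

# 5) compare with the reference LRT via the bootstrap tail fraction
p_boot = float(np.mean(lrt_sims >= lrt_obs))
print(frac_above, p_boot)
```

When the simulated distribution departs from the $\chi^2$ expectation, the bootstrap tail fraction `p_boot` is the quantity to report instead of the table value.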

  31. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM A power law with an exponential cutoff, and a power law background. Can we measure the cutoff?

  32. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Wilks’ Theorem breaks down!

  33. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM

  34. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS COMPONENT DETECTION

  38. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS COMPONENT DETECTION “In practice, this may mean that in cases where the continuum is extremely well constrained by the data and the width and position of the possible line are known, the LRT or F-test could underestimate the true significance by about a factor of 2, but there is no guarantee that this will occur in real data; particularly when the continuum is not well constrained, the true significance can be underestimated or overestimated.”

  39. WILKS’ THEOREM & LIKELIHOOD RATIO TESTS SUMMARY CALIBRATE!

  40. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS LET’S TALK ABOUT REDUCED $\chi^2$

  41. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM We typically think of the DOF as $K = N - P$ for $N$ data points and $P$ parameters. However, this is only true for linear models, $f(x, \vec{\theta}) = \theta_1 B_1(x) + \theta_2 B_2(x) + \ldots + \theta_P B_P(x) = \sum_{p=1}^{P} \theta_p B_p(x)$. If we define our measurements as $\vec{y} = (y_1, y_2, \ldots, y_N)^T$, then we have our normal distance measure $\chi^2 = (\vec{y} - X \cdot \vec{\theta})^T \cdot \Sigma^{-1} \cdot (\vec{y} - X \cdot \vec{\theta})$, where $X_{np} = B_p(x_n)$ is the design matrix and $\Sigma$ is the covariance matrix of the measurements.

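  42. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM Next we minimize $\chi^2$ by setting $\frac{\partial \chi^2}{\partial \theta_p} = 0 \;\; \forall \; p = 1, 2, \ldots, P$, giving us our best parameters $\hat{\vec{\theta}} = (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y}$, which leads us to our estimate of the latent true data $\hat{\vec{y}} = X \cdot \hat{\vec{\theta}} = X \cdot (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y} = H \cdot \vec{y}$.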

  43. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM $P_{\rm eff} = \mathrm{tr}(H) = \sum_{n=1}^{N} H_{nn} = \mathrm{rank}(X)$. The number of degrees of freedom is not simply the number of free parameters!
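The relation $\mathrm{tr}(H) = \mathrm{rank}(X)$ can be checked numerically. In this hedged sketch the basis functions, the noise levels, and the use of a pseudo-inverse (so that a rank-deficient design matrix can still be handled) are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 20
x = np.linspace(-1.0, 1.0, N)
sigma = rng.uniform(0.5, 1.5, size=N)
Sigma_inv = np.diag(1.0 / sigma**2)     # inverse covariance of the measurements

def hat_matrix(X, Sigma_inv):
    """H = X (X^T Sigma^-1 X)^+ X^T Sigma^-1; the pseudo-inverse covers degenerate X."""
    return X @ np.linalg.pinv(X.T @ Sigma_inv @ X, rcond=1e-10) @ X.T @ Sigma_inv

# Three linearly independent basis functions: 1, x, x^2.
X_full = np.column_stack([np.ones(N), x, x**2])
# Three parameters, but the third basis function duplicates the second.
X_degenerate = np.column_stack([np.ones(N), x, 2.0 * x])

trace_full = float(np.trace(hat_matrix(X_full, Sigma_inv)))
trace_degenerate = float(np.trace(hat_matrix(X_degenerate, Sigma_inv)))

print(trace_full, np.linalg.matrix_rank(X_full))
print(trace_degenerate, np.linalg.matrix_rank(X_degenerate))
```

Both design matrices have three columns, yet the degenerate one only removes two degrees of freedom, so counting parameters overstates $P$ in $K = N - P$.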

  44. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM How many free parameters are there? $f(x) = A \cos(Bx + C) + D \cos(Ex + F)$

  46. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM How many free parameters are there? $f(x) = A \cos(Bx + C) + D \cos(Ex + F)$ The number of DOF can change during the fit. Thus, if in some region of the posterior / likelihood profile, $A$ is close to zero, the DOF is not a fixed quantity!

  47. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM For even seemingly simple functions, reduced $\chi^2$ can lead to big problems in inferring if a model is correct. In X-ray spectra, we deal with complicated non-linear functions. Thus, we should never try to utilize this measure as an indicator of fit quality. Moreover, our data are Poisson distributed! We can always perform parametric bootstraps, as we did for the LRT, to examine the distribution of our statistic, compare it to the value achieved in our observed data, and determine if it is an extreme value.

  48. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS CAUTION Even with parametric bootstraps, the distribution of the statistic is not always a good indicator of fit quality!

  49. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS Latent value: the true value of an observed datum. $\pi(x_{\rm observed} \mid x_{\rm latent})$

  50. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS RESIDUALS Poisson distributed data should have Poisson residuals! Calculating Poisson residuals is not straightforward. This is implemented in the code linked here.
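As one illustration of what a Poisson-aware residual can look like, the sketch below computes signed deviance residuals. This is a common textbook choice and is an assumption here; it is not necessarily what the linked code implements, and it ignores the background.

```python
import numpy as np

def poisson_deviance_residuals(counts, expected):
    """Signed square root of each channel's contribution to the Poisson deviance.

    Hypothetical helper for illustration; assumes expected > 0 in every channel.
    """
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # d * log(d / m), using the convention 0 * log(0) = 0 for empty channels.
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(counts > 0, counts * np.log(counts / expected), 0.0)
    deviance = 2.0 * (term - (counts - expected))
    return np.sign(counts - expected) * np.sqrt(np.maximum(deviance, 0.0))

observed = np.array([0, 3, 10, 10100])
model = np.array([2.0, 3.0, 6.5, 10000.0])
residuals = poisson_deviance_residuals(observed, model)
print(residuals)
```

At high counts these residuals approach the familiar Gaussian form (data minus model, divided by the square root of the model), while at low counts they stay well defined where that form breaks down.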

  52. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS $\pi(\tilde{y} \mid y) = \int d\theta \, \pi(\tilde{y} \mid \theta) \, \pi(\theta \mid y)$, where $\tilde{y}$ is the replicated data, $y$ is the measured data, $\pi(\tilde{y} \mid \theta)$ is the likelihood, and $\pi(\theta \mid y)$ is the posterior.

  57. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS PPCs express the volume in the posterior and the likelihood. Residuals only contain the information about the distance from data to model at one (non-unique) location on a surface. [Figure: replicated data percentiles and observed data, rate (cnt s^-1 keV^-1) versus energy (keV), for detectors na, n2 and b1.]

  58. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS Let’s examine fitting a line that has Poisson counts.

  59. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS We will fit the data with the appropriate Poisson likelihood using HMC.
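A compact sketch of a posterior predictive check for a line with Poisson counts. To stay dependency-free it swaps HMC for a simple random-walk Metropolis sampler, which is a deliberate substitution; the flat priors, step sizes, seed, and true parameter values are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data: Poisson counts around a line (values invented for illustration).
x = np.linspace(0.0, 10.0, 25)
y = rng.poisson(5.0 + 2.0 * x)

def log_posterior(theta):
    """Poisson log-likelihood plus flat priors on 0 < a < 100 and 0 <= b < 100."""
    a, b = theta
    if a <= 0.0 or b < 0.0 or a >= 100.0 or b >= 100.0:
        return -np.inf
    mu = a + b * x
    return float(np.sum(y * np.log(mu) - mu))   # constant log(y!) term dropped

def metropolis(n_steps, start, step):
    """Random-walk Metropolis sampler (a stand-in for HMC in this sketch)."""
    chain = np.empty((n_steps, 2))
    theta = np.array(start, dtype=float)
    lp = log_posterior(theta)
    for i in range(n_steps):
        proposal = theta + step * rng.normal(size=2)
        lp_prop = log_posterior(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

chain = metropolis(20000, [float(np.mean(y)), 1.0], np.array([0.5, 0.1]))[5000:]

# Posterior predictive: draw theta from the posterior, then replicated data from
# the likelihood, which samples pi(y_rep | y) = int dtheta pi(y_rep | theta) pi(theta | y).
idx = rng.choice(len(chain), size=1000, replace=False)
y_rep = rng.poisson(chain[idx, 0:1] + chain[idx, 1:2] * x)   # shape (1000, 25)

lower, upper = np.percentile(y_rep, [2.5, 97.5], axis=0)
coverage = float(np.mean((y >= lower) & (y <= upper)))
posterior_mean = chain.mean(axis=0)
print(posterior_mean, coverage)
```

Plotting `lower` and `upper` as a band against the observed counts gives the kind of replicated-data percentile display shown in the PPC figures; a well-fitting model should leave most observed points inside the band.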

  60. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS PPCs are richer than residuals!

  61. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS

  62. GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS In general, fit quality is an area of active research in statistics. There is no “cookbook” that can be generically applied. Each analysis problem presents a different challenge. Consult the statistical literature, state your assumptions, and make your analysis reproducible!

  63. STACKING COMPRESSING DATA
