X-RAY SPECTRAL WORKSHOP 2019 POISSON STATISTICS WITH BACKGROUNDS - PowerPoint PPT Presentation

However, more recent studies have shown it is possible to fit synchrotron emission directly to count data (Burgess et al. 2014; Zhang et al. 2015). Moreover, the predictions from photospheric models encompass a wide variety of alphas (Pe’er et al. 2005, etc.). We need another way to infer models from the data.
[Figure: plot over α from -2.0 to 1.0, with the SCS (-2/3) and FCS (-3/2) values marked.]

Define an auxiliary parameter from the Band function’s parameters that attempts to capture more information than alpha. If one can distinguish between emission models via the width parameter, then we have a model comparison tool.
[Figure: illustration of the auxiliary parameters W and θ.]

Photosphere (thermal emission): blackbody. Shocks (optically-thin emission): synchrotron. The hypothesis is that thermal spectra are narrower and synchrotron spectra are very broad. Thus, if one can measure the width W of the Band function, one can infer physics.

[Figure: results from Axelsson & Borgonovo (2015) and Yu+ (2015), annotated with "Thermal: photosphere (thermal emission)" and "Synchrotron: shocks (optically-thin emission)".]
Synchrotron is once again strongly ruled out!

Synchrotron fits to GRB data: too wide?
[Figure: two panels of net rate (counts s^-1 keV^-1) versus energy (keV), with data and model for detectors BGO0, NaI1, NaI2, NaI5 and BGO1, NaI6, NaI7, NaI9, and residuals (σ) below each panel.]
[Figure: θ (deg) versus W (dex) for GRB100131730 and GRB160101030, with a |PPC − 0.5| scale running from better fit to worse fit and the "Synchrotron Rejected" region marked.]

Even when the width measures would reject synchrotron, the fit is still acceptable. Thus, empirical measures can lead to improper conclusions about the data. Models that look very different in νFν space can be very similar in count space.
[Figure: θ (deg) versus W (dex) for GRB100131730 and GRB160101030, with a |PPC − 0.5| scale.]

Synchrotron is also a good fit to the data. The Band function predicts a narrower curvature of the data. The Band function is not a proxy for synchrotron!

SED ANALYSIS SUMMARY SEDs must be fit in their native data space! When combining measurements from different instruments, we must fold the model through each instrument’s response, and compute the likelihood appropriate for those instruments.
$\mathcal{L}_{\rm total} = \prod_{i=1}^{N} \mathcal{L}_i$
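A minimal sketch of that product of likelihoods, written as a sum of log-likelihoods. Everything here is a toy assumption and not from the slides: the power-law model, the random "response" matrices, the exposures, and the names `joint_loglike` and `poisson_loglike` are hypothetical, and a Poisson likelihood is assumed for every instrument.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglike(counts, expected):
    """Log Poisson likelihood of observed counts given expected counts."""
    expected = np.clip(expected, 1e-12, None)  # guard against log(0)
    return np.sum(counts * np.log(expected) - expected - gammaln(counts + 1.0))

def power_law(energy, norm, index):
    """Toy photon model (photons / cm^2 / s / keV), pivot at 100 keV."""
    return norm * (energy / 100.0) ** index

def joint_loglike(params, instruments, energy, de):
    """log L_total = sum_i log L_i, folding the model through each response."""
    photon_flux = power_law(energy, *params) * de  # photons / cm^2 / s per bin
    total = 0.0
    for inst in instruments:
        expected = inst["response"] @ photon_flux * inst["exposure"]
        total += poisson_loglike(inst["counts"], expected)
    return total

# Toy setup: two instruments with different (random, made-up) responses.
rng = np.random.default_rng(0)
edges = np.logspace(1, 3, 21)                 # photon energy bin edges in keV
energy = np.sqrt(edges[:-1] * edges[1:])
de = np.diff(edges)
true_params = (0.05, -1.5)

instruments = []
for n_chan, area in [(8, 100.0), (12, 30.0)]:
    response = area * rng.uniform(0.0, 1.0, size=(n_chan, energy.size)) / n_chan
    expected = response @ (power_law(energy, *true_params) * de) * 10.0
    instruments.append(
        {"response": response, "exposure": 10.0, "counts": rng.poisson(expected)}
    )

print(joint_loglike(true_params, instruments, energy, de))
```

The model is never unfolded into flux points: each instrument only ever compares its own counts with the model folded through its own response.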

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM A simple hypothesis is one where specific values of $\theta$ are assumed. We commonly refer to this as a nested model of a more complex or composite hypothesis.
composite: $G(x; \theta_1, \theta_2, \theta_3) = \theta_1 + \theta_2 x + \theta_3 x^2$
simple: $H(x; \theta_1, \theta_2) = \theta_1 + \theta_2 x$

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Assume a distribution function $f(x, \theta_1, \theta_2, \cdots \theta_h)$ which forms a distance measure between data $x$ for a set of parameters $\theta$:
$P = \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \theta_2, \ldots \theta_h)$
Let $\Omega$ be the set of all simple hypotheses and $\omega$ be a specific subset of these simple hypotheses. For a set of data $O_n$ we can write the likelihood ratio of a composite hypothesis H to a simple hypothesis:
$\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)}$
H is said to be true if it generates $O_n$.

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Assumptions: The parameter values maximize the likelihood. The distribution of the likelihood (the covariance matrix) is symmetric.
$\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)} = e^{-\frac{1}{2}\chi_0^2} \left(1 + O(1/n)\right)$
$-2 \log \lambda = \chi_0^2$

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Why do we want to do this? We would like to be able to establish the “significance” of adding complexity to our model to avoid over-fitting. If we can read this probability from a chi2 table, the work is simple. Let’s try it out.

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Let’s simulate some data from a second order polynomial with heteroskedastic, Gaussian error.
[Figure: simulated data and the generating model.]

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM We can fit the data via MLE to a first order polynomial (or a line for the layman) and a second order polynomial. We can compute the likelihood ratio between the two fits. In this case, we get a value of $-2 \log \lambda \simeq 13.7$. This corresponds to a $\chi^2$ probability of $\simeq 10^{-4}$.
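A sketch of this comparison, assuming Gaussian errors with known per-point sigma, so that the MLE is weighted least squares and $-2 \log \lambda$ reduces to the difference in minimum $\chi^2$. The polynomial coefficients, the noise levels, and the helper name `chi2_min` are made up for illustration, so the numbers will not match the slide.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)

# Simulated data: second order polynomial with heteroskedastic Gaussian error.
x = np.linspace(0.0, 10.0, 30)
sigma = rng.uniform(0.5, 2.0, size=x.size)        # a different error on each point
y = 1.0 + 0.5 * x + 0.08 * x**2 + rng.normal(0.0, sigma)

def chi2_min(x, y, sigma, degree):
    """Weighted least-squares (Gaussian MLE) polynomial fit; returns min chi^2."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    Xw = X / sigma[:, None]                        # whiten by the known errors
    yw = y / sigma
    theta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return np.sum((yw - Xw @ theta) ** 2)

chi2_line = chi2_min(x, y, sigma, degree=1)        # simple hypothesis
chi2_quad = chi2_min(x, y, sigma, degree=2)        # composite hypothesis

# For Gaussian likelihoods with known sigma, -2 log(lambda) = delta chi^2.
lrt = chi2_line - chi2_quad
# Wilks: one extra parameter, so compare with chi^2 with 1 degree of freedom.
p_value = chi2.sf(lrt, df=1)
print(f"-2 log lambda = {lrt:.2f}, p = {p_value:.2e}")
```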

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM To test the theorem, we can:
1) generate new datasets from our best fit simple model (the line)
2) fit each data set with both models
3) compute the LRT of each fit
4) see if the LRT is distributed like a $\chi^2$
5) compare with our reference LRT
We can see that for such an idealistic case, Wilks’ theorem holds! This will not always be true!
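The five steps can be sketched as a parametric bootstrap. This is an illustration under the same toy assumptions as a Gaussian polynomial fit with known sigma; the coefficients, the number of simulations, and the helper name `fit_chi2` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def fit_chi2(X, y, sigma):
    """Weighted least-squares fit; returns best-fit parameters and min chi^2."""
    Xw = X / sigma[:, None]
    yw = y / sigma
    theta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return theta, np.sum((yw - Xw @ theta) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
sigma = rng.uniform(0.5, 2.0, size=x.size)
X_line = np.vander(x, 2, increasing=True)   # simple model
X_quad = np.vander(x, 3, increasing=True)   # composite model
y_obs = 1.0 + 0.5 * x + 0.08 * x**2 + rng.normal(0.0, sigma)

# Reference LRT from the observed data.
theta_line, c_line = fit_chi2(X_line, y_obs, sigma)
_, c_quad = fit_chi2(X_quad, y_obs, sigma)
lrt_obs = c_line - c_quad

n_sim = 2000
lrt_sim = np.empty(n_sim)
for i in range(n_sim):
    # 1) new dataset from the best-fit simple model
    y_sim = X_line @ theta_line + rng.normal(0.0, sigma)
    # 2) fit with both models, 3) compute the LRT
    _, s_line = fit_chi2(X_line, y_sim, sigma)
    _, s_quad = fit_chi2(X_quad, y_sim, sigma)
    lrt_sim[i] = s_line - s_quad

# 4) is the simulated LRT distributed like chi^2 with 1 dof?
frac_above = np.mean(lrt_sim > chi2.ppf(0.9, df=1))  # should be close to 0.1
# 5) compare with the reference LRT
p_boot = (1 + np.sum(lrt_sim >= lrt_obs)) / (n_sim + 1)
p_wilks = chi2.sf(lrt_obs, df=1)
print(frac_above, p_boot, p_wilks)
```

The bootstrap p-value cannot go below 1 / (n_sim + 1), so a very significant result needs many more simulations than this sketch uses.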

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM A power law with an exponential cutoff, and a power law background. Can we measure the cutoff?

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS WILKS’ THEOREM Wilks’ Theorem breaks down!

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS COMPONENT DETECTION “In practice, this may mean that in cases where the continuum is extremely well constrained by the data and the width and position of the possible line are known, the LRT or F-test could underestimate the true significance by about a factor of 2, but there is no guarantee that this will occur in real data; particularly when the continuum is not well constrained, the true significance can be underestimated or overestimated.”

WILKS’ THEOREM & LIKELIHOOD RATIO TESTS SUMMARY CALIBRATE!

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS LET’S TALK ABOUT REDUCED $\chi^2$

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM We typically think of DOF K = N - P for N data points and P parameters. However, this is only true for linear models:
$f(\vec{x}, \vec{\theta}) = \theta_1 B_1(\vec{x}) + \theta_2 B_2(\vec{x}) + \ldots + \theta_P B_P(\vec{x}) = \sum_{p=1}^{P} \theta_p B_p(\vec{x})$
If we define our measurements as
$\vec{y} = (y_1, y_2, \ldots, y_N)^T$
then we have our normal distance measure
$\chi^2 = (\vec{y} - X \cdot \vec{\theta})^T \cdot \Sigma^{-1} \cdot (\vec{y} - X \cdot \vec{\theta})$

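GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM Next we minimize $\chi^2$ (equivalently, maximize the likelihood):
$\frac{\partial \chi^2}{\partial \theta_p} = 0 \quad \forall\, p = 1, 2, \ldots, P$
giving us our best parameters
$\hat{\vec{\theta}} = (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y}$
which leads us to our latent true data
$\hat{\vec{y}} = X \cdot \hat{\vec{\theta}} = X \cdot (X^T \cdot \Sigma^{-1} \cdot X)^{-1} \cdot X^T \cdot \Sigma^{-1} \cdot \vec{y} = H \cdot \vec{y}$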

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM
$P_{\rm eff} = {\rm tr}(H) = \sum_{n=1}^{N} H_{nn} = {\rm rank}(X)$
The number of degrees of freedom is not simply the number of free parameters!
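A small numerical check of tr(H) = rank(X), assuming a made-up design matrix and diagonal covariance. The helper `hat_matrix` is hypothetical, and it uses a pseudo-inverse in place of the plain inverse so that a degenerate design can still be evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
sigma = rng.uniform(0.5, 1.5, size=x.size)
Sigma_inv = np.diag(1.0 / sigma**2)      # inverse covariance of the data

def hat_matrix(X, Sigma_inv):
    """H = X (X^T Sigma^-1 X)^+ X^T Sigma^-1, so that y_hat = H y."""
    normal = X.T @ Sigma_inv @ X
    # Pseudo-inverse so rank-deficient designs do not blow up.
    return X @ np.linalg.pinv(normal, rcond=1e-10) @ X.T @ Sigma_inv

# Three parameters, three independent basis functions: 1, x, x^2.
X_full = np.column_stack([np.ones_like(x), x, x**2])
# Three parameters, but the third basis function is 2x + 1,
# a linear combination of the first two.
X_degenerate = np.column_stack([np.ones_like(x), x, 2.0 * x + 1.0])

for name, X in [("full", X_full), ("degenerate", X_degenerate)]:
    p_eff = np.trace(hat_matrix(X, Sigma_inv))
    print(name, "P =", X.shape[1], "P_eff =", round(p_eff, 6),
          "rank =", np.linalg.matrix_rank(X))
```

Both designs have P = 3 free parameters, but the degenerate one only spans two directions in data space, so counting parameters overstates what the fit can absorb.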

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM How many free parameters are there?
$f(x) = A \cos(Bx + C) + D \cos(Ex + F)$
The number of DOF can change during the fit. For example, if A is close to zero in some region of the posterior / likelihood profile, then B and C have almost no effect on the model there. Thus, the DOF is not a fixed quantity!

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS DEGREES OF FREEDOM For even seemingly simple functions, reduced $\chi^2$ can lead to big problems in inferring if a model is correct. In x-ray spectra, we deal with complicated non-linear functions. Thus, we should never try to utilize this measure as an indicator of fit quality. Moreover, our data are Poisson distributed! We can always perform parametric bootstraps, as we did with the LRT, to examine the distribution of our statistics, compare it to the value achieved in our observed data, and determine if it is an extreme value.
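A sketch of such a parametric bootstrap for Poisson data. The model (a constant rate, whose MLE is the sample mean), the choice of the Poisson deviance as the fit statistic, and the helper names are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np
from scipy.special import xlogy

def deviance(counts, mu):
    """Poisson deviance 2 * sum[mu - n + n log(n / mu)]; xlogy handles n = 0."""
    return 2.0 * np.sum(mu - counts + xlogy(counts, counts / mu))

def fit_constant(counts):
    """MLE of a constant Poisson rate is the sample mean."""
    return np.full(counts.shape, counts.mean(), dtype=float)

rng = np.random.default_rng(7)
observed = rng.poisson(5.0, size=50)         # toy "observed" counts

mu_hat = fit_constant(observed)
stat_obs = deviance(observed, mu_hat)

n_sim = 2000
stat_sim = np.empty(n_sim)
for i in range(n_sim):
    fake = rng.poisson(mu_hat)               # simulate from the best-fit model
    stat_sim[i] = deviance(fake, fit_constant(fake))  # refit, recompute statistic

# Fraction of simulated statistics at least as extreme as the observed one.
p_value = (1 + np.sum(stat_sim >= stat_obs)) / (n_sim + 1)
print(f"observed statistic = {stat_obs:.1f}, bootstrap p = {p_value:.3f}")
```

Refitting every simulated dataset matters: comparing against statistics computed at the fixed best-fit parameters would give a different, wider reference distribution.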

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS CAUTION Even with parametric bootstraps, the distribution of the statistic is not always a good indicator of fit quality!

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS Latent value: the true value of an observed datum.
$\pi(x_{\rm observed} \mid x_{\rm latent})$

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS RESIDUALS Poisson distributed data should have Poisson residuals! Calculating Poisson residuals is not straightforward. This is implemented in the code linked here.

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS
$\pi(\tilde{y} \mid y) = \int {\rm d}\theta \, \pi(\tilde{y} \mid \theta) \, \pi(\theta \mid y)$
Here $\pi(\tilde{y} \mid \theta)$ is the likelihood, $\pi(\theta \mid y)$ is the posterior, $\tilde{y}$ is the replicated data, and $y$ is the measured data.
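The integral is rarely done analytically; in practice one draws θ from the posterior and then draws replicated data from the likelihood at each θ. A minimal sketch, assuming a toy model that is not from the slides: a single Poisson rate with a conjugate Gamma prior, chosen only so the posterior can be sampled directly without MCMC. The prior values `a0` and `b0` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.poisson(4.0, size=25)                # measured data (toy)

# Gamma(shape a0, rate b0) prior on the Poisson rate theta.
# Conjugacy gives a Gamma(a0 + sum(y), b0 + N) posterior.
a0, b0 = 1.0, 0.1                            # hypothetical prior choice
a_post = a0 + y.sum()
b_post = b0 + y.size

n_draws = 4000
# theta ~ pi(theta | y): draws from the posterior
theta = rng.gamma(shape=a_post, scale=1.0 / b_post, size=n_draws)
# y_rep ~ pi(y_rep | theta): one replicated dataset per posterior draw
y_rep = rng.poisson(theta[:, None], size=(n_draws, y.size))

# Percentile band of the replicated data, to compare with the measured data.
lo, hi = np.percentile(y_rep, [2.5, 97.5], axis=0)
coverage = np.mean((y >= lo) & (y <= hi))
print(f"fraction of measured points inside the 95% band: {coverage:.2f}")
```

With a sampler such as HMC the only change is where `theta` comes from: the posterior draws replace the Gamma samples, and the replication step stays the same.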

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS
[Figure: rate (cnt s^-1 keV^-1) versus energy (keV) for detectors na, n2, and b1, showing the observed data over the replicated data percentiles.]
PPCs express the volume in the posterior and the likelihood. Residuals only contain the information about the distance from data to model at one (non-unique) location on a surface.

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS Let’s examine fitting a line that has Poisson counts.

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS We will fit the data with the appropriate Poisson likelihood using HMC.

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS PPCs are richer than residuals!

GOODNESS OF FIT AND POSTERIOR PREDICTIVE CHECKS PPCS In general, fit quality is an area of active research in statistics. There is no “cookbook” that can be generically applied. Each analysis problem presents a different challenge. Consult the statistical literature, state your assumptions, and make your analysis reproducible!

STACKING COMPRESSING DATA

J. MICHAEL BURGESS - MPE X-RAY SPECTRAL WORKSHOP 2019 POISSON STATISTICS WITH BACKGROUNDS POISSON OBSERVATION + POISSON BACKGROUND
[Figure: background measurement. Image via Vianello (2018).]
