Gravitational Wave Data Analysis: II. Model Selection and Parameter Estimation
Chris Van Den Broeck, Kavli RISE Summer School on Gravitational Waves, Cambridge, UK, 23-27 September 2019


  1. Gravitational Wave Data Analysis: II. Model Selection and Parameter Estimation Chris Van Den Broeck Kavli RISE Summer School on Gravitational Waves, Cambridge, UK, 23-27 September 2019

  2. Bayesian inference
     - Aim: use the available data to
       - evaluate which of several hypotheses is the most likely: model selection
       - construct probability density distributions for the parameters associated with a hypothesis: parameter estimation
     - Do this while making explicit all assumptions made.

  3. Probabilities of propositions
     - Propositions (or statements) are denoted by uppercase letters: A, B, C, ..., X
     - Boolean algebra:
       - Conjunction A ∧ B: A and B are both true
       - Disjunction A ∨ B: at least one of A or B is true
       - Negation ¬A: A is false
       - Implication A ⇒ B: from A follows B

  4. Probabilities of propositions
     - It is useful to view propositions as sets which are subsets of a "Universe":
       - Conjunction A ∧ B: intersection of the sets
       - Disjunction A ∨ B: union of the sets
       - Negation ¬A: complement within the Universe
     - Each of these sets has a probability associated with it:
       - If A ⊂ B then p(A) ≤ p(B)
       - If A and B are disjoint then p(A ∨ B) = p(A) + p(B)
       - The Universe has probability 1, so that e.g. p(A) + p(¬A) = 1

  5. Bayes' theorem
     - Conditional probability: p(A|B) ≡ p(A ∧ B) / p(B)
     - ... from which follows the product rule: p(A ∧ B) = p(A|B) p(B)
     - ... and from the product rule follows Bayes' theorem: p(A|B) = p(B|A) p(A) / p(B)
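The product rule and Bayes' theorem can be checked on a toy discrete example (the joint probabilities below are made up for illustration):

```python
# Toy joint distribution over two binary propositions A and B.
# p_joint[(a, b)] = p(A = a AND B = b); the values are illustrative.
p_joint = {(True, True): 0.3, (True, False): 0.2,
           (False, True): 0.1, (False, False): 0.4}

p_A = sum(p for (a, b), p in p_joint.items() if a)   # p(A)
p_B = sum(p for (a, b), p in p_joint.items() if b)   # p(B)
p_A_given_B = p_joint[(True, True)] / p_B            # p(A|B) = p(A ∧ B)/p(B)
p_B_given_A = p_joint[(True, True)] / p_A            # p(B|A) = p(A ∧ B)/p(A)

# Bayes' theorem: p(A|B) = p(B|A) p(A) / p(B)
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-12
```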

  6. Marginalization
     - Note that for any A and B, A ∧ B and A ∧ (¬B) are disjoint sets whose union is A, so that
       p(A) = p(A ∧ B) + p(A ∧ (¬B))
     - Consider sets {B_k} such that:
       - they are disjoint: B_k ∧ B_l = ∅ for k ≠ l
       - they are exhaustive: ∨_k B_k is the Universe, so that Σ_k p(B_k) = 1
     - Then one has the marginalization rule:
       p(A) = Σ_k p(A ∧ B_k)
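The marginalization rule amounts to summing joint probabilities over a disjoint, exhaustive partition; a minimal sketch with made-up numbers:

```python
# Discrete marginalization: {B_k} disjoint and exhaustive, so summing the
# joint probabilities p(A ∧ B_k) over k recovers p(A). Numbers are illustrative.
p_A_and_Bk = [0.10, 0.25, 0.05, 0.20]   # p(A ∧ B_k) for k = 1..4
p_A = sum(p_A_and_Bk)                   # marginalization rule
print(p_A)                              # 0.6, up to floating-point rounding
```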

  7. Marginalization over a continuous variable
     - Consider the proposition "the continuous variable x has the value α". Then the probability p(x = α) might be zero.
     - Instead, assign probabilities to finite intervals:
       p(x₁ ≤ x ≤ x₂) = ∫_{x₁}^{x₂} pdf(x) dx
       where pdf(x) is called the probability density function.
       - Exhaustiveness is given by ∫_{x_min}^{x_max} pdf(x) dx = 1
     - Marginalization for continuous variables:
       p(A) = ∫_{x_min}^{x_max} pdf(A, x) dx
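Numerically, the continuous case can be sketched by normalizing a pdf on a grid and summing over a finite interval; the truncated standard normal below is an illustrative choice, not from the slides:

```python
import numpy as np

# Illustrative pdf: a standard normal truncated to [x_min, x_max]
# (an assumption for the example), renormalized so it integrates to 1.
x_min, x_max = -5.0, 5.0
x = np.linspace(x_min, x_max, 10001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * x**2)
pdf /= pdf.sum() * dx                   # enforce ∫ pdf(x) dx = 1

# Probability of a finite interval, p(x1 <= x <= x2):
x1, x2 = -1.0, 1.0
mask = (x >= x1) & (x <= x2)
p_interval = pdf[mask].sum() * dx
print(p_interval)                       # ≈ 0.683 for a standard normal
```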

  8. Application to gravitational wave data analysis
     - The template banks we use to search for signals from coalescing binaries are coarse at high masses.
     - Information about angles and distance enters through the waveform amplitude, hence matched filtering with a normalized template only involves the intrinsic parameters (masses, spins):
       S = max_i (h(θⁱ)|s) / √((h(θⁱ)|h(θⁱ)))
       where the maximum is taken over the templates i of the bank.
       - Fast sky position estimates instead come from the different arrival times and phases at the different detectors of a network.
     - After a detection has taken place, we will want information about all parameters:
       - Binary black holes: 15 parameters {m₁, m₂, S⃗₁, S⃗₂, α, δ, ι, ψ, d_L, t_c, φ_c}
       - Binary neutron stars: 17 parameters {m₁, m₂, S⃗₁, S⃗₂, α, δ, ι, ψ, d_L, t_c, φ_c, Λ₁, Λ₂}

  9. Application to gravitational wave data analysis
     - Parameter estimation: find the posterior probability density p(θ|d, H), where
       - θ = (θ₁, θ₂, ..., θ_N) are the parameters
       - H is the hypothesis that e.g. the signal was from the inspiral of two neutron stars, which comes with a family of waveforms h(θ; t)
       - d(t) = n(t) + h(θ; t) are the detector data
     - Model selection: compare different hypotheses through an odds ratio
       O₁₂ = p(H₁|d) / p(H₂|d)
       where
       - the hypotheses H₁, H₂ correspond to different waveform models:
         - binary neutron star versus binary black hole
         - waveform predicted by general relativity versus an alternative theory of gravity
         - ...
       - the probabilities (not probability densities) p(H₁|d), p(H₂|d) do not involve any statement about parameters

  10. 1. Parameter estimation
     - Using Bayes' theorem:
       p(θ|H, d) = p(d|H, θ) p(θ|H) / p(d|H)
       where
       - p(d|H, θ) is called the likelihood
       - p(θ|H) is the prior probability density
       - p(d|H) is the evidence for the hypothesis
     - The prior probability density p(θ|H) is a function we choose ourselves, based on what we know about the parameters prior to the measurement:
       - If the hypothesis is binary neutron star inspiral, then we can take the prior on the component masses to be uniform in the interval [1, 3] M_⊙
       - For sources that are roughly uniformly distributed over spatial volume, we take the distance prior p(r) dr ∝ r² dr
       - The prior for all parameters together is usually taken to be the product of the priors for the individual parameters
     - The evidence p(d|H) is not important here; it is set by the requirement that the posterior probability density p(θ|H, d) be normalized
     - The likelihood p(d|H, θ) is something we can calculate!

  11. 1. Parameter estimation
     - How to calculate the likelihood p(d|H, θ)?
     - One has d(t) = n(t) + h(θ; t)
       - In the conditional probability density above, the hypothesis and parameter values are assumed known, hence h(θ; t) is assumed known
       - We have a probability distribution for noise realizations!
     - Assuming stationary, Gaussian noise,
       p[n] = N exp[ −2 ∫₀^∞ |ñ(f)|² / S_n(f) df ]
       or, in terms of the noise-weighted inner product
       (A|B) = 4 Re ∫₀^∞ Ã*(f) B̃(f) / S_n(f) df,
       p[n] = N exp[ −(n|n)/2 ]
     - But in our case we can write ñ(f) = d̃(f) − h̃(θ; f), which gives us
       p(d|H, θ) = N exp[ −(d − h|d − h)/2 ]
     - We now have all we need to calculate the posterior probability density of the parameters:
       p(θ|H, d) = p(d|H, θ) p(θ|H) / p(d|H)
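The Gaussian likelihood above can be sketched with a discretized inner product; the flat PSD and the toy frequency-domain template below are placeholder choices, not the waveforms or noise curves used in real analyses:

```python
import numpy as np

df = 0.25                               # frequency resolution (assumed 4 s segment)
f = np.arange(20.0, 512.0, df)          # assumed analysis band
Sn = np.full_like(f, 1e-2)              # placeholder flat PSD, S_n(f)

def inner(A, B):
    """Discrete noise-weighted inner product: (A|B) = 4 Re Σ A*(f) B(f) / S_n(f) df."""
    return 4.0 * df * np.real(np.sum(np.conj(A) * B / Sn))

def log_likelihood(d, h):
    """log p(d|H, θ) = −(d − h | d − h)/2, up to the normalization constant N."""
    r = d - h
    return -0.5 * inner(r, r)

# Toy frequency-domain signal and data (illustrative):
h = 1e-2 * np.exp(2j * np.pi * f * 0.01)                  # placeholder template
rng = np.random.default_rng(0)
n = 1e-3 * (rng.standard_normal(f.size) + 1j * rng.standard_normal(f.size))
d = n + h

# The likelihood is higher when the template matches the signal in the data:
print(log_likelihood(d, h) > log_likelihood(d, 0.0 * h))  # True
```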

  12. 1. Parameter estimation
     - The posterior is the likelihood weighted by the prior:
       p(θ|d, H) ∝ p(d|θ, H) p(θ|H)
       The conclusions drawn are based on:
       - the experimental data obtained (likelihood)
       - information available before the experiment (prior)
     - If we want the posterior distribution for just one variable θ₁, then we marginalize over all the others:
       p(θ₁|d, H) = ∫_{θ₂^min}^{θ₂^max} ... ∫_{θ_N^min}^{θ_N^max} p(θ₁, θ₂, ..., θ_N) dθ₂ ... dθ_N
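Marginalizing over nuisance parameters can be sketched on a grid; the correlated two-parameter Gaussian posterior below is made up for the example:

```python
import numpy as np

# Illustrative two-parameter posterior on a grid (a correlated Gaussian;
# the shape is made up for the example).
t1 = np.linspace(-4.0, 4.0, 401)
t2 = np.linspace(-4.0, 4.0, 401)
dt = t1[1] - t1[0]
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
post = np.exp(-0.5 * (T1**2 + T2**2 - T1 * T2))
post /= post.sum() * dt * dt            # normalize the joint posterior

# p(θ₁|d, H): integrate the joint posterior over θ₂
p_t1 = post.sum(axis=1) * dt
print(p_t1.sum() * dt)                  # ≈ 1.0: the marginal is itself normalized
```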

  13. 2. Model selection
     - Suppose we want to compare two hypotheses H₁, H₂:
       - binary neutron star versus binary black hole
       - waveform predicted by general relativity versus an alternative theory of gravity
       - ...
     - We want to compare the probabilities p(H₁|d) and p(H₂|d)
     - Bayes' theorem for e.g. H₁:
       p(H₁|d) = p(d|H₁) p(H₁) / p(d)
     - Define the odds ratio:
       O₁₂ = p(H₁|d) / p(H₂|d) = [p(d|H₁) / p(d|H₂)] × [p(H₁) / p(H₂)]
       where the factors of p(d) have canceled out:
       - p(H₁)/p(H₂): ratio of prior odds
       - p(d|H₁)/p(d|H₂): ratio of evidences

  14. 2. Model selection
     - Recall from parameter estimation:
       p(θ|H, d) = p(d|H, θ) p(θ|H) / p(d|H)
       or
       p(θ|d, H) p(d|H) = p(d|H, θ) p(θ|H)
     - Integrate both sides over all parameters:
       ∫ p(θ|d, H) p(d|H) d^N θ = ∫ p(d|H, θ) p(θ|H) d^N θ
       Note that p(d|H) is independent of the parameters, and p(θ|d, H) is normalized, hence the left-hand side becomes:
       ∫ p(θ|d, H) p(d|H) d^N θ = p(d|H) ∫ p(θ|d, H) d^N θ = p(d|H)
       Therefore the evidence is given by
       p(d|H) = ∫ p(d|H, θ) p(θ|H) d^N θ
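The evidence integral can be evaluated numerically in one dimension; the flat prior and Gaussian likelihood below stand in for a real waveform model:

```python
import numpy as np

# One-dimensional version of p(d|H) = ∫ p(d|H, θ) p(θ|H) dθ, with a flat prior
# on [0, 10] and a placeholder Gaussian likelihood (both made up for the example).
theta = np.linspace(0.0, 10.0, 100001)
dtheta = theta[1] - theta[0]
prior = np.full_like(theta, 1.0 / 10.0)
likelihood = np.exp(-0.5 * ((theta - 4.0) / 0.5) ** 2)

evidence = (likelihood * prior).sum() * dtheta

# Cross-check the derivation above: posterior × evidence = likelihood × prior
posterior = likelihood * prior / evidence
assert np.allclose(posterior * evidence, likelihood * prior)
print(evidence)
```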

  15. 2. Model selection
     - Odds ratio:
       O₁₂ = p(H₁|d) / p(H₂|d) = [p(d|H₁) / p(d|H₂)] × [p(H₁) / p(H₂)]
     - Define the Bayes factor:
       B₁₂ = p(d|H₁) / p(d|H₂)
     - Evidences:
       p(d|H) = ∫ p(d|H, θ) p(θ|H) d^N θ
     - Hypotheses can have an arbitrary number of free parameters:
       - Does the model that fits the data best tend to give the highest evidence?
       - If so, a model with more parameters could give the highest evidence even if it is incorrect!
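Given two evidences and prior probabilities, the Bayes factor and odds ratio combine as above; the numbers below are illustrative:

```python
# Illustrative evidences and prior probabilities for two hypotheses H1, H2:
evidence_1, evidence_2 = 3.2e-5, 8.0e-6    # p(d|H1), p(d|H2)
prior_1, prior_2 = 0.5, 0.5                # p(H1), p(H2): no prior preference

bayes_factor = evidence_1 / evidence_2     # B_12 = p(d|H1)/p(d|H2)
odds = bayes_factor * (prior_1 / prior_2)  # O_12 = B_12 × p(H1)/p(H2)
print(bayes_factor, odds)                  # both 4, up to float rounding
```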

  16. Occam's razor
     - For simplicity, compare two hypotheses of the following form:
       - X has no free parameters
       - Y has one free parameter, λ
       Will Y automatically be favored over X?
     - Odds ratio:
       O_XY = [p(d|X) p(X)] / [p(d|Y) p(Y)]
     - Evidence for Y:
       p(d|Y) = ∫ p(d|λ, Y) p(λ|Y) dλ
     - For simplicity, assume a flat prior for λ ∈ [λ_min, λ_max]:
       p(λ|Y) = 1 / (λ_max − λ_min)
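The Occam penalty on the extra parameter λ can be seen directly: with a flat prior, widening the prior range dilutes the evidence for Y even though the peak likelihood is unchanged. The Gaussian likelihood and the prior widths below are made up for illustration:

```python
import numpy as np

def evidence_Y(prior_width):
    """p(d|Y) with a flat prior on lambda in [0, prior_width] and a
    placeholder Gaussian likelihood peaked at lambda = 1 (made up)."""
    lam = np.linspace(0.0, prior_width, 200001)
    dlam = lam[1] - lam[0]
    likelihood = np.exp(-0.5 * ((lam - 1.0) / 0.1) ** 2)
    prior = 1.0 / prior_width           # p(lambda|Y) = 1/(lambda_max - lambda_min)
    return (likelihood * prior).sum() * dlam

# Same peak likelihood, 10x wider prior -> 10x smaller evidence: the Occam penalty.
print(evidence_Y(2.0), evidence_Y(20.0))
```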
