
The ARCHES cross-correlation tool, François-Xavier Pineau



  1. The ARCHES cross-correlation tool
     François-Xavier Pineau (Observatoire Astronomique de Strasbourg, Université de Strasbourg, CNRS)
     Paris, 1 th December, 2015

  2. INTRODUCTION
     This talk: "Cross-correlation tool development & catalogue creation" (WP4)
     Aims of ARCHES's WP4:
     ◮ Create a public n-catalogue cross-correlation tool:
       ⋆ No magic, BUT a flexible, multi-purpose, scriptable multi-catalogue xmatch engine
       ⋆ Usable as a building block from your own specific code
     ◮ Use/develop statistical methods to compute probabilities of association:
       ⋆ Astrometry-based probabilities only!
       ⋆ Can be combined with photometry-based probabilities (in a further step)
     ◮ Use the tool to build the ARCHES catalogue(s)
     Beyond the ARCHES project:
     ◮ The tool will be part of the CDS XMatch Service
     ◮ ⇒ it will be maintained and will keep evolving


  4. INTRODUCTION
     This talk focuses mainly on the probabilistic part.
     More details on the tool will be given during the hands-on session.

  5. METHOD
     Steps to a probabilistic positional xmatch:
     ◮ Make simplifying assumptions
     ◮ Select candidates: select and group together sources that are possibly various detections of the same real source
       ⋆ This requires a selection criterion
     ◮ Make hypotheses: are the sources really from the same real source, or from different real sources?
     ◮ For each hypothesis:
       ⋆ derive the associated likelihood
       ⋆ derive the associated prior
     ◮ Compute astrometry-based probabilities

  6. SIMPLIFYING ASSUMPTIONS
     Radical simplifying assumptions:
     ◮ No proper motions
     ◮ No blending
     ◮ No clustering (the density of sources follows a Poisson law)
     ◮ No systematic offsets
     ◮ The positional uncertainties provided in the catalogues can be trusted

  7. CANDIDATE SELECTION
     Candidate selection criterion
     How to select a group of n sources from n distinct catalogues as possibly being various observations of the same actual source?
     Statistical hypothesis testing:
     ◮ $H_0$ (null hypothesis): all n sources are from the same real source
     ◮ $H_1 = \bar{H}_0$ (alternative hypothesis): at least one source (out of n) is spurious
     User input: $\gamma$, the probability of accepting $H_0$ when it is true
     ◮ $\gamma$ (I call it completeness) is called the true negative rate
     ◮ we usually fix $\gamma = 0.9973$ (99.73%, the value of the 3σ rule in a one-dimensional problem)
     ◮ ⇔ fixing the type I error at $1 - \gamma = 0.27\%$, the probability of rejecting the null hypothesis when it is true
     ◮ we (theoretically) miss 27 out of 10 000 real associations
     The criterion used is based on a $\chi^2$ test with $2(n-1)$ degrees of freedom.
     Now, a few slides to explain it, since it plays a role in the probabilities.
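The link between the completeness γ and the χ² threshold with 2(n − 1) degrees of freedom can be sketched numerically. The following is a minimal illustration, not the tool's implementation: the function names `chi2_cdf_even_dof` and `chi2_threshold` are hypothetical, and the code relies on the closed-form χ² CDF that exists when the number of degrees of freedom is even (always the case here), solved by bisection.

```python
import math

def chi2_cdf_even_dof(x, dof):
    """CDF of a chi-square law with an even number of degrees of freedom.

    For dof = 2m: F(x) = 1 - exp(-x/2) * sum_{j<m} (x/2)^j / j!
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_threshold(gamma, n_catalogues):
    """Chi-square value below which a fraction gamma of real n-source groups fall."""
    dof = 2 * (n_catalogues - 1)
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_dof(hi, dof) < gamma:   # bracket the root
        hi *= 2.0
    for _ in range(200):                         # plain bisection
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_dof(mid, dof) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, chi2_threshold(0.9973, n))
```

For n = 2 the square root of the returned value is the normalized-distance threshold used in the 2-catalogue slides; the threshold grows with n because more degrees of freedom are involved.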


  10. CANDIDATE SELECTION
      Classical 2-catalogue case
      In the classical case (e.g. De Ruiter et al. 1977):
      ◮ Errors on α and δ are independent
      ◮ Source 1 has errors $\sigma_{\alpha_1}$ and $\sigma_{\delta_1}$ on α and δ respectively
      ◮ Source 2 has errors $\sigma_{\alpha_2}$ and $\sigma_{\delta_2}$ on α and δ respectively
      ◮ The normalized distance (or σ-distance) is defined by:
        $$r = \left( \frac{\Delta\alpha^2}{\sigma_{\alpha_1}^2 + \sigma_{\alpha_2}^2} + \frac{\Delta\delta^2}{\sigma_{\delta_1}^2 + \sigma_{\delta_2}^2} \right)^{1/2}$$
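As an illustration of the σ-distance above, here is a small sketch. The function name `sigma_distance` is hypothetical, and the code assumes that Δα and Δδ are already expressed as angular offsets in the same unit as the errors (for example arcseconds); how Δα is obtained from the catalogue coordinates is not covered by the slide and is left out.

```python
import math

def sigma_distance(d_alpha, d_delta, sig_a1, sig_d1, sig_a2, sig_d2):
    """Normalized (sigma) distance between two sources with independent errors.

    d_alpha, d_delta: offsets between the two sources along alpha and delta.
    sig_a*, sig_d*:   1-sigma errors of source 1 and 2 along alpha and delta.
    All quantities are assumed to share the same angular unit.
    """
    term_alpha = d_alpha ** 2 / (sig_a1 ** 2 + sig_a2 ** 2)
    term_delta = d_delta ** 2 / (sig_d1 ** 2 + sig_d2 ** 2)
    return math.sqrt(term_alpha + term_delta)

if __name__ == "__main__":
    # Offsets of 0.3" and 0.4", errors of 0.3" and 0.4" on both axes:
    # each variance sum is 0.25, so r is very close to 1.
    print(sigma_distance(0.3, 0.4, 0.3, 0.3, 0.4, 0.4))
```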

  11. CANDIDATE SELECTION
      Classical 2-catalogue case
      More generally (see e.g. Pineau et al. 2011):
      ◮ We locally approximate the surface of the sphere by the Euclidean plane
      ◮ The positions of the 2 sources are 2-dimensional vectors: $\vec{\mu}_1$ and $\vec{\mu}_2$
      ◮ Errors on $\vec{\mu}_1$ and $\vec{\mu}_2$ are oriented ellipses defined by the covariance matrices $V_1$ and $V_2$ respectively
      ◮ The normalized distance becomes (vectorial form):
        $$r = \left( (\vec{\mu}_1 - \vec{\mu}_2)^T (V_1 + V_2)^{-1} (\vec{\mu}_1 - \vec{\mu}_2) \right)^{1/2}$$
      ◮ ⇒ equation of an ellipse of radius r and covariance matrix $V_1 + V_2$
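The vectorial form can be evaluated without any linear-algebra library, since the 2×2 inverse has a closed form. This is only a sketch: the function name `normalized_distance` and the representation of positions as 2-tuples and covariance matrices as nested lists are assumptions made for the example.

```python
import math

def normalized_distance(mu1, mu2, v1, v2):
    """r = sqrt((mu1 - mu2)^T (V1 + V2)^-1 (mu1 - mu2)) for 2x2 covariances.

    mu1, mu2: positions in the local Euclidean plane, as (x, y).
    v1, v2:   covariance matrices, as [[vxx, vxy], [vyx, vyy]].
    """
    # Sum of the two covariance matrices
    a = v1[0][0] + v2[0][0]
    b = v1[0][1] + v2[0][1]
    c = v1[1][0] + v2[1][0]
    d = v1[1][1] + v2[1][1]
    det = a * d - b * c
    dx = mu1[0] - mu2[0]
    dy = mu1[1] - mu2[1]
    # Quadratic form with the inverse [[d, -b], [-c, a]] / det
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

if __name__ == "__main__":
    v1 = [[0.09, 0.01], [0.01, 0.04]]
    v2 = [[0.16, -0.02], [-0.02, 0.25]]
    print(normalized_distance((0.0, 0.0), (0.3, 0.4), v1, v2))
```

With diagonal covariance matrices the result reduces to the classical σ-distance of the previous slide.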

  12. CANDIDATE SELECTION
      Classical 2-catalogue case
      For real associations, i.e. when $H_0$ is true:
      ◮ The distribution of normalized distances is a Rayleigh distribution of scale σ = 1
      [Figure: Rayleigh distribution. Probability density $x e^{-x^2/2}$ as a function of x (0 to 6); under $H_0$, $r \sim \mathrm{Rayleigh}$]
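The Rayleigh behaviour of the normalized distance can be checked with a quick simulation. The sketch below is an illustration under strong assumptions that are not in the slides: both sources have circular Gaussian errors (`sig1`, `sig2`), they share the same true position, and the function names are hypothetical.

```python
import math
import random

def simulate_normalized_distances(n_pairs, sig1, sig2, seed=0):
    """Draw pairs of detections of one real source and return their normalized distances."""
    rng = random.Random(seed)
    var_sum = sig1 ** 2 + sig2 ** 2
    distances = []
    for _ in range(n_pairs):
        # Each detection is the true position (0, 0) plus a Gaussian error
        dx = rng.gauss(0.0, sig1) - rng.gauss(0.0, sig2)
        dy = rng.gauss(0.0, sig1) - rng.gauss(0.0, sig2)
        distances.append(math.sqrt((dx * dx + dy * dy) / var_sum))
    return distances

def rayleigh_cdf(x):
    """CDF of the Rayleigh distribution of scale 1."""
    return 1.0 - math.exp(-x * x / 2.0)

if __name__ == "__main__":
    r = simulate_normalized_distances(100_000, sig1=0.3, sig2=1.2)
    for k in (1.0, 2.0, 3.0):
        frac = sum(1 for x in r if x <= k) / len(r)
        print(k, frac, rayleigh_cdf(k))
```

The empirical fractions should track the Rayleigh CDF whatever the values chosen for the two errors, which is the point of normalizing the distance.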

  13. CANDIDATE SELECTION
      Classical 2-catalogue case
      Fixing the completeness γ ⇔ fixing a normalized distance threshold $k_\gamma$:
        $$\int_0^{k_\gamma} \mathrm{Rayleigh}(r)\, \mathrm{d}r = \gamma$$
      For γ = 99.73% (the 1D 3σ rule) ⇒ $k_\gamma = 3.4395$ (not 3!)
      So, for 2 sources from 2 distinct catalogues, the selection criterion is
        $$\left( (\vec{\mu}_1 - \vec{\mu}_2)^T (V_1 + V_2)^{-1} (\vec{\mu}_1 - \vec{\mu}_2) \right)^{1/2} \le k_\gamma$$
      I.e. source 2 is kept as a candidate if it lies inside an error ellipse of covariance matrix $V = V_1 + V_2$ and radius $k_\gamma$, centered on source 1.
      ⇒ the surface area of the acceptance region is $\pi k_\gamma^2 |V_1 + V_2|^{1/2}$
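Because the Rayleigh CDF of scale 1 is $1 - e^{-r^2/2}$, the threshold has the closed form $k_\gamma = \sqrt{-2\ln(1-\gamma)}$. The sketch below evaluates it together with the area of the acceptance ellipse; the function names `k_gamma` and `acceptance_area` and the nested-list matrix layout are assumptions made for the example.

```python
import math

def k_gamma(gamma):
    """Normalized-distance threshold: solves 1 - exp(-k^2 / 2) = gamma."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    return math.sqrt(-2.0 * math.log(1.0 - gamma))

def acceptance_area(v1, v2, gamma):
    """Area pi * k_gamma^2 * |V1 + V2|^(1/2) of the acceptance ellipse."""
    a = v1[0][0] + v2[0][0]
    b = v1[0][1] + v2[0][1]
    c = v1[1][0] + v2[1][0]
    d = v1[1][1] + v2[1][1]
    det = a * d - b * c
    return math.pi * k_gamma(gamma) ** 2 * math.sqrt(det)

if __name__ == "__main__":
    print(k_gamma(0.9973))   # approximately 3.439, noticeably larger than 3
    v1 = [[0.09, 0.0], [0.0, 0.04]]
    v2 = [[0.16, 0.0], [0.0, 0.25]]
    print(acceptance_area(v1, v2, 0.9973))
```

The area is expressed in the squared unit of the positional errors, so it scales with both the chosen completeness and the combined uncertainty of the two sources.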


  16. CANDIDATE SELECTION
      Now, a different version of the same story, more easily generalisable to n catalogues.

  17. CANDIDATE SELECTION
      Revisited 2-catalogue case
      I have 2 sources from 2 distinct catalogues, and I suppose $H_0$ is true.
      The Maximum Likelihood Estimate (MLE) of the position of the real source is the weighted mean position
        $$\vec{\mu}_\Sigma = V_\Sigma \left( V_1^{-1} \vec{\mu}_1 + V_2^{-1} \vec{\mu}_2 \right)$$
      in which
        $$V_\Sigma = \left( V_1^{-1} + V_2^{-1} \right)^{-1}$$
      The error on this MLE is ... $V_\Sigma$
      The result is the same with a (by block) Weighted Least Squares method.
      We can now define the Mahalanobis distance:
        $$D_M = \left( \sum_{i=1}^{2} (\vec{\mu}_i - \vec{\mu}_\Sigma)^T V_i^{-1} (\vec{\mu}_i - \vec{\mu}_\Sigma) \right)^{1/2} \overset{H_0}{\sim} \chi_{\mathrm{dof}=2}$$
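The weighted mean position and the Mahalanobis distance can be sketched with a few 2×2 helpers. This is an illustrative sketch only: all function names are hypothetical, positions are plain 2-element sequences, and covariance matrices are nested lists. The functions accept a list of sources, although the slide only covers the case of 2 sources.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add2(m, n):
    """Sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def quad(v, m):
    """Quadratic form v^T m v."""
    w = matvec(m, v)
    return v[0] * w[0] + v[1] * w[1]

def weighted_mean(positions, covariances):
    """Return (mu_sigma, v_sigma): the weighted mean position and its covariance."""
    sum_inv = [[0.0, 0.0], [0.0, 0.0]]
    rhs = [0.0, 0.0]
    for mu, v in zip(positions, covariances):
        iv = inv2(v)
        sum_inv = add2(sum_inv, iv)      # accumulates sum of V_i^-1
        w = matvec(iv, mu)               # V_i^-1 mu_i
        rhs = [rhs[0] + w[0], rhs[1] + w[1]]
    v_sigma = inv2(sum_inv)
    return matvec(v_sigma, rhs), v_sigma

def mahalanobis(positions, covariances):
    """D_M = sqrt(sum_i (mu_i - mu_sigma)^T V_i^-1 (mu_i - mu_sigma))."""
    mu_sigma, _ = weighted_mean(positions, covariances)
    total = 0.0
    for mu, v in zip(positions, covariances):
        d = [mu[0] - mu_sigma[0], mu[1] - mu_sigma[1]]
        total += quad(d, inv2(v))
    return math.sqrt(total)

if __name__ == "__main__":
    pos = [(0.0, 0.0), (0.3, 0.4)]
    cov = [[[0.09, 0.01], [0.01, 0.04]], [[0.16, -0.02], [-0.02, 0.25]]]
    print(weighted_mean(pos, cov))
    print(mahalanobis(pos, cov))
```

For 2 sources this distance coincides with the normalized distance $r$ computed from $V_1 + V_2$, which is why the revisited formulation tells the same story as the classical one.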

