PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University


  1. PRACTICAL ANALYTICS Tamás Budavári / The Johns Hopkins University 7/19/2012

  2. Statistics
     - Of numbers
     - Of vectors
     - Of functions
     - Of trees

     ISSAC at HIPACC 7/19/2012

  3. Statistics
     - Description, modeling, inference, machine learning
     - Bayesian / Frequentist / Pragmatist?
     - Discrete: classification (supervised), clustering (unsupervised)
     - Continuous: regression (supervised), dimensional reduction (unsupervised)

  4. What’s Large?
     - VOLUME: say >100 TB today, but tomorrow? A moving target…
     - COMPLEXITY: the raw datasets are simple, unlike their derivatives
     - DEFINITION? Large when you cannot apply the “usual” tools

  5. Data LARGE !!

  7. Large?
     - Sample size

  9. Large?
     - Dimensions
       - The ratio of surface to volume grows: all points are lonely in high dimensions
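
One way to see why the surface-to-volume ratio matters is to ask how much of a unit hypercube lies close to its boundary. The sketch below is only an illustration; the function name `shell_fraction` and the 5% shell width are assumptions made here, not part of the talk.

```python
def shell_fraction(dim: int, eps: float) -> float:
    """Fraction of a unit hypercube's volume lying within eps of its surface."""
    if dim < 1 or not 0.0 <= eps <= 0.5:
        raise ValueError("need dim >= 1 and 0 <= eps <= 0.5")
    # The interior cube has side (1 - 2*eps), hence volume (1 - 2*eps)**dim.
    return 1.0 - (1.0 - 2.0 * eps) ** dim


if __name__ == "__main__":
    # Hypothetical shell width of 5% of the side length.
    for dim in (1, 2, 10, 100):
        print(dim, shell_fraction(dim, 0.05))
```

As the dimension grows, nearly all of the volume ends up in the thin outer shell, so uniformly scattered points rarely have close neighbors.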

  11. Keeping Up?
     - Image processing
     - Catalog extraction
       - O(n)
     - What is difficult?
       - O(n log n)
       - O(n²), … (worse with Moore’s law)

  12. Fundamental Challenges
     - Cross-identification of sources
       - To assemble multicolor catalogs
     - Drop-outs from sky coverage
       - To constrain fluxes not detected
     - Constraining physical properties
       - To interpret the data

  13. Cross-Identification: from long-tail science to the largest experiments

  14. Recording Observations
     - Astronomers drew it…
     - Now kids do it on SkyServer (#1 by Haley)

  15. Multicolor Universe

  16. Eventful Universe

  17. Cross-Identification: one of the most fundamental analysis steps

  18. What is the Right Question?
     - Cross-identification is a hard problem
       - Computationally, scientifically & statistically
     - Need a symmetric n-way solution
     - Need a reliable quality measure
     - Same or not?
       - Distance threshold? Maximum likelihood?

  19. Tabletop Astronomy
     - Imagine the observed sky has only 6 pixels
     - One object: one die
     - Observing: rolling a die
     - Locality: the die is loaded
     - Sky: a bag of dice

  20. Model Comparison: Same or Not?
     - Crossmatch: draw two dice with replacement
       - Same or not?
     - The Bayes factor is the ratio of the
       - Likelihood of “Same”
       - Likelihood of “Not”
     - Likelihood of a hypothesis?
       - Sum over all possibilities

  24. Model Comparison: Same or Not?
     - The model for loaded dice is a matrix of probabilities
       - E.g., loaded toward l = 1
       - Etc. for l = 2…6
     - 2-way case
       - Same
       - Not
     - n-way: same approach
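
The tabletop model can be sketched in a few lines. The equations on the slides did not survive in this transcript, so everything below is an assumption built from the bullets: a hypothetical loading strength `p_load = 0.5`, a uniform bag of six dice, "Same" as a single sum over dice, and "Not" as a product of independent sums.

```python
N_FACES = 6  # six sky pixels, and six dice, one loaded toward each face


def loaded_die(favored, p_load=0.5):
    """Row of the model matrix: P(face | die loaded toward `favored`)."""
    rest = (1.0 - p_load) / (N_FACES - 1)
    return [p_load if face == favored else rest for face in range(N_FACES)]


MODEL = [loaded_die(l) for l in range(N_FACES)]  # matrix of probabilities
PRIOR = [1.0 / N_FACES] * N_FACES                # uniform bag of dice


def likelihood_same(rolls):
    """All rolls come from one die: sum over which die it was."""
    total = 0.0
    for l in range(N_FACES):
        term = PRIOR[l]
        for x in rolls:
            term *= MODEL[l][x]
        total += term
    return total


def likelihood_not(rolls):
    """Each roll comes from an independently drawn die (with replacement)."""
    result = 1.0
    for x in rolls:
        result *= sum(PRIOR[l] * MODEL[l][x] for l in range(N_FACES))
    return result


def bayes_factor(rolls):
    return likelihood_same(rolls) / likelihood_not(rolls)


if __name__ == "__main__":
    print(bayes_factor([0, 0]))  # two rolls landing on the same face
    print(bayes_factor([0, 1]))  # two rolls landing on different faces
```

Under these assumptions, two rolls on the same face favor "Same" (factor above 1) and two rolls on different faces favor "Not" (factor below 1). Passing three or more rolls gives the n-way case with no change to the code.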

  25. Celestial Sphere
     - Continuous functions
     - General formalism
     - Accuracy is a density function on the sky

  26. Modeling the Astrometry
     - Astrometric precision
       - A simple function
     - Where on the sky?
       - Anywhere really…

  31. Same or Not?
     - The Bayes factor, with one term for the position on the sky and one for the astrometry
     - SAME, H: all observations are of the same object at m
     - NOT, K: the observations might be from separate objects at {m_i}
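
The slide's equation is not preserved here. A plausible reconstruction from the two hypotheses is sketched below, assuming $x_i$ denotes the measured position in catalog $i$, $p(m \mid \cdot)$ the prior for the true position on the sky, and $p_i(x_i \mid m, \cdot)$ the astrometric accuracy of catalog $i$. These symbols are assumptions made for this sketch.

```latex
B(H, K \mid D) = \frac{p(D \mid H)}{p(D \mid K)}, \qquad
p(D \mid H) = \int \! dm \; p(m \mid H) \prod_{i=1}^{n} p_i(x_i \mid m, H), \qquad
p(D \mid K) = \prod_{i=1}^{n} \int \! dm_i \; p(m_i \mid K) \, p_i(x_i \mid m_i, K)
```

This is the continuous analogue of the dice example: the sums over dice become integrals over positions on the celestial sphere.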

  32. Analytic Results
     - Normal distribution
       - Flat and spherical
       - Gauss and Fisher
     - 2-way results

  33. Normal Distribution
     - Astrometric precision
     - Fisher distribution
     - Analytic results
     - For high accuracies
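
The formulas on this slide were lost in extraction. For the high-accuracy two-way case, the form usually quoted from the published formalism is B = 2 / (σ₁² + σ₂²) · exp(−ψ² / (2(σ₁² + σ₂²))), with the angular separation ψ and the uncertainties σ in radians. The sketch below assumes that form; the function name and the arcsecond interface are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def bayes_factor_2way(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Two-way Bayes factor in the high-accuracy (small-angle) limit.

    Assumed form: B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 * (s1^2 + s2^2))),
    with all angles converted to radians.
    """
    var = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    psi = sep_arcsec * ARCSEC
    return 2.0 / var * math.exp(-psi ** 2 / (2.0 * var))


if __name__ == "__main__":
    # Hypothetical catalogs, each with 0.1" astrometric uncertainty.
    for sep in (0.0, 0.5, 1.0, 2.0):
        print(sep, bayes_factor_2way(sep, 0.1, 0.1))
```

The factor is very large for coincident detections and drops steeply once the separation exceeds a few times the combined uncertainty.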

  34. Wikipedia: Interpretation

  35. Probability of a Match: same or not?

  36. From Priors to Posteriors
     - The Bayes factor is the connection

  37. From Priors to Posteriors
     - Posterior probability from the prior & the Bayes factor
     - Prior probability of a match
       - Like dice in a bag: 1/N for two-way and N^(1−n) for n-way matches
       - In general?
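
The step from prior to posterior follows from Bayes' rule for two complementary hypotheses: the posterior odds equal the Bayes factor times the prior odds. A minimal sketch, with hypothetical function names:

```python
def posterior_probability(bayes_factor, prior):
    """Posterior P(H | D) from the prior P(H) and the Bayes factor B(H, K | D)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if bayes_factor < 0.0:
        raise ValueError("Bayes factor must be non-negative")
    weighted = bayes_factor * prior
    # Equivalent to posterior odds = Bayes factor * prior odds.
    return weighted / (weighted + 1.0 - prior)


def dice_prior(n_dice, n_way=2):
    """Prior that n_way draws (with replacement) from n_dice all hit the same die."""
    return float(n_dice) ** (1 - n_way)


if __name__ == "__main__":
    prior = dice_prior(6)  # 1/6 for the two-way tabletop example
    print(posterior_probability(1.8, prior))
```

A Bayes factor of 1 leaves the prior unchanged, which is a quick sanity check on any implementation.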

  38. From Priors to Posteriors
     - Different selections
       - Nearby / Distant
       - Red / Blue
     - But only 1 number

  39. Self-Consistent Estimates
     - The prior has an unknown fudge factor
       - Educated guess: TB & Szalay (2008)
       - Or solve for it
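
The equation for "solve for it" is missing from the transcript. One reading, offered as an assumption, is that the two-way prior is N★ / (N₁ N₂) and that N★ is chosen so that the posteriors of all candidate pairs sum to N★. The sketch below iterates that condition to a fixed point; the function name, the starting guess, and the stopping rule are hypothetical.

```python
def solve_n_star(bayes_factors, n1, n2, tol=1e-9, max_iter=1000):
    """Fixed-point estimate of the number of true matches N*.

    Assumes prior = N* / (n1 * n2) and the self-consistency condition
    sum of posteriors = N*. Both are assumptions of this sketch.
    """
    n_star = float(min(n1, n2))  # largest possible overlap as starting guess
    for _ in range(max_iter):
        prior = n_star / (n1 * n2)
        updated = sum(b * prior / (b * prior + 1.0 - prior) for b in bayes_factors)
        if abs(updated - n_star) < tol:
            return updated
        n_star = updated
    return n_star


if __name__ == "__main__":
    # Hypothetical candidate list: 50 strong matches and 200 chance alignments.
    factors = [1e6] * 50 + [1e-3] * 200
    print(solve_n_star(factors, n1=100, n2=100))
```

Note that N★ = 0 also satisfies the condition trivially, which is why the iteration starts from the largest possible overlap and not from zero.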

  40. Simulations
     - Mock objects
       - With correct clustering
       - Uniform U[0,1] values as properties
     - Simulated sources
       - Subsets: N₁, N₂
       - Overlap: N★

  42. Simulations
     - Quality
     - Multiple matches
     - Explained by a simple model of point sources! Heinis, TB, Szalay (2009)

  44. Proper Motion
     - Same hypotheses but different parameters
       - Just need a prior to integrate
     - Kerekes, TB+ (2010)
     - Sources from SDSS

  45. Matching Events
     - Streams of events in time and space
     - E.g., thresholded peaks in signal-to-noise

  46. Dropouts from Sky Coverage

  47. Drawing with Equations
     - TB, Szalay & Fekete (2010)
     - Figure labels: r = 0.6, r = 0.5

  48. Matching in Practice

  49. Open SkyQuery
     - Following our 1st prototype
     - Successful
     - Not Bayesian
     - Limitations

  50. SkyQuery – The 3rd Generation
     - Dynamic federation of astronomy databases
       - Query the collection as if they were one
     - The 3rd-generation tool is coming this summer
       - Cluster of machines running partitioned jobs
       - Proper probabilistic execution with variable errors

  54. SkyQuery
     - Almost pure standard SQL
     - Added XMATCH
       - Verifiable
       - Flexible

  56. HST Crossmatch Catalog (release at AAS)
     - SQL pipeline
     - Astrometric correction: TB & Lubow (2012)
     - Subpixel precision

  57. HST Crossmatch Catalog (release at AAS)
     - FoF groups
       - Possible chains
     - Bayesian model selection
       - Chainbreaker
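
Friends-of-friends (FoF) grouping links any two detections closer than a chosen linking length and takes the connected components as groups. The sketch below is a generic illustration on flat 2-D coordinates with a brute-force pair search and a union-find structure. It is not the catalog's SQL pipeline, and the function name and linking length are hypothetical.

```python
import math


def friends_of_friends(points, link):
    """Group point indices into connected components under a linking length."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair search, adequate only for small inputs.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= link:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Points 0-1 and 1-2 are within the linking length, but 0-2 are not.
    detections = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (5.0, 5.0)]
    print(friends_of_friends(detections, link=1.0))
```

In the usage example the first three points land in one group even though the two end points are too far apart to be friends directly. That is the kind of chain the slide refers to, and it is why a model-selection step is needed to decide whether to break such groups.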

  58. HST Crossmatch Catalog (release at AAS)
     - Lots of matching sources during HST’s long life
     - TB & Lubow (2012)
