Hubble Catalog of Variables - Presentation @Napoli Observatory
Presentation, October 2016. https://www.researchgate.net/publication/309634589
Hubble Catalog of Variables
Panagiotis Gavras
on behalf of the HCV team
Napoli, 19 Oct 2016
Outline
- The HCV project
- Hubble Source Catalog
- Variability detection
- Pipeline
- First Results
Hubble Catalog of Variables
- 3-year ESA-funded project (2015-2018).
- The goal of the project is to develop a set of algorithms that will identify candidate variables among the sources included in the Hubble Source Catalog (HSC).
- At the end of the project we will produce the first version of the Hubble Catalog of Variables (HCV). This catalog will be ingested into the MAST portal and the ESA Science Archives.
- Finally, the pipeline will be deployed at STScI and run regularly in order to produce updated versions of the HCV.
Hubble Source Catalog
- The Hubble Source Catalog (Whitmore et al., 2016) is a catalog containing the majority of all objects ever observed by the Hubble Space Telescope (HST).
- It is developed and maintained by the Space Telescope Science Institute (STScI).
- The HSC is designed to optimise science from the Hubble Space Telescope by combining the tens of thousands of visit-based source lists in the Hubble Legacy Archive (HLA) into a single master catalog.
HSC : What’s inside
- HSC v2.0 released 29 Sep 2016.
- 90 million sources and 383 million detections.
- Photometry from WFPC2, ACS/WFC, WFC3/UVIS, and WFC3/IR.
- In total there are 112 filter/detector combinations.
HSC : What’s inside
- The mean photometric accuracy is better than 0.10 mag (and can be as good as 0.02 mag).
- The absolute astrometric accuracy is better than 0.1 arcsec.
- 91% of the HSC has coverage from Pan-STARRS, SDSS, or 2MASS.
HSC: Things one should know
- Coverage can be very non-uniform (unlike surveys such as SDSS).
- Current WFPC2, ACS/WFC and WFC3 source lists are of variable quality.
- The default is to show all HSC objects. This may include a large number of artifacts. Requesting Numimages > 1 (or more) should filter out many artifacts.
- Doubling: There are occasionally cases where not all the detections of the same source are matched together into a single object.
- Bad images: Images taken when Hubble has lost lock on guide stars (generally after an Earth occultation) are the primary cause of bad images.
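The Numimages cut mentioned above can be illustrated with a minimal sketch. The record layout and the key names `MatchID` and `NumImages` are assumptions made for this example and should be checked against the actual HSC table schema before use.

```python
def filter_artifacts(sources, min_images=2):
    """Keep only sources detected in at least `min_images` images.

    `sources` is a list of dicts; the "NumImages" key is an assumed
    column name used for illustration.
    """
    return [s for s in sources if s["NumImages"] >= min_images]


# Toy records standing in for rows of a catalog query result.
sources = [
    {"MatchID": 1, "NumImages": 1},  # single detection, likely artifact
    {"MatchID": 2, "NumImages": 5},
    {"MatchID": 3, "NumImages": 2},
]
kept = filter_artifacts(sources)
kept_ids = [s["MatchID"] for s in kept]
```

Raising `min_images` trades completeness for a cleaner sample, in line with the "(or more)" remark on the slide.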
HSC: Access
- MAST Discovery portal:
https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html
- Online form :
https://archive.stsci.edu/hst/hsc_sum/search.php
- CasJobs :
http://mastweb.stsci.edu/hcasjobs/home.aspx
Variability
Variability detection methods
- Direct image comparison (transient detection). Example figure: SN 1987A, before and after.
- Periodicity search
- Lightcurve analysis using variability indexes
Variability Indexes (VI)
- They are numerical parameters characterizing the degree of variability of an object.
- Different VIs are sensitive to different types of variability.
- One expects a variable to have a significantly different value in some VI than non-variables.
- 2 Types of Variability Indexes
- Scatter-based
- Correlation-based
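Scatter-based Variability Indexes
- Reduced $\chi^2$ test
- Standard deviation $\sigma$
- Median absolute deviation (MAD)
- Interquartile range (IQR)
- Robust Median Statistic (RoMS)
- Normalised excess variance $\sigma^2_{\mathrm{NXS}}$
- Peak-to-peak variability $v$

Robust Median Statistic:
$\mathrm{RoMS} = (N-1)^{-1} \sum_{i=1}^{N} \frac{|m_i - \mathrm{median}(m_i)|}{\sigma_i}$
For a non-variable object, the expected value ...

Normalised excess variance:
$\sigma^2_{\mathrm{NXS}} = \frac{1}{N \bar{m}^2} \sum_{i=1}^{N} \left[(m_i - \bar{m})^2 - \sigma_i^2\right]$
Here we use the symbol $\sigma^2_{\mathrm{NXS}}$ for the normalised excess variance.

Peak-to-peak variability:
$v = \frac{(m_i - \sigma_i)_{\max} - (m_i + \sigma_i)_{\min}}{(m_i - \sigma_i)_{\max} + (m_i + \sigma_i)_{\min}}$
where $(m_i - \sigma_i)_{\max}$ and $(m_i + \sigma_i)_{\min}$ are the maximum and minimum of these expressions over the lightcurve.

See more in Sokolovsky et al., 2016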
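As an illustration of the scatter-based indexes, here is a minimal pure-Python sketch of RoMS, the normalised excess variance, the peak-to-peak variability and the MAD. The function names and the toy lightcurve are assumptions made for this example; this is not the HCV pipeline code.

```python
import statistics


def roms(mags, errs):
    """Robust Median Statistic: sum of |m_i - median| / sigma_i over (N - 1)."""
    med = statistics.median(mags)
    return sum(abs(m - med) / e for m, e in zip(mags, errs)) / (len(mags) - 1)


def normalised_excess_variance(mags, errs):
    """sigma^2_NXS: scatter beyond the measurement errors, scaled by the squared mean."""
    n = len(mags)
    mean = sum(mags) / n
    return sum((m - mean) ** 2 - e ** 2 for m, e in zip(mags, errs)) / (n * mean ** 2)


def peak_to_peak(mags, errs):
    """Peak-to-peak variability v, with the errors shrinking the observed range."""
    hi = max(m - e for m, e in zip(mags, errs))
    lo = min(m + e for m, e in zip(mags, errs))
    return (hi - lo) / (hi + lo)


def mad(mags):
    """Median absolute deviation from the median."""
    med = statistics.median(mags)
    return statistics.median([abs(m - med) for m in mags])


# Toy lightcurve: four nearly constant points and one 1-mag excursion.
mags = [20.0, 20.1, 19.9, 20.0, 21.0]
errs = [0.05] * 5
indexes = {
    "RoMS": roms(mags, errs),
    "NXS": normalised_excess_variance(mags, errs),
    "v": peak_to_peak(mags, errs),
    "MAD": mad(mags),
}
```

On this toy input the single outlier drives RoMS to about 6, while the MAD stays near 0.1 mag, which illustrates why different indexes respond differently to the same lightcurve.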
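Correlation-based Variability Indexes
- Welch-Stetson I
- Stetson's indexes J, K, L
- and variations: time-weighted, magnitude-limited
- Consecutive same-sign deviations from the mean magnitude (CSSD)

Welch-Stetson I, for measurements obtained in two filters b and v:
$I = \sqrt{\frac{1}{n(n-1)}} \sum_{i=1}^{n} \left(\frac{b_i - \bar{b}}{\sigma_{b_i}}\right) \left(\frac{v_i - \bar{v}}{\sigma_{v_i}}\right)$
where $b_i$ ($v_i$) are the measured magnitudes in the two filters.

Stetson's J:
$J = \frac{\sum_{k=1}^{n} w_k \, \mathrm{sgn}(P_k) \sqrt{|P_k|}}{\sum_{k=1}^{n} w_k}$
where sgn is the sign function.

Stetson's K:
$K = \frac{\frac{1}{N} \sum_{i=1}^{N} \left| \sqrt{\frac{n_v}{n_v - 1}} \, \frac{v_i - \bar{v}}{\sigma_{v_i}} \right|}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \sqrt{\frac{n_v}{n_v - 1}} \, \frac{v_i - \bar{v}}{\sigma_{v_i}} \right)^2}}$
For a Gaussian magnitude distribution ...

Stetson's L:
$L = \sqrt{\pi/2} \, J K \left( \sum w / w_{\mathrm{all}} \right)$
where $\sum w / w_{\mathrm{all}}$ is the ratio ...

See more in Sokolovsky et al., 2016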
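The two indexes that are fully defined on the slide, Welch-Stetson I and Stetson's K, can be sketched as follows. This is a simplified illustration: it uses plain (unweighted) means for $\bar{b}$ and $\bar{v}$ and assumes $n_v = N$, and the function names are hypothetical. J and L are omitted because the slide does not define $P_k$ or the weights.

```python
import math


def welch_stetson_i(b, b_err, v, v_err):
    """Welch-Stetson I for paired measurements in two filters.

    Assumption: unweighted means are used for b-bar and v-bar.
    """
    n = len(b)
    b_mean = sum(b) / n
    v_mean = sum(v) / n
    total = sum(
        ((bi - b_mean) / sb) * ((vi - v_mean) / sv)
        for bi, sb, vi, sv in zip(b, b_err, v, v_err)
    )
    return math.sqrt(1.0 / (n * (n - 1))) * total


def stetson_k(v, v_err):
    """Stetson's K: mean absolute scaled residual over the RMS scaled residual."""
    n = len(v)
    v_mean = sum(v) / n
    scale = math.sqrt(n / (n - 1))
    delta = [scale * (vi - v_mean) / sv for vi, sv in zip(v, v_err)]
    mean_abs = sum(abs(d) for d in delta) / n
    rms = math.sqrt(sum(d * d for d in delta) / n)
    return mean_abs / rms


# Toy data: the two filters vary in step, so I is positive;
# flipping the sign of one filter makes I negative.
b = [1.0, -1.0, 1.0, -1.0]
unit_err = [1.0] * 4
i_correlated = welch_stetson_i(b, unit_err, b, unit_err)
i_anticorrelated = welch_stetson_i(b, unit_err, [-x for x in b], unit_err)
k_value = stetson_k(b, unit_err)
```

The sign behaviour of I is the point of a correlation-based index: real variability moves both filters together, whereas independent noise averages towards zero.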
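Correlation-based Variability Indexes (continued)
- Excursions, $E_x$
- Von Neumann ratio, $\eta$
- Excess Abbe value, $\mathcal{E}_\mathcal{A}$
- $S_B$ variability detection statistic

Von Neumann ratio:
$\eta = \frac{\delta^2}{\sigma^2} = \frac{\sum_{i=1}^{N-1} (m_{i+1} - m_i)^2 / (N - 1)}{\sum_{i=1}^{N} (m_i - \bar{m})^2 / (N - 1)}$

The $S_B$ variability detection statistic is defined as
$S_B = \left( \frac{1}{N M} \right) \sum_{i=1}^{M} \left( \frac{r_{i,1}}{\sigma_{i,1}} + \frac{r_{i,2}}{\sigma_{i,2}} + \ldots + \frac{r_{i,k_i}}{\sigma_{i,k_i}} \right)^2$
where $N$ represents the total number of data points.

Excess Abbe value:
$\mathcal{E}_\mathcal{A} \equiv \overline{\mathcal{A}_{\mathrm{sub}}} - \mathcal{A}$
where $\overline{\mathcal{A}_{\mathrm{sub}}}$ is the mean of the $\mathcal{A}_{\mathrm{sub}}$ values (see Sokolovsky et al., 2016).

The index $E_x$ is computed according to the equation:
$E_x = \frac{2}{N_{\mathrm{scan}} (N_{\mathrm{scan}} - 1)} \sum_{i=1}^{N_{\mathrm{scan}} - 1} \sum_{j>i}^{N_{\mathrm{scan}}} \frac{|\mathrm{median}_i - \mathrm{median}_j|}{\sqrt{\sigma_i^2 + \sigma_j^2}}$
where $N_{\mathrm{scan}}$ is the number of scans, so that $N_{\mathrm{scan}} (N_{\mathrm{scan}} - 1) / 2$ is the number of scan pairs.

See more in Sokolovsky et al., 2016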
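A minimal sketch of the von Neumann ratio $\eta$ may help show why its inverse is useful for variability detection. The function name and the toy lightcurves are assumptions made for this example.

```python
def von_neumann_eta(mags):
    """Ratio of the mean squared successive difference to the sample variance."""
    n = len(mags)
    mean = sum(mags) / n
    delta2 = sum((mags[i + 1] - mags[i]) ** 2 for i in range(n - 1)) / (n - 1)
    sigma2 = sum((m - mean) ** 2 for m in mags) / (n - 1)
    return delta2 / sigma2


# A smooth trend: consecutive points are close compared with the overall scatter.
eta_smooth = von_neumann_eta([0.0, 1.0, 2.0, 3.0, 4.0])

# A point-to-point flip: consecutive points differ as much as possible.
eta_alternating = von_neumann_eta([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
```

Smoothly varying lightcurves give a small $\eta$ (large $1/\eta$), while uncorrelated noise gives $\eta$ close to 2, so a high $1/\eta$ singles out sources whose consecutive measurements are correlated.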
OK, but how do you define which source is variable?
- We have more than 90 million sources divided into groups (targets).
- Less than 10% of the sources are variable.
- General idea: we have a sea of constant sources and a few variables that should stand out in some variability indexes.
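The "sea of constant sources" idea can be sketched as a robust outlier cut on a single variability index. This is only an illustration under stated assumptions: the median plus scaled-MAD threshold, the `cutoff_sigma` parameter and the function name are choices made for this example, not the HCV selection algorithm, and the sketch ignores any dependence of the index on magnitude.

```python
import statistics


def select_candidates(vi_values, cutoff_sigma=5.0):
    """Return indices of sources whose VI exceeds the median by `cutoff_sigma`.

    The spread of the constant sources is estimated robustly as
    1.4826 * MAD, which equals the standard deviation for Gaussian data.
    """
    med = statistics.median(vi_values)
    mad = statistics.median([abs(x - med) for x in vi_values])
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # no measurable spread, nothing can be ranked
    return [i for i, x in enumerate(vi_values) if (x - med) / sigma > cutoff_sigma]


# Toy group: six sources with similar index values and one that stands out.
vi_values = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 5.0]
candidates = select_candidates(vi_values)
```

Using the median and MAD rather than the mean and standard deviation keeps the few true variables from inflating the threshold that is meant to describe the constant sources.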
How does a source behave in different VIs?
Remember: different VIs are sensitive to different types of variability.
How does a source behave in different VIs ... and in different filters?
Selection of Candidates (1st method)
VI Performance
- Using this selection method we evaluated the performance of each variability index:
- Completeness
- Purity
- F-score

... (2014), we compute the completeness $C$ and purity $P$:
$C = \frac{\text{Number of selected variables}}{\text{Total number of confirmed variables}}$
$P = \frac{\text{Number of selected variables}}{\text{Total number of selected candidates}}$
as well as the fidelity F-score, which is the harmonic mean of the two:
$F_1 = 2 (C \times P) / (C + P)$
$F_1$ reaches its maximum of 1 when both $C$ and $P$ equal 1.
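The three performance measures can be computed from two sets of source identifiers. The sketch below is illustrative; the function names and the toy identifiers are assumptions, and "selected variables" is read as the selected candidates that are also confirmed variables.

```python
def f1_score(completeness, purity):
    """Harmonic mean of completeness and purity (0 if both are 0)."""
    if completeness + purity == 0:
        return 0.0
    return 2 * completeness * purity / (completeness + purity)


def selection_scores(selected, confirmed):
    """Return (C, P, F1) for a set of selected candidates and confirmed variables."""
    selected, confirmed = set(selected), set(confirmed)
    hits = len(selected & confirmed)  # selected candidates that are real variables
    completeness = hits / len(confirmed)
    purity = hits / len(selected) if selected else 0.0
    return completeness, purity, f1_score(completeness, purity)


# Toy example: 4 candidates selected, 3 confirmed variables, 2 in common.
c, p, f1 = selection_scores(selected={1, 2, 3, 4}, confirmed={2, 3, 5})

# Consistency check against the C and P values quoted on the performance slide.
f1_chi2 = round(f1_score(0.706, 0.740), 3)
f1_eta = round(f1_score(0.821, 0.825), 3)
```

The last two lines reproduce the quoted Fmax values (0.723 and 0.823) from the corresponding C and P, which is a quick way to confirm the formula is being applied as stated.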
VI Performance
Figure: completeness C, purity P and F-score as a function of the cut-off in σ, for three indexes.
- $\chi^2_{\mathrm{red}}$: C(Fmax)=0.706, P(Fmax)=0.740, Fmax=0.723 at 10.8σ
- $\sigma_w$: C(Fmax)=0.569, P(Fmax)=0.899, Fmax=0.697 at 6.0σ
- $1/\eta$: C(Fmax)=0.821, P(Fmax)=0.825, Fmax=0.823 at 16.6σ
VI Performance
Figure: maximum $F_1$ as a function of N for the σ, IQR and $1/\eta$ indexes.
Selection of Candidates (2nd method)
- Principal Component Analysis (PCA) on the normalized variability indexes
- Principal component analysis (Pearson 1901) linearly and orthogonally transforms a dataset onto a new set of uncorrelated axes (the eigenvectors of the variance-covariance matrix of the data), along which the data variance is highlighted.
- These eigenvectors are called the principal components (PCs). Each observation $x_j$ of the original data, composed of $m$ variables, is expressed as
$x_j = \sum_{i=1}^{m} a_{j,i} \cdot \mathrm{PC}_i$
where $a_{j,i}$ is the admixture coefficient of the principal component $\mathrm{PC}_i$. The coefficients $a_{j,i}$ are the data coordinates in the new axes. There exists a maximum of $m$ PCs.
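To make the decomposition concrete, the sketch below finds the first principal component of a small dataset by power iteration on the variance-covariance matrix and projects each observation onto it. This is a deliberately simplified, dependency-free illustration: it recovers only PC1, the function name and starting vector are choices made for this example, and a real analysis would typically use a linear-algebra library to obtain all components.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (pc1, scores): the leading eigenvector of the covariance matrix
    and the coordinate a_{j,1} of each mean-subtracted row along it."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    centered = [[r[k] - means[k] for k in range(m)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its dominant eigenvector. The start vector (1, 2, ..., m)
    # is arbitrary; a start exactly orthogonal to PC1 would fail.
    vec = [float(k + 1) for k in range(m)]
    for _ in range(n_iter):
        new = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            break
        vec = [x / norm for x in new]
    scores = [sum(c[k] * vec[k] for k in range(m)) for c in centered]
    return vec, scores


# Toy data: two perfectly correlated "indexes", so PC1 lies along (1, 1).
rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
pc1, scores = first_principal_component(rows)
```

Sources with unusually large coordinates along the leading components are the ones that stand out from the bulk, which is the property the candidate selection relies on.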
PCA on the normalized indexes
Figure: Scree plot of the variances of the 15 most significant principal components, as applied to the M31 Halo11 field.
Figure: PC1 and PC2 for the M31 Halo11 PCA implementation in two filters (F606W, F814W).
Moretti et al., in prep
Selection of Candidates (2nd method)
Moretti et al., in prep
Figure legend: RR Lyrae, RR Lyrae candidates, eclipsing binaries, dwarf Cepheids, LPV/semiregulars, anomalous Cepheids, post-AGB stars.
The System
- 2 VMs running the Apache Hadoop file system to provide a distributed file system across the two physical machines.
- Apache Spark to split a computation task into subtasks and perform them over several nodes.
- Apache Mesos to schedule and orchestrate the computation processes over the Spark nodes.
The Pipeline - Detection Algorithm
The Pipeline - Validation Algorithm
First Results
- Principal Components Analysis and variability search: a promising
combination - Moretti et al., in prep
- Stellar variability in the Key project galaxy NGC 4535 - Zoi Spetsieris
- Identification of Active Galactic Nuclei in GOODS South through optical variability - Ektoras Pouliasis
Principal Components Analysis and variability search: a promising combination
- Application & evaluation of PCA in variability search
- Study of M31 fields
- Halo11
- Stream
- Disk
- Create data for HCV control sample
Figure: DSS image of the M31 fields (Disk, Stream, Halo11); axes are RA (deg) and Dec (deg).
Principal Components Analysis and variability search: a promising combination
Figure: a1, a2 plot for the Halo11 sources.
Figure: CMD of sources in M31, Halo11.
Figure legend: known variables from Brown et al., 2004 (RR Lyrae, RR Lyrae candidates, eclipsing binaries, dwarf Cepheids, LPV/semiregulars, anomalous Cepheids, post-AGB stars); PCA confirmed variables; PCA candidate variables.
89% recovery, 58 new candidates
Stellar variability in the Key project galaxy NGC 4535
- Re-analysis of NGC 4535 (WFPC2).
- PSF photometry with Dolphot.
- Recover the published variables (50 Cepheids, Macri et al., 1999) and detect new variables.
- Investigate massive star variability in this Virgo Cluster galaxy and re-derive the period-luminosity relation.
Stellar variability in the Key project galaxy NGC 4535
Phased light curves for the Cepheids C1 and C2 with published periods by Macri et al. 1999.
The Key project galaxy NGC 4535
New Variables
Identification of AGN in GOODS South through optical variability
- Re-analysis of ACS/WFC with SExtractor.
- Add a field with extended sources in our control sample.
- Point-like/Extended sources separation: CI ~1.24.
- Total sample: 11862 sources with more than 3 epochs.
Identification of Active Galactic Nuclei through optical variability selection
- 150 candidates with known
redshift after false-positive removals
- Concentration Index: 93% of
the variable candidates are extended, indicating AGN activity.
Future work
- Include Principal Component Analysis in the pipeline
- Test other Machine Learning methods
- Investigate other fields
- Deliver the 1st version of HCV