Finding Periodicities in Astronomical Light Curves using Information - - PowerPoint PPT Presentation
Finding Periodicities in Astronomical Light Curves using Information Theoretic Learning
Pablo Huijse H.
Department of Electrical Engineering, Universidad de Chile
Joint work with: Pavlos Protopapas, Harvard University; Jose Príncipe,
Introduction Methods Results Conclusions
Introduction
Statement of the problem
To find periodic light curves automatically in large astronomical databases:
  Find the period of a light curve
  Discriminate if it is truly periodic
  ... in reasonable computational time
Relevance
The fundamental period of a light curve can be used for:
  Stellar classification
  Stellar parameter estimation
  Extrasolar planet detection
Pablo Huijse, Universidad de Chile Finding periodicities in astronomical light curves using ITL
Statement of the problem
Challenges
  Light curves are unevenly sampled and noisy
  Astronomical databases are huge
  Current situation: period detection schemes rely too much on visual inspection
Goals
  To develop a fully automated, efficient and robust method for period detection and estimation based on information theoretic learning
ITL and Renyi’s quadratic entropy
Information theoretic learning (ITL)
  Information theoretic concepts of entropy and mutual information applied to machine learning.
  Replace conventional second-order metrics (variance, correlation) with information theoretic metrics estimated directly from samples.
Renyi’s quadratic entropy (RQE)
  Entropy quantifies the uncertainty of a system.
  Using Parzen windows, the RQE (and the PDF) is estimated directly from the sample data:
\[
\hat{H}_{R_2}(X) = -\log \left( \int_{-\infty}^{+\infty} p^2(x)\,dx \right) = -\log\left( IP(X) \right)
\]
\[
IP(X) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G_\sigma(x_i - x_j)
\]
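The information potential IP(X) and the resulting entropy estimate can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names and the synthetic Gaussian samples are assumptions made here, not part of the original slides.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    """Gaussian kernel G_sigma evaluated at the differences d."""
    return np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def information_potential(x, sigma):
    """IP(X) = (1/N^2) * sum_i sum_j G_sigma(x_i - x_j)."""
    x = np.asarray(x, dtype=float)
    diffs = x[:, None] - x[None, :]  # all pairwise differences, shape (N, N)
    return gaussian_kernel(diffs, sigma).mean()

def renyi_quadratic_entropy(x, sigma):
    """Estimate of Renyi's quadratic entropy: -log(IP(X))."""
    return -np.log(information_potential(x, sigma))

# Hypothetical data: a concentrated sample and a spread-out sample.
rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.1, size=500)
wide = rng.normal(0.0, 1.0, size=500)
h_narrow = renyi_quadratic_entropy(narrow, sigma=0.1)
h_wide = renyi_quadratic_entropy(wide, sigma=0.1)
print(h_narrow, h_wide)
```

The concentrated sample should give the lower entropy estimate, since its pairwise differences are small and the kernel values are large.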
Correntropy
Correntropy is an ITL metric that takes into account the time structure of random processes.
It is a generalization of correlation.
It measures similarities in a kernel space between samples separated by a time lag τ in the input space.
The autocorrentropy function:
\[
\hat{V}(\tau) = \frac{1}{N - \tau + 1} \sum_{n=\tau}^{N-1} G_\sigma(x_n - x_{n-\tau})
\]
Translation-invariant Gaussian kernel with kernel size σ:
\[
G_\sigma(x - y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)
\]
σ controls the width of the kernel and is usually selected with respect to the data properties.
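For an evenly sampled series, the autocorrentropy estimator can be sketched as follows. The function name, the test signal (a sinusoid with a period of 20 samples) and the kernel size are assumptions chosen for illustration only.

```python
import numpy as np

def autocorrentropy(x, max_lag, sigma):
    """V(tau) for tau = 0..max_lag, using the 1/(N - tau + 1) normalization."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    v = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        d = x[tau:] - x[:n - tau]  # x_n - x_{n-tau} for n = tau..N-1
        g = np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
        v[tau] = g.sum() / (n - tau + 1)
    return v

# Hypothetical evenly sampled signal with a period of 20 samples.
samples = np.arange(200)
x = np.sin(2.0 * np.pi * samples / 20.0)
v = autocorrentropy(x, max_lag=60, sigma=0.5)
best_lag = int(np.argmax(v[5:])) + 5  # skip the trivial peak around lag 0
print(best_lag)
```

Samples one period apart are nearly identical, so the kernel is close to its maximum at lags that are multiples of the period and small at half-period lags.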
Period Estimator: Slotted Correntropy
A correntropy estimator for unevenly sampled time series using the slotting technique (Edelson & Krolik; Mayo).
Slotted lag k∆τ covers the interval [(k − 0.5)∆τ, (k + 0.5)∆τ].
\[
\hat{V}(k\Delta\tau) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} G_\sigma(x_i - x_j) \cdot B_{k\Delta\tau}(t_i, t_j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} B_{k\Delta\tau}(t_i, t_j)}
\]
where B_{k∆τ}(t_i, t_j) = 1 if (t_i − t_j) falls in slotted lag k, and 0 otherwise.
The bin size ∆τ has to be set carefully to avoid undefined (empty) slots.
Fourier transform of the slotted correntropy: slotted correntropy spectrum.
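The slotted estimator can be sketched directly from its definition. The function name, the half-open slot boundaries and the synthetic unevenly sampled sinusoid are assumptions made for this illustration; empty slots are returned as NaN, which is one possible way to flag the undefined case.

```python
import numpy as np

def slotted_correntropy(t, x, delta_tau, n_slots, sigma):
    """Slotted correntropy for lags k * delta_tau, k = 0..n_slots-1."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dt = t[:, None] - t[None, :]  # t_i - t_j
    dx = x[:, None] - x[None, :]  # x_i - x_j
    g = np.exp(-dx**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    # Slot index of each pair; slot k covers [(k - 0.5), (k + 0.5)) * delta_tau.
    slot = np.floor(dt / delta_tau + 0.5).astype(int)
    v = np.full(n_slots, np.nan)  # NaN marks an empty (undefined) slot
    for k in range(n_slots):
        mask = slot == k  # indicator B_k(t_i, t_j)
        if mask.any():
            v[k] = g[mask].mean()
    return v

# Hypothetical unevenly sampled sinusoid with a period of 10 time units.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 100.0, size=300))
x = np.sin(2.0 * np.pi * t / 10.0) + rng.normal(0.0, 0.1, size=300)
v = slotted_correntropy(t, x, delta_tau=0.5, n_slots=60, sigma=0.3)
print(v[20], v[10])  # lag 10 (one period) versus lag 5 (half a period)
```

Building the full N×N difference matrices costs O(N²) memory, which is acceptable for light curves with a few hundred samples but would need chunking for much longer series.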
Previous Work
Results of this investigation were published in IEEE SPL
Period estimation in light curves from the MACHO survey
Gold standard provided by the Harvard TSC
Slotted correntropy was compared with the Lomb-Scargle (LS) periodogram, analysis of variance (AoV), String Length and slotted correlation
The slotted correntropy outperformed the other methods on eclipsing binary (EB) period estimation, and performed equally well on RR Lyrae (RRL) and Cepheid period estimation
New ITL based metric for period detection
Include the time structure in the kernel function
Spatio-temporal kernel function:
  Gaussian kernel to evaluate ∆x
  Periodic kernel to evaluate ∆t, no folding required
  The product of Mercer kernels is also a Mercer kernel
A periodogram based on the centered correntropy with a spatio-temporal kernel function:
\[
H(P_t) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ G_{\sigma_m}(\Delta x_{ij}) - IP \right] \cdot G_{\sigma_t; P_t}(\Delta t_{ij})
\]
By maximizing H with respect to P_t we obtain the period associated with the most similar set of sample pairs.
The H periodogram has two free parameters: σ_m and σ_t.
The kernel sizes control the observation window in which similarity is assessed.
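A rough sketch of the H periodogram follows. The slides do not give the form of the periodic kernel, so the exp-sine-squared kernel used below is an assumption, as are the function name, the choice of IP as the mean of the Gaussian kernel over all pairs, and the synthetic test signal.

```python
import numpy as np

def h_periodogram(t, x, trial_periods, sigma_m, sigma_t):
    """H(P) = sum_ij [G_sigma_m(dx_ij) - IP] * G_{sigma_t;P}(dt_ij)."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    dx = x[:, None] - x[None, :]
    dt = t[:, None] - t[None, :]
    g_m = np.exp(-dx**2 / (2.0 * sigma_m**2)) / (np.sqrt(2.0 * np.pi) * sigma_m)
    centered = g_m - g_m.mean()  # G_sigma_m(dx_ij) - IP
    h = np.empty(len(trial_periods))
    for idx, period in enumerate(trial_periods):
        # ASSUMED periodic kernel (exp-sine-squared); not specified in the slides.
        g_t = np.exp(-2.0 * np.sin(np.pi * dt / period)**2 / sigma_t**2)
        g_t /= np.sqrt(2.0 * np.pi) * sigma_t
        h[idx] = np.sum(centered * g_t)
    return h

# Hypothetical unevenly sampled sinusoid with a true period of 2.5 days.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 60.0, size=200))
x = np.sin(2.0 * np.pi * t / 2.5) + rng.normal(0.0, 0.1, size=200)
periods = np.linspace(1.5, 4.0, 251)  # grid chosen to exclude integer multiples of 2.5
h = h_periodogram(t, x, periods, sigma_m=0.3, sigma_t=0.1)
best_period = periods[np.argmax(h)]
print(best_period)
```

With this assumed kernel, integer multiples of the true period select pairs that are also in phase, so the trial grid in the example deliberately stops short of twice the true period.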
Results on periodic versus non-periodic discriminator
Automatic periodic light curve discrimination based on H
Tests on light curves from the MACHO and EROS surveys
A training dataset is needed for EROS, and it has to be built:
  Choose a field of the survey
  Obtain sets of trial periods using the H, LS and AoV periodograms, etc.
  Visually check the folded light curves
  Come up with a clean training set: future generations will be grateful
Then we can run on a bigger dataset
False positive rate: below 0.1%
Careful with spurious periodicities: sidereal day, moon phase, ...
Computational efficiency: 0.1 s per light curve
ROC curve on MACHO subset
Figure: Periodic light curve discrimination using the H metric on a MACHO subset. ROC curve; 966 periodic and 775 non-periodic light curves, 510 non-variables; α: significance of the periodicity test
False positives: spurious day and moon phase periods, and mislabeled light curves
ROC curve on EROS subset
Figure: Periodic light curve discrimination using H metric. Preliminary results on EROS subset, 819 periodic (field of 72k light curves), 4000 non periodic light curves, θ: periodogram threshold
Training dataset: False False Negatives + False False Positives
Efficient computation and scalability
EROS: 20 million light curves
Training field: 71937 light curves, 600 samples per light curve

Description                        Time
One light curve (CPU)              36 s
Using desktop GPU (480 cores)      0.76 s
Full training dataset (with GPU)   17 h
On full EROS (with GPU)            176 days!
Full EROS on GPU cluster (32)      5.5 days

χ2 variability filter: even with a very low threshold (100% TPR and big FPR), times would be reduced by half
Trial period selection: LS, AoV periodogram, correntropy, etc.
Code optimizations: max. GPU occupancy
Reduce complexity: FGT, Cholesky decomposition
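The timings listed above can be cross-checked with simple arithmetic. The sketch below assumes that the cost per light curve is constant, that "(32)" denotes 32 GPUs, and that the work scales linearly across them; the variable names are illustrative.

```python
# Figures taken from the slide.
CPU_SECONDS_PER_LC = 36.0
GPU_SECONDS_PER_LC = 0.76
N_TRAINING = 71_937
N_EROS = 20_000_000
N_GPUS = 32  # assumed meaning of "(32)"

speedup = CPU_SECONDS_PER_LC / GPU_SECONDS_PER_LC
training_hours = N_TRAINING * GPU_SECONDS_PER_LC / 3600.0
eros_days = N_EROS * GPU_SECONDS_PER_LC / 86400.0
cluster_days = eros_days / N_GPUS  # ideal linear scaling

print(f"GPU speedup: {speedup:.0f}x")               # 47x
print(f"Training field: {training_hours:.1f} h")     # 15.2 h
print(f"Full EROS, one GPU: {eros_days:.0f} days")   # 176 days
print(f"Full EROS, cluster: {cluster_days:.1f} days")  # 5.5 days
```

The full-survey figures reproduce the 176 days and 5.5 days quoted above. The naive product for the training field gives about 15.2 h against the reported 17 h; the difference is presumably overhead not captured by the per-light-curve timing.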
Conclusions
From the signal processing/machine learning viewpoint: an interesting, relevant and challenging problem
Contribution:
  New information theoretic criteria for periodicity detection
  Not used in the astronomy field before
  Working on fully automated and efficient analysis of large databases
Preliminary results are promising
Questions?
There is always a period, but most of the time it is something like this:
Preliminary results
Eclipsing binary star, MACHO 1.3449.948, P = 14.0055 days
Figure (two panels, frequency axis in 1/days, true period marked):
  Power spectral density (PSD). Labeled peaks: (1) 7.0024 days, (2) 3.5012 days
  Correntropy spectral density (CSD). Labeled peaks: (1) 14.0056 days, (2) 7.0024 days, (3) 3.5012 days
Preliminary results
Influence of the higher-order moments included in the slotted correntropy, estimated through the Gaussian kernel:
\[
G_\sigma(x - y) = \frac{1}{\sqrt{2\pi}\sigma} \sum_{k=0}^{\infty} \frac{(-1)^k}{2^k \sigma^{2k} k!} E\left[ \|x - y\|^{2k} \right]
\]

Even moments included   Hits      Multiples   Misses
0 to 2                  49.22%    48.70%      2.07%
0 to 4                  61.66%    36.27%      2.07%
0 to 6                  62.18%    35.75%      2.07%
0 to 8                  64.25%    34.72%      1.04%
0 to 10                 67.36%    31.61%      1.04%
0 to ∞                  73.06%    26.42%      0.52%
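The truncation behind the rows of this table can be illustrated pointwise. The sketch below evaluates the series for a fixed difference d = x − y (no expectation is taken) and keeps even powers up to a chosen order; the function name and the test values are assumptions for illustration.

```python
import math
import numpy as np

def truncated_gaussian_kernel(d, sigma, max_even_moment):
    """Series expansion of G_sigma(d), keeping even powers 0..max_even_moment."""
    d = np.asarray(d, dtype=float)
    total = np.zeros_like(d)
    for k in range(max_even_moment // 2 + 1):
        term = (-1)**k * d**(2 * k) / (2**k * sigma**(2 * k) * math.factorial(k))
        total = total + term
    return total / (np.sqrt(2.0 * np.pi) * sigma)

def gaussian_kernel(d, sigma):
    """Full Gaussian kernel, i.e. even powers 0 to infinity."""
    return np.exp(-np.asarray(d, dtype=float)**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

d, sigma = 0.5, 1.0
exact = gaussian_kernel(d, sigma)
errors = [abs(truncated_gaussian_kernel(d, sigma, order) - exact) for order in (2, 4, 6)]
print(errors)
```

Keeping only powers 0 to 2 leaves a second-order quantity; each additional even power brings the truncated kernel closer to the full Gaussian.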
Preliminary results
Comparison with established methods on a subset of 200 periodic light curves of eclipsing binary stars drawn from the MACHO survey.

Period estimation method    Hits [%]   Multiples [%]   Misses [%]
Slotted correntropy + IP    74.0       25.5            0.5
Slotted correlation + IP    50.0       48.5            1.5
VarTools LS                 11.0       89.0            0.0
VarTools LS + IP            18.0       82.0            0.0
VarTools AoV                39.5       60.5            0.0
SigSpec                     11.0       88.5            0.5
SLLK                        42.5       54.5            3.0
SLLK + IP                   65.0       34.5            0.5