Lightcurves: Brooke Leverton, Kevin Multani, Rachel DeGardner, et al. (PowerPoint Presentation)

SLIDE 1

Lightcurves

Brooke Leverton, Kevin Multani, Rachel DeGardner, Rachel Zilinskas, and Yao Shi, with mentor David Jones

SLIDE 2

Outline

  • The Lightcurve Problem
  • Tree Classification Methods
  • Feature Selection
  • Results

SLIDE 3

PROBLEM:

Classify different types of stars

Data are collected for a large number of stars and reduced to features, which are then used for classification.

TARGET:

Classification accuracy based on the three basic features provided by the Catalina Real-Time Transient Survey (CRTS) [1] is 65.1%.

Introduction

Images: Pulsating Stars; Eclipsing Binaries
Image Credits: https://www.eso.org, https://www.spacetelescope.org

SLIDE 4

Solution

DATA: Raw lightcurves with three basic features
PROCESS:
  • Compute additional features (FATS package in Python)
  • Implement an algorithm for classification (randomForest package in R)

SLIDE 5

Classification Trees

A hierarchy of binary decisions that assigns labels to objects.
  • Advantages: Simple and easy to interpret.
  • Disadvantages: Not very accurate, and unstable: small changes in the training data can produce a very different tree.
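To make the "hierarchy of binary decisions" concrete, here is a minimal Python sketch of how an already-fitted tree classifies one object: starting at the root, each node asks one yes/no question about a feature until a leaf supplies the label. The `Node` class, the feature indices, the thresholds, and the class names "A", "B", "C" are all hypothetical; this is not how the randomForest package stores its trees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One binary decision: follow `yes` if x[feature] > threshold, else `no`.
    A node whose `label` is not None is a leaf."""
    feature: int = 0
    threshold: float = 0.0
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    label: Optional[str] = None


def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, answering one yes/no question per level."""
    while node.label is None:
        node = node.yes if x[node.feature] > node.threshold else node.no
    return node.label


# Hypothetical tree over two features X1 (index 0) and X2 (index 1);
# "A", "B", "C" are placeholder classes, not real star types.
tree = Node(feature=0, threshold=6.0,
            yes=Node(label="A"),
            no=Node(feature=1, threshold=2.0,
                    yes=Node(label="B"),
                    no=Node(label="C")))

print(predict(tree, [7.5, 0.0]))  # X1 > 6            -> A
print(predict(tree, [3.0, 4.0]))  # X1 <= 6, X2 > 2   -> B
```

Because each prediction is just a short chain of readable comparisons, a single tree is easy to interpret, which is the advantage listed above.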

SLIDE 6

Classification Trees

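Figure: example classification tree on features X1 and X2; the root node tests X1 > 6, with Y (yes) and N (no) branches.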
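A split such as X1 > 6 is typically learned by trying candidate thresholds on every feature and keeping the one that leaves the purest child nodes. The sketch below assumes Gini impurity as the purity measure (a common choice for classification trees) and uses midpoints between consecutive observed values as candidate thresholds; the function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[str]) -> Tuple[int, float, float]:
    """Try every feature and every midpoint between sorted distinct values.
    Returns (feature index, threshold, weighted child impurity) of the best 'x > t' test."""
    best = (-1, 0.0, float("inf"))
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            yes = [label for row, label in zip(X, y) if row[j] > t]
            no = [label for row, label in zip(X, y) if row[j] <= t]
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: class "A" has large X1, class "B" small X1; X2 is noise.
X = [[8.0, 1.0], [7.0, 3.0], [9.0, 2.0], [2.0, 1.5], [4.0, 2.5], [5.0, 0.5]]
y = ["A", "A", "A", "B", "B", "B"]
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)  # 0 6.0 0.0
```

A full tree applies the same search recursively to each child node. Because the chosen threshold depends on exactly which rows are present, small changes in the data can change the splits, which is one source of the instability noted on the previous slide.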

SLIDE 7

Bootstrap Aggregation (Bagging)

Tree Bagging

  • Advantages: Reduces the variance of the prediction
  • Disadvantages: The trees are highly correlated (they tend to pick the same strong splits), which limits how much averaging can reduce the variance
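Bagging fits each tree to a bootstrap sample (n rows drawn with replacement from the n training rows) and combines the trees by majority vote. Rows that a tree's bootstrap sample left out are "out of bag" (OOB) for that tree, which gives a built-in error estimate. The minimal sketch below uses one-split trees ("stumps") as the base learner only to keep it short; real tree bagging grows deeper trees. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Fit a one-split tree; returns (feature, threshold, yes_label, no_label).
    If no split is possible, threshold is +inf and both labels are the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    best_score, best = float("inf"), (0, float("inf"), majority, majority)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            yes = [lab for row, lab in zip(X, y) if row[j] > t]
            no = [lab for row, lab in zip(X, y) if row[j] <= t]
            score = len(yes) * gini(yes) + len(no) * gini(no)
            if score < best_score:
                best_score = score
                best = (j, t, Counter(yes).most_common(1)[0][0],
                        Counter(no).most_common(1)[0][0])
    return best


def predict_stump(stump, x):
    j, t, yes_label, no_label = stump
    return yes_label if x[j] > t else no_label


def bag(X, y, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        models.append((stump, set(idx)))  # keep in-bag row indices for OOB scoring
    return models


def vote(models, x):
    """Aggregate by majority vote over all stumps."""
    return Counter(predict_stump(s, x) for s, _ in models).most_common(1)[0][0]


def oob_error(models, X, y):
    """Out-of-bag error: score each row only with the stumps that never saw it."""
    wrong = scored = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(predict_stump(s, x) for s, in_bag in models if i not in in_bag)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != label
    return wrong / scored if scored else float("nan")


# Hypothetical 1-feature toy data: class "B" between 0 and 1, class "A" between 9 and 10.
X = [[0.1 * k] for k in range(10)] + [[9.0 + 0.1 * k] for k in range(10)]
y = ["B"] * 10 + ["A"] * 10
models = bag(X, y)
print(vote(models, [0.5]), vote(models, [9.5]), oob_error(models, X, y))
```

Since every stump here sees a bootstrap sample of the same data, the stumps tend to make the same split, which is the correlation problem listed above.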

SLIDE 8

Random Forests

Principle: Performs bagging, but also randomly selects a subset of the features to consider at each decision node. This decorrelates the trees. The final class is chosen by majority vote among the trees.
R package randomForest:
  • Helps identify the features that are most important for classification
  • The number of features randomly selected at each node (mtry) and the number of trees (ntree) can be tuned [3]

Image Credit: http://www.synkee.com/clipart/forest-clip-art.htm
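The sketch below adds the random-forest ingredient to bagging: at every node of every tree, only `mtry` randomly chosen features are considered for the split. The parameter names `ntree` and `mtry` mirror the randomForest package's arguments, and the default `mtry = floor(sqrt(p))` follows that package's documented default for classification, but this pure-Python code is only an illustration under those assumptions, not the package's implementation. The function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, mtry, rng, min_size=2):
    """Grow a tree recursively. At EVERY node only `mtry` randomly chosen features
    are tried; this per-node feature sampling is what separates a random forest from bagging."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(y) < min_size:
        return majority  # leaf node: just a class label
    candidates = rng.sample(range(len(X[0])), mtry)
    best = None
    for j in candidates:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            yes = [i for i, row in enumerate(X) if row[j] > t]
            no = [i for i, row in enumerate(X) if row[j] <= t]
            score = len(yes) * gini([y[i] for i in yes]) + len(no) * gini([y[i] for i in no])
            if best is None or score < best[0]:
                best = (score, j, t, yes, no)
    if best is None:  # the sampled features are constant at this node, so stop
        return majority
    _, j, t, yes, no = best
    return (j, t,
            build_tree([X[i] for i in yes], [y[i] for i in yes], mtry, rng, min_size),
            build_tree([X[i] for i in no], [y[i] for i in no], mtry, rng, min_size))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, yes, no)
        j, t, yes, no = node
        node = yes if x[j] > t else no
    return node


def random_forest(X, y, ntree=50, mtry=None, seed=0):
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(p)))  # floor(sqrt(p)) unless given
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, as in bagging
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, x):
    """Final class = majority vote among the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data with three features; every feature separates the two classes.
X = ([[0.1 * k, 0.05 * k, 1.0 - 0.1 * k] for k in range(10)]
     + [[9.0 + 0.1 * k, 9.5 + 0.05 * k, 10.0 - 0.1 * k] for k in range(10)])
y = ["B"] * 10 + ["A"] * 10
forest = random_forest(X, y)
print(predict_forest(forest, [0.5, 0.2, 0.5]), predict_forest(forest, [9.5, 9.7, 9.5]))
```

Lowering `mtry` makes the trees more different from one another (less correlated) at the cost of each tree being individually weaker; `ntree` mainly trades computation for a more stable vote.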

SLIDE 9

Lightcurve Data

SLIDE 10

Features

CRTS Features: Mean magnitude, period, and range for each observed star. Random forest classification accuracy with these features is 65.1%.

Feature Analysis for Time Series (FATS): A Python library that standardizes feature extraction for time-series data such as lightcurves. Created by Isadora Nun, Pavlos Protopapas, and many contributors [4]. It takes raw lightcurves as input and computes more than 50 new features.
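As an illustration of turning a raw lightcurve (observation times and magnitudes) into features, the sketch below computes three CRTS-style features: mean magnitude, range, and period. The period is taken from the highest peak of a classical Lomb-Scargle periodogram, a standard method for unevenly sampled data; that choice is an assumption, since the slides do not say how CRTS derived its periods, and this is not the FATS API. The frequency grid, function names, and synthetic lightcurve are hypothetical.

```python
import math
import random


def lomb_scargle_power(t, y, freq):
    """Classical Lomb-Scargle periodogram power at one trial frequency.
    `y` must already have its mean subtracted."""
    w = 2.0 * math.pi * freq
    # The time offset tau makes the sine and cosine terms orthogonal for uneven sampling.
    tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                     sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
    c = [math.cos(w * (ti - tau)) for ti in t]
    s = [math.sin(w * (ti - tau)) for ti in t]
    yc = sum(yi * ci for yi, ci in zip(y, c))
    ys = sum(yi * si for yi, si in zip(y, s))
    return 0.5 * (yc ** 2 / sum(ci * ci for ci in c) + ys ** 2 / sum(si * si for si in s))


def basic_features(t, mag, f_min=0.05, f_max=5.0, df=0.002):
    """Mean magnitude, range, and the period at the highest periodogram peak.
    The frequency grid (cycles per day here) is an illustrative assumption."""
    mean_mag = sum(mag) / len(mag)
    y = [m - mean_mag for m in mag]
    freqs = [f_min + k * df for k in range(int((f_max - f_min) / df) + 1)]
    best_f = max(freqs, key=lambda f: lomb_scargle_power(t, y, f))
    return {"mean_mag": mean_mag, "range": max(mag) - min(mag), "period": 1.0 / best_f}


# Hypothetical noise-free lightcurve: 100 uneven observations over 30 days, period 0.7 d.
rng = random.Random(42)
t = sorted(rng.uniform(0.0, 30.0) for _ in range(100))
mag = [15.0 + 0.3 * math.sin(2 * math.pi * ti / 0.7) for ti in t]
features = basic_features(t, mag)
print(features)
```

Real lightcurves are noisy and their periodograms can show aliases or harmonics of the true period, so a production pipeline would need more care than this brute-force grid search.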

SLIDE 11

Methodology

SLIDE 12

Feature Importance
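One way to measure feature importance is by permutation: shuffle one feature's values, breaking its link to the labels, and see how much accuracy drops. The randomForest package's `importance()` reports a version of this (mean decrease in accuracy, computed by permuting out-of-bag data when the forest is fit with `importance = TRUE`) alongside the mean decrease in Gini impurity. The simplified sketch below instead scores a held-out set with a hypothetical fixed model; the function names and data are assumptions for illustration.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """For each feature: mean drop in accuracy after shuffling that feature's column.
    A large drop means the model relies on that feature."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical trained model that, by construction, only uses feature 0.
def toy_model(x):
    return "A" if x[0] > 9.5 else "B"


X = [[float(k), float(k % 3)] for k in range(20)]
y = ["A" if k >= 10 else "B" for k in range(20)]
print(permutation_importance(toy_model, X, y))  # feature 0 > 0, feature 1 == 0.0
```

Ranking features this way and refitting with only the top-ranked ones is one way to study how error changes with the number of features used.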

SLIDE 13

Out of Bag Error Rate vs. Number of Features Used

SLIDE 14

Selected Feature Importance

SLIDE 15

Results

Accuracy for Star Classifications
  • Accuracy to beat: 65.1%
  • Training data: 81.43%
  • Testing data: 81.59%

Secondary Goal: Eclipsing Binaries (correctly classified as eclipsing binaries)
  • Accuracy to beat: 67.5%
  • Training data: 89.54%
  • Testing data: 90.60%

SLIDE 16

Moving Forward

Limitations and Future Work
  • The study was limited to periodic star classification; extend it to include aperiodic stars
  • Extend the study to explore other classifiers, such as support vector machines and boosted trees
  • Further feature analysis to find the optimal combination of features

SLIDE 17

References

[1] Drake, A. J., M. J. Graham, S. G. Djorgovski, M. Catelan, A. A. Mahabal, G. Torrealba, D. García-Álvarez, C. Donalek, J. L. Prieto, R. Williams, S. Larson, E. Christensen, V. Belokurov, S. E. Koposov, E. Beshore, A. Boattini, A. Gibbs, R. Hill, R. Kowalski, J. Johnson, and F. Shelly. "The Catalina Surveys Periodic Variable Star Catalog." The Astrophysical Journal Supplement Series 213.1 (2014): 9. Web.

[2] Richards, Joseph W., Dan L. Starr, Nathaniel R. Butler, Joshua S. Bloom, John M. Brewer, Arien Crellin-Quick, Justin Higgins, Rachel Kennedy, and Maxime Rischard. "On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data." The Astrophysical Journal 733.1 (2011): 10. Web.

[3] Breiman, Leo, and Adele Cutler. "Random Forests." Random Forests. N.p., n.d. Web. 17 May 2017. <https://www.stat.berkeley.edu/~breiman/RandomForests/>.

[4] Nun, Isadora, Pavlos Protopapas, Brandon Sim, Ming Zhu, Rahul Dave, Nicolas Castro, and Karim Pichara. "FATS: Feature Analysis for Time Series." arXiv:1506.00010. N.p., 31 Aug. 2015. Web. 17 May 2017.

SLIDE 18

Special Thanks!

  • David Jones
  • Sujit Ghosh
  • Thomas Gehrmann