
SLIDE 1

Facilitating Knowledge Discovery in Large Archives of Astronomical Spectra using Distributed Cloud-based Engine

Petr Škoda

Astronomical Institute, Czech Academy of Sciences Ondřejov

Jakub Koza, Andrej Palička, Jiří Nádvorník and Tomáš Peterka

Faculty of Informatics, Czech Technical University, Prague

Supported by grant GAČR 13-08195S

Astroinformatics 2015, Dubrovnik, Croatia, 5th October 2015

SLIDE 2

Concept of scientific „CLOUD“

- ITERATIVE REPEATING of the SAME computation (workflow)
- Global non-linear optimization (spectra disentangling)
- Synthetic spectra (various elements, wavelength ranges)
- Machine Learning (almost all methods)
- LARGE stable INPUT data + small changing PARAMS
- Many runs on the SAME data (tuning required)
- Graphics visualization from postprocessed output (text) files
- Using a WWW browser - supercomputing in a PDA/mobile
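The "large stable input + small changing params" pattern can be illustrated with a minimal sketch. It is not taken from VO-CLOUD: load_input, run_workflow and the toy model are hypothetical stand-ins. It only shows the input being loaded once and the same workflow being repeated for each parameter combination.

```python
from itertools import product


def load_input():
    """Hypothetical stand-in for an expensive load of large, unchanging data."""
    return [0.0, 0.5, 1.0, 1.5, 2.0]


def run_workflow(data, scale, offset):
    """Hypothetical workflow: squared residuals of a toy linear model."""
    return sum((scale * x + offset - x * x) ** 2 for x in data)


def sweep(data, scales, offsets):
    """Repeat the same workflow once per parameter combination."""
    results = {}
    for scale, offset in product(scales, offsets):
        results[(scale, offset)] = run_workflow(data, scale, offset)
    return results


data = load_input()  # the stable input is loaded only once
results = sweep(data, scales=[1.0, 2.0], offsets=[-0.5, 0.0])
best = min(results, key=results.get)  # parameters with the smallest residual
```

Only the small parameter tuples change between runs, which is why keeping the data next to the computation pays off.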

SLIDE 3

IVOA Universal Worker Service (UWS)
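As a rough illustration of the UWS job model, the sketch below tracks a job through its execution phases and builds the per-job resource paths. The phase names and the phase, parameters and results sub-resources come from the IVOA UWS recommendation; the transition table is a simplified assumption (HELD, SUSPENDED and UNKNOWN are left out), and the Job class, job_urls helper and base URL are hypothetical.

```python
# Simplified assumption about which phase changes are allowed.
ALLOWED = {
    "PENDING": {"QUEUED", "ABORTED"},
    "QUEUED": {"EXECUTING", "ABORTED"},
    "EXECUTING": {"COMPLETED", "ERROR", "ABORTED"},
    "COMPLETED": set(),
    "ERROR": set(),
    "ABORTED": set(),
}


class Job:
    """Hypothetical in-memory model of a single UWS job."""

    def __init__(self, job_id, parameters):
        self.job_id = job_id
        self.parameters = dict(parameters)
        self.phase = "PENDING"  # a newly created job starts as PENDING

    def move_to(self, new_phase):
        """Change phase, rejecting transitions not listed in ALLOWED."""
        if new_phase not in ALLOWED[self.phase]:
            raise ValueError(f"cannot go from {self.phase} to {new_phase}")
        self.phase = new_phase


def job_urls(job_list_url, job_id):
    """Build the per-job resource paths below a (hypothetical) job list URL."""
    root = f"{job_list_url}/{job_id}"
    return {
        "job": root,
        "phase": root + "/phase",
        "parameters": root + "/parameters",
        "results": root + "/results",
    }


job = Job("42", {"method": "som"})
for phase in ("QUEUED", "EXECUTING", "COMPLETED"):
    job.move_to(phase)
urls = job_urls("http://example.org/uws/jobs", job.job_id)
```

In a real service the client would POST PHASE=RUN to the phase resource and then poll it; here the transitions are driven by hand.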

SLIDE 4

VO-CLOUD Architecture

- VO-CLOUD (former VO-KOREL)
- Distributed engine

MASTER (frontend)
- Database of users and their experiments
- Visualization
- Scheduling
- Load balancing

WORKERS (backend)
- Computation [+ output for visualization]
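The master's scheduling and load-balancing role can be sketched as a least-loaded assignment. This is an assumption about one possible policy, not the actual VO-CLOUD scheduler; the Master class and the worker names are hypothetical, and job completion is not modelled.

```python
import heapq


class Master:
    """Hypothetical frontend that hands each job to the least-loaded worker."""

    def __init__(self, worker_names):
        # Heap of (running_jobs, worker_name); the smallest load is on top.
        self._heap = [(0, name) for name in sorted(worker_names)]
        heapq.heapify(self._heap)
        self.assignments = {}

    def submit(self, job_id):
        """Assign the job to the worker that currently has the fewest jobs."""
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, name))
        self.assignments[job_id] = name
        return name

    def loads(self):
        """Return the current number of assigned jobs per worker."""
        return {name: load for load, name in self._heap}


master = Master(["worker-1", "worker-2"])
placed = [master.submit(f"job-{i}") for i in range(4)]
```

Ties are broken by worker name, so the four jobs alternate between the two workers.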

SLIDE 5
SLIDE 6
SLIDE 7

VO-CLOUD Deployment Schema

SLIDE 8

Machine Learning of Spectra

Use case: ML of the spectral profile of the Halpha line (Be stars)

DEMO

Be stars
- Disk or envelope
- Rotates, hot
- Origin ?????

SLIDE 9

Machine Learning of Spectra - Science case

Ondřejov 2m Perek Telescope – 1700/10 000 spectra

PRE-PROCESSING
- Normalization to continuum, Cutout (SSAP+DL)
- Rebinning (same wavelength points) + Renormalization [-1,+1]
- (Reduction of dimensionality (wavelets, PCA, LLE...))
- Produces feature vectors in CSV (same length, dimensions)

MACHINE-LEARNING
- Unified wrapper running multiple applications - same call
- Name-of-wrapper + parameters (json) – method as param

VISUALIZATION
- JavaScript (dygraphs, Highcharts)
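The rebinning and renormalization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the vocloud-preprocessing code: linear interpolation is assumed for the rebinning, min-max scaling for the renormalization to [-1,+1], and all function names and sample values are hypothetical.

```python
import csv
import io


def rebin(wavelengths, fluxes, grid):
    """Linearly interpolate a spectrum onto a common wavelength grid.

    Assumes wavelengths and grid are sorted ascending and that the grid
    lies inside the observed wavelength range.
    """
    out = []
    j = 0
    for w in grid:
        # Advance to the input interval that contains w.
        while j < len(wavelengths) - 2 and wavelengths[j + 1] < w:
            j += 1
        w0, w1 = wavelengths[j], wavelengths[j + 1]
        f0, f1 = fluxes[j], fluxes[j + 1]
        out.append(f0 + (f1 - f0) * (w - w0) / (w1 - w0))
    return out


def renormalize(values):
    """Scale values linearly so that the minimum is -1 and the maximum is +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat spectrum: no range to scale
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def to_csv(feature_vectors):
    """Write one feature vector per CSV row and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(feature_vectors)
    return buffer.getvalue()


grid = [6555.0, 6560.0, 6565.0]  # common wavelength points (made-up values)
spectrum = ([6550.0, 6560.0, 6570.0], [1.0, 2.0, 1.0])
rebinned = rebin(spectrum[0], spectrum[1], grid)
features = renormalize(rebinned)
text = to_csv([features, features])
```

Because every spectrum is sampled on the same grid, all rows of the CSV have the same length, which is what the learning step needs.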

SLIDE 10

Sources of Spectra

Getting spectra + store

(restricted access – big files)

Files
- UPLOAD from given local directory (recursive)
- DOWNLOAD by http + index, FTP (recursive)

VOTable
- UPLOAD VOTable (e.g. prepared in TOPCAT - meta)
- REMOTE VOTable
- SSAP query + Accref + DataLink (PUBDID + mime)
- SAMP control - send to SPLAT
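The SSAP query and DataLink steps boil down to parameterised HTTP requests. The sketch below only builds the request URLs and sends nothing; REQUEST, POS, SIZE and BAND are standard SSAP query parameters and ID is the DataLink identifier parameter, while the service URLs, coordinates and publisher DID are made-up placeholders.

```python
from urllib.parse import urlencode


def ssap_query_url(service_url, ra_deg, dec_deg, size_deg, band_m=None):
    """Build an SSAP queryData URL for a cone search around (ra, dec).

    band_m is an optional wavelength range in metres, e.g. "6.5e-7/6.6e-7".
    """
    params = {
        "REQUEST": "queryData",
        "POS": f"{ra_deg},{dec_deg}",
        "SIZE": str(size_deg),
    }
    if band_m is not None:
        params["BAND"] = band_m
    return service_url + "?" + urlencode(params)


def datalink_url(links_url, pub_did):
    """Build a DataLink request for one dataset, identified by its publisher DID."""
    return links_url + "?" + urlencode({"ID": pub_did})


# Hypothetical endpoints and identifiers, for illustration only.
query = ssap_query_url("http://example.org/ssap", 10.5, -20.25, 0.1,
                       band_m="6.5e-7/6.6e-7")
links = datalink_url("http://example.org/datalink", "ivo://example/spectra?obs1")
```

The VOTable returned by the query would carry the Accref and publisher DID columns that feed the DataLink call.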

SLIDE 11

Machine Learning of BIG Archive?

- Idea – 4.1 mil. LAMOST spectra (4 mil. in SDSS)
- NOT upload of data by user (VO compatible archive)
- Driven by SPECTRA LIST (VOTable obtained by TAP ?)
- Workers on same hi-speed network as archive
- Calling SSAP + DL always (client on GRID worker ?)
- Pre-cache ?
- Compute feature vectors – store for whole experiment ?
- PERSISTENT STORAGE - network FS ?
- Visualisation - needs input data (spectrum), lists from class
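One possible answer to the "compute feature vectors, store for whole experiment" question is a persistent cache keyed by spectrum identifier. The sketch below is only an assumption about how that could look: FeatureCache, the JSON file layout and the fake feature function are all hypothetical, and a temporary directory stands in for the network file system.

```python
import json
import os
import tempfile


class FeatureCache:
    """Hypothetical on-disk cache: one JSON file per spectrum identifier."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.computed = 0  # number of vectors actually computed

    def _path(self, spectrum_id):
        safe = spectrum_id.replace("/", "_")
        return os.path.join(self.directory, safe + ".json")

    def get(self, spectrum_id, compute):
        """Return the cached vector, computing and storing it on a miss."""
        path = self._path(spectrum_id)
        if os.path.exists(path):
            with open(path) as handle:
                return json.load(handle)
        vector = compute(spectrum_id)
        with open(path, "w") as handle:
            json.dump(vector, handle)
        self.computed += 1
        return vector


def fake_features(spectrum_id):
    """Stand-in for fetching a spectrum and extracting its feature vector."""
    return [float(len(spectrum_id)), 0.5]


with tempfile.TemporaryDirectory() as storage:
    cache = FeatureCache(storage)
    first = cache.get("spec-001", fake_features)   # computed and stored
    second = cache.get("spec-001", fake_features)  # read back from disk
```

With such a cache, repeated runs of an experiment would touch the archive only for spectra that have not been processed yet.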

SLIDE 12

Deep Learning

- Caffe + Big Data Layer
- GPU/CPU switch
- Will be part of VO-CLOUD soon

SLIDE 13

Source Code

https://github.com/vodev/vocloud
https://github.com/vodev/vocloud-som
https://github.com/vodev/vocloud-RDF
https://github.com/vodev/vocloud-preprocessing
https://github.com/vodev/vocloud-deeplearning

SLIDE 14

DEMO

http://vocloud-dev.asu.cas.cz/vocloud2