The Ophidia stack: a big data analytics framework for Virtual Research Environments



SLIDE 1

The Ophidia stack: a big data analytics framework for Virtual Research Environments

  • Sandro Fiore, Giovanni Aloisio

Advanced Scientific Computing Division, Euro-Mediterranean Center on Climate Change (CMCC)
On behalf of the Ophidia Team

Exploiting the EGI Federated Clouds - PaaS & SaaS workshop
Bari, Italy, 10-13 November 2015

SLIDE 2

The Ophidia Project

  • S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, “Ophidia: toward big data analytics for eScience”, ICCS 2013 Conference, Procedia Computer Science, Elsevier, Barcelona, June 5-7, 2013

Ophidia is a research effort carried out at the Euro-Mediterranean Center on Climate Change (CMCC) to address “big data” challenges, issues and requirements for climate change data analytics (GEMINA grant from the Italian Ministry of Education, Universities and Research).

Ophidia http://ophidia.cmcc.it/

SLIDE 3

Data analytics requirements and use cases

Requirements and needs focus on:

  • Time series analysis
  • Data subsetting
  • Model intercomparison
  • Multimodel means
  • Massive data reduction
  • Data transformation (through array-based primitives)
  • Parameter sweep experiments (same task applied to a set of data)
  • Climate change signal
  • Map generation
  • Ensemble analysis
  • Data analytics workflow support

But also:

  • Performance
  • Re-usability
  • Extensibility
  • Interoperability with ESGF

SLIDE 4

ESGF & the CMIP5 data archive

SLIDE 5

Ophidia Architecture

Components shown in the diagram:

  • Front end
  • Compute layer
  • I/O layer
  • I/O server instance
  • Storage layer
  • System catalog

Features highlighted in the diagram:

  • Array-based primitives
  • Analytics framework
  • Standard interfaces
  • Partitioning/hierarchical data management
  • Declarative language
  • New storage model

SLIDE 6

Storage model (dimension-independent) & implementation

  • Array-based support and hierarchical storage
  • Parallel I/O

SLIDE 7

Array-based primitives

  • Primitives provide array-based transformations
  • A comprehensive set of primitives (≈100) has already been implemented
  • By definition, a primitive is applied to a single fragment
  • They come in the form of plugins (I/O server extensions)
  • So far, Ophidia primitives perform data reduction, subsetting, predicate evaluation, statistical analysis, compression, and so forth
  • Support is provided for both byte-oriented and bit-oriented arrays
  • Plugins can be nested to obtain more complex functionality
  • Compression is provided as a primitive too
  • Libraries such as PETSc, GSL and C Math have been integrated

SLIDE 8

Array-based primitives: OPH_BOXPLOT

oph_boxplot(measure, "OPH_DOUBLE")

Input: single chunk or fragment. Output: single chunk or fragment.

SLIDE 9

Array-based primitives: nesting feature

oph_boxplot(oph_subarray(oph_uncompress(measure), 1, 18), "OPH_DOUBLE")

Intermediate step: subarray(measure, 1, 18)

Input: single chunk or fragment. Output: single chunk or fragment.
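
The nesting of primitives on a single fragment can be sketched in plain Python. This is only an illustration, not Ophidia code: the functions `uncompress`, `subarray` and `boxplot` are hypothetical stand-ins for the primitives named on the slide, the 1-based inclusive bounds of `subarray` are an assumption, and so is the five-number summary (min, Q1, median, Q3, max) returned by `boxplot`.

```python
import statistics
import struct
import zlib


def compress(values):
    """Pack doubles and deflate them (stand-in for a compressed measure array)."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


def uncompress(blob):
    """Toy counterpart of oph_uncompress: inflate and unpack the doubles."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def subarray(values, start, end):
    """Toy counterpart of oph_subarray (ASSUMPTION: 1-based, inclusive bounds)."""
    return values[start - 1:end]


def boxplot(values):
    """Toy counterpart of oph_boxplot (ASSUMPTION: returns min, Q1, median, Q3, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), q1, median, q3, max(values)]


# One fragment holding a compressed array of 24 values (0.0 .. 23.0).
measure = compress([float(i) for i in range(24)])

# Nested call, mirroring the expression on the slide: each step consumes
# the array produced by the step inside it, all within one fragment.
summary = boxplot(subarray(uncompress(measure), 1, 18))
print(summary)
```

Because every step maps one array to one array, the composition never needs data from another fragment, which is what makes nesting safe to run per fragment.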

SLIDE 10

The analytics framework: datacube operators (about 50)

Operator classes:

  • Metadata management (sequential and parallel operators)
  • Data processing (parallel operators, MPI & OpenMP based)
  • Data access (sequential and parallel operators)
  • Import/Export (parallel operators)

Each entry below gives the operator name followed by its description.

Operators “Data processing” – Domain-agnostic

  • OPH_APPLY(datacube_in, datacube_out, array_based_primitive): Creates the datacube_out by applying the array-based primitive to the datacube_in
  • OPH_DUPLICATE(datacube_in, datacube_out): Creates a copy of the datacube_in in the datacube_out
  • OPH_SUBSET(datacube_in, subset_string, datacube_out): Creates the datacube_out by subsetting the datacube_in according to the subset_string
  • OPH_MERGE(datacube_in, merge_param, datacube_out): Creates the datacube_out by merging groups of merge_param fragments from the datacube_in
  • OPH_SPLIT(datacube_in, split_param, datacube_out): Creates the datacube_out by splitting each fragment of the datacube_in into split_param fragments
  • OPH_INTERCOMPARISON(datacube_in1, datacube_in2, datacube_out): Creates the datacube_out, which is the element-wise difference between datacube_in1 and datacube_in2
  • OPH_DELETE(datacube_in): Removes the datacube_in

Operators “Data processing” – Domain-oriented

  • OPH_EXPORT_NC(datacube_in, file_out): Exports the datacube_in data into the file_out NetCDF file
  • OPH_IMPORT_NC(file_in, datacube_out): Imports the data stored in the file_in NetCDF file into the new datacube_out datacube

Operators “Data access”

  • OPH_INSPECT_FRAG(datacube_in, fragment_in): Inspects the data stored in the fragment_in from the datacube_in
  • OPH_PUBLISH(datacube_in): Publishes the datacube_in fragments as HTML pages

Operators “Metadata”

  • OPH_CUBE_ELEMENTS(datacube_in): Provides the total number of elements in the datacube_in
  • OPH_CUBE_SIZE(datacube_in): Provides the disk space occupied by the datacube_in
  • OPH_LIST(void): Provides the list of available datacubes
  • OPH_CUBEIO(datacube_in): Provides the provenance information related to the datacube_in
  • OPH_FIND(search_param): Provides the list of datacubes matching the search_param criteria
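
The semantics of a few domain-agnostic operators can be illustrated with a toy model in which a datacube is just a list of fragments and each fragment is a list of values. This is a sketch under that assumption only: the function names below are hypothetical, they are not the Ophidia or PyOphidia API, and the real operators run in parallel on the I/O servers.

```python
from itertools import chain


def apply_primitive(cube, primitive):
    """Like OPH_APPLY: run a primitive independently on every fragment."""
    return [primitive(fragment) for fragment in cube]


def merge(cube, merge_param):
    """Like OPH_MERGE: fuse each group of merge_param consecutive fragments."""
    return [
        list(chain.from_iterable(cube[i:i + merge_param]))
        for i in range(0, len(cube), merge_param)
    ]


def split(cube, split_param):
    """Like OPH_SPLIT: cut every fragment into split_param pieces.

    ASSUMPTION: pieces are as equal in size as possible.
    """
    out = []
    for fragment in cube:
        size, extra = divmod(len(fragment), split_param)
        start = 0
        for k in range(split_param):
            end = start + size + (1 if k < extra else 0)
            out.append(fragment[start:end])
            start = end
    return out


def intercomparison(cube1, cube2):
    """Like OPH_INTERCOMPARISON: element-wise difference of two cubes
    that share the same fragment layout."""
    return [[a - b for a, b in zip(f1, f2)] for f1, f2 in zip(cube1, cube2)]


cube = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

merged = merge(cube, 2)        # 4 fragments -> 2 fragments
resplit = split(merged, 2)     # 2 fragments -> 4 fragments again
doubled = apply_primitive(cube, lambda frag: [2 * x for x in frag])
diff = intercomparison(doubled, cube)
```

In this toy model, splitting the merged cube with the same parameter restores the original fragmentation, and the difference between the doubled cube and the original is the original itself.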

SLIDE 11

The analytics framework: datacube operators

SLIDE 12

Programmatic access: C & Python APIs

https://www.youtube.com/watch?v=8pcrBXboF6U&feature=youtu.be

SLIDE 13

CLI access: the Ophidia Terminal

  • The Ophidia terminal provides an effective and lightweight way to interact with the Ophidia server
  • Bash-like environment (command interpreter)
  • Terminal with history management, auto-completion, specific environment variables and commands with integrated help
  • Easy installation as a single executable that uses a small number of well-known, open-source libraries
  • Simple enough for a novice and at the same time powerful enough for an expert

SLIDE 14

CLI access: the Ophidia Terminal (provenance)

SLIDE 15

Modularity and extensibility: APIs and dynamic bindings

Analytics Framework

  • Main (new_oph_operator_client)
  • Library Oph_analitycs_operator_lib

Ophidia Operators

  • Driver 1 Oph_reduce_operator
  • Driver N Oph_importnc_operator

Ophidia Primitives (UDF MySQL Plugin)

  • Plugin 1 Oph_math
  • Plugin M Oph_subarray

Libraries

  • Library Math
  • Library GSL
  • Library CDO
  • Library PETSc
  • Library MySQL
  • Library MPI
  • Library OpenMP

Ophidia I/O server

  • Module Query Engine
  • Module Oph_iostorage

Ophidia I/O server Plugins

  • Server 1 Oph_MYSQL
  • Server K Oph_OPHIDIAIO

Framework libraries

  • Library OphidiaDB Manager
  • Library Oph_support

IO Server Manager

  • Library Oph_ioserver

Ophidia Storage Devices

  • Device 1 In-memory
  • Device L WOS

Connector labels in the diagram: dynamic binding, TCP/IP channel.

SLIDE 16

EUBrazilCC project

  • The main objective is the creation of a federated e-infrastructure for research using a user-centric approach (Coordinators EU-BR: I. Blanquer (UPV), F. V. Brasileiro (UFCG))
  • To achieve this, we need to pursue three objectives:
      – Adaptation of existing applications to tackle new scenarios emerging from cooperation between Europe and Brazil relevant to both regions
      – Integration of frameworks and programming models for scientific gateways and complex workflows
      – Federation of resources, to build up a general-purpose infrastructure comprising existing and heterogeneous resources
  • Data analytics workflows on heterogeneous datasets including climate, remote sensing data and observations (e.g. NetCDF, LANDSAT, LiDAR)

SLIDE 17

Cloud-based deployment scenarios

[Figure: three deployment scenarios (Deployment A, Deployment B, Deployment C), each mapping the Server (S), I/O and Compute (C) components onto one or more VM instances.]

Legend: Application Image, Data Image, Data, IO Component, Compute Component, Server Component

SLIDE 18

EGI Interoperability

  • Mainly addressed in the context of the EUBrazilCC project
  • Interoperability with EGI (security)
      – GSI-enabled interface
          • X509v3 digital certificates support for mutual authentication
      – VOMS support available
      – Authorization support
          • Local mode: based on ACL (single-site scenarios, fine-grained authZ); blacklist also available
          • Global mode: based on VOMS (VO-based scenarios, coarse-grained authZ)
          • Combined mode: based on a combination of the local and global modes (for flexible scenarios mixing the benefits of the two approaches)
  • Ophidia & EGI AppDB
      – VMIs soon available on the EGI AppDB (under testing)
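
The three authorization modes can be sketched as a small decision function. This is a hypothetical illustration, not the Ophidia server logic: the `Policy` structure, the user names and the VO name are invented, and the rule used for the combined mode (the blacklist always denies, otherwise either check may grant access) is an assumption, since the slide does not say how the two modes are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    acl: dict = field(default_factory=dict)        # user -> set of permitted actions
    blacklist: set = field(default_factory=set)    # users always denied locally
    trusted_vos: set = field(default_factory=set)  # VOs accepted in global mode


def local_allows(policy, user, action):
    """Local mode: fine-grained check against the ACL and the blacklist."""
    if user in policy.blacklist:
        return False
    return action in policy.acl.get(user, set())


def global_allows(policy, user_vos):
    """Global mode: coarse-grained check on VO membership (as asserted by VOMS)."""
    return bool(policy.trusted_vos & set(user_vos))


def authorize(policy, mode, user, user_vos, action):
    if mode == "local":
        return local_allows(policy, user, action)
    if mode == "global":
        return global_allows(policy, user_vos)
    if mode == "combined":
        # ASSUMPTION: the blacklist always wins; otherwise either check grants access.
        if user in policy.blacklist:
            return False
        return local_allows(policy, user, action) or global_allows(policy, user_vos)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical policy for a single site that also trusts one VO.
policy = Policy(
    acl={"alice": {"read", "write"}},
    blacklist={"mallory"},
    trusted_vos={"climate.vo"},
)
```

With this policy, a VO member without an ACL entry is rejected in local mode and accepted in global mode, while a blacklisted user is rejected in combined mode even when the VO check would pass.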

SLIDE 19

OFIDIA: Operational FIre Danger preventIon plAtform

INTERREG Project Italy-Greece 2007-2013

  • Use case on Operational FIre Danger preventIon
  • Scientific Coordinator: Prof. Giovanni Aloisio

The main objective of OFIDIA is to build a cross-border operational fire danger prevention infrastructure that advances the ability of regional stakeholders across the Apulia and Ioannina regions to detect and fight forest wildfires.

Workflow diagram (COSMO-ME input fields, every 3 hours for 3 days):

  • SSR (3h), solar radiation: IMPORT (before: create container), SUBSETTING [lat, lon] (Apulia & Greece), EXPORT
  • T2M (3h), temperature: IMPORT (before: create container), SUBSETTING [lat, lon] (Apulia & Greece), APPLY conversion (from K to C), EXPORT
  • TP (3h), total precipitation: IMPORT (before: create container), SUBSETTING [lat, lon] (Apulia & Greece), APPLY shift time dimension by 1 (fill value = 0), INTERCUBE TP in [i-3h, i] (cube - cube2), EXPORT
  • U10M (3h) and V10M (3h), wind (u10m, v10m, wind_speed): IMPORT (before: create container), SUBSETTING [lat, lon] (Apulia & Greece), INTERCUBE compute wind speed (abs(U10M, V10M)), APPLY conversion (from m/s to km/h), EXPORT
  • D2M (3h), relative humidity (RH) from T2M and D2M (dew point temperature): IMPORT (before: create container), SUBSETTING [lat, lon] (Apulia & Greece), APPLY compute saturation vapor pressure, APPLY compute vapor pressure, INTERCUBE compute RH, APPLY conversion (from decimals to percentages), EXPORT
  • Meteo Maps: MAP GENERATION (daily maps) for each exported field
  • Daily inputs to the indices: REDUCTION [time] mean, max and min (day); REDUCTION [time] sum (daily TP); REDUCTION [time] sum (24h-TP at noon); SUBSETTING [time] (T2M, RH and WS at noon); IMPORT previous day index values; IMPORT previous day last 12h-TP; EXPORT last 12h-TP of 1st day
  • Fire Danger Maps, one branch each for the FWI, FFWI and IFI fire danger indices: MULTICUBE merge, APPLY fire danger index, EXPORT, MAP GENERATION (daily maps)
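
A few of the per-field transformations named in the workflow can be sketched on plain Python lists. This is a hedged illustration rather than the OFIDIA implementation: the function names are hypothetical, reading abs(U10M, V10M) as the vector magnitude is an assumption, and so is treating TP as a quantity accumulated over time (which is what the shift-and-subtract step suggests).

```python
import math


def kelvin_to_celsius(t2m):
    """APPLY conversion (from K to C)."""
    return [t - 273.15 for t in t2m]


def wind_speed_kmh(u10m, v10m):
    """INTERCUBE wind speed, then APPLY conversion (from m/s to km/h).

    ASSUMPTION: abs(U10M, V10M) means the magnitude sqrt(u^2 + v^2).
    """
    return [math.hypot(u, v) * 3.6 for u, v in zip(u10m, v10m)]


def shift_time(series, fill=0.0):
    """APPLY shift time dimension by 1 (fill value = 0)."""
    return [fill] + series[:-1]


def precipitation_per_step(tp_accumulated):
    """INTERCUBE TP in [i-3h, i]: the cube minus its time-shifted copy.

    ASSUMPTION: the input holds precipitation accumulated since the start.
    """
    shifted = shift_time(tp_accumulated)
    return [a - b for a, b in zip(tp_accumulated, shifted)]


# Hypothetical values for one grid point over a few 3-hour steps.
t2m = [273.15, 283.15]
u10m, v10m = [3.0, 0.0], [4.0, 5.0]
tp = [1.0, 3.0, 3.0, 7.5]

celsius = kelvin_to_celsius(t2m)
wind = wind_speed_kmh(u10m, v10m)
tp_3h = precipitation_per_step(tp)   # [1.0, 2.0, 0.0, 4.5]
```

The fill value of 0 in the shifted copy means the first step keeps its accumulated value unchanged, so no special case is needed at the start of the series.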
SLIDE 20

Conclusions

  ✔ Ophidia is a big data analytics framework for eScience
  ✔ OLAP approach for big data – multidimensional data model
  ✔ Multiple use cases for data analysis in different domains/contexts have been implemented
  ✔ Sea situational awareness, fire danger prevention, climate indicators processing, biodiversity and climate change, coupled model intercomparison data analysis
  ✔ Several deployment scenarios in the cloud have been implemented
  ✔ Mainly in the EUBrazilCC project, through the IM (UPV) service
  ✔ Link with EGI
  ✔ In the last year interoperability has been an important milestone
  ✔ Publication of the VMIs on the EGI AppDB expected before the end of the year (final testing is ongoing)
  ✔ Next step will be the implementation of scientific use cases in EGI
  ✔ Interoperability with ESGF (ongoing activity in the ESGF-CWT WG)

SLIDE 21

References

[1] G. Aloisio, S. Fiore, I. Foster, D. N. Williams, “Scientific big data analytics challenges at large scale”, Big Data and Extreme-scale Computing (BDEC), April 30 to May 1, 2013, Charleston, USA (position paper).

[2] S. Fiore, G. Aloisio, I. Foster, D. N. Williams, “A software infrastructure for big data analytics”, Big Data and Extreme-scale Computing (BDEC), February 26-28, 2014, Fukuoka, Japan (position paper).

[3] S. Fiore, A. D’Anca, C. Palazzo, I. Foster, D. N. Williams, G. Aloisio, “Ophidia: Toward Big Data Analytics for eScience”, ICCS 2013, June 5-7, 2013, Barcelona, Spain, Procedia Computer Science, Elsevier, pp. 2376-2385.

[4] S. Fiore, C. Palazzo, A. D’Anca, I. Foster, D. N. Williams, G. Aloisio, “A big data analytics framework for scientific data management”, Workshop on “Big Data and Science: Infrastructure and Services”, IEEE International Conference on BigData 2013, October 6-9, 2013, Santa Clara, USA, pp. 1-8.

[5] S. Fiore, A. D’Anca, D. Elia, C. Palazzo, I. Foster, D. Williams, G. Aloisio, “Ophidia: A Full Software Stack for Scientific Data Analytics”, Proc. of the 2014 International Conference on High Performance Computing & Simulation (HPCS 2014), July 21-25, 2014, Bologna, Italy, pp. 343-350, ISBN: 978-1-4799-5311-0.

For more info, please contact:

  • Dr. Sandro Fiore (sandro.fiore@cmcc.it)
  • Mailing list: ophidia-info@lists.cmcc.it
  • http://ophidia.cmcc.it
  • @OphidiaBigData
  • www.youtube.com/user/OphidiaBigData

SLIDE 22

Questions?