GRNET eScience platform for Big Data management
Codename: orka
Monday, February 1, 2016
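Project Vision
– Data-Intensive Science (store and process big data, at Petabyte scale)
– Scientific workflows
– Virtual Research Environment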
– PaaS over
Big data
Hadoop project
programming paradigm
file system)
software install/maintain/scale etc.
Hadoop cluster with ~orka
– Create cluster (with configurable options) from a range of Hadoop distros (aka images)
– Transfer your data
– Submit, execute, monitor jobs
– Delete cluster
– Start/stop/format cluster
– Scale cluster, add/remove nodes
– Save cluster creation metadata for reproducibility
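One way to picture "save cluster creation metadata for reproducibility" is a small record that is written when the cluster is created and read back later to request an identical cluster. The sketch below is only an illustration: the `ClusterSpec` class, its field names, and the helper functions are assumptions made for this example and are not taken from the orka code base or its CLI.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical record of the options chosen at cluster creation."""
    name: str
    image: str             # Hadoop distro image name (illustrative)
    nodes: int             # total number of nodes in the cluster
    cpus_per_node: int
    ram_mb_per_node: int
    disk_gb_per_node: int


def save_spec(spec: ClusterSpec) -> str:
    """Serialize the spec to JSON; sorted keys keep saved files diff-friendly."""
    return json.dumps(asdict(spec), indent=2, sort_keys=True)


def load_spec(text: str) -> ClusterSpec:
    """Rebuild the spec from JSON so the same cluster can be requested again."""
    return ClusterSpec(**json.loads(text))


# Round trip: the reloaded spec is identical to the one that was saved.
original = ClusterSpec("demo", "example-hadoop-image", 4, 2, 4096, 20)
restored = load_spec(save_spec(original))
```

Keeping the record immutable (`frozen=True`) means a saved experiment description cannot drift after the fact; scaling the cluster would produce a new record rather than editing the old one.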
Hadoop cluster with ~orka
Add-ons to basic Hadoop
– Spark
– Cloudera
– Hue (HDFS explorer, Oozie web editor)
– Pithos ↔ HDFS connector (analogous to Amazon S3 Filesystem for Hadoop)
Scientific Workflows
– Built-in in orka images
Collaborative scientific research
– Research/Project home page (portal, wiki)
– Project Management
– Teleconference
– Digital repositories
Virtual Research Environment
Category               Software stack
Portal / CMS           Drupal (v7.37)
Wiki, blog, forum      Mediawiki (v1.2.4)
Project management     Redmine (v3.04)
Web conferencing       BigBlueButton (v0.81)
Digital repositories   DSpace (v5.3)
Reproducible Research
describes an experiment/job
Data streams into HDFS
High-level Architecture
Technology Stack
eScience
Subsystem 1 [Orka 0.1.1] Back-End
Orka SubSystem: Technologies Overview
Front-End
Single Page Application (SPA)
✓ HTML5
✓ CSS 3
✓ Ember JS
✓ Bootstrap
Command Line (CLI) API
✓ OrkaAPI (Python scripts)
Web Server
✓ nginx
External APIs / Technologies
✓ Synnefo/kamaki
✓ Authentication
✓ Hadoop
REST API
✓ Django REST Framework
App Server
✓ uWSGI
Also supported (in progress)
✓ RabbitMQ (message broker)
✓ Celery (task manager)
Data
✓ Postgres DBMS
Current state
– github.com/grnet/e-science
– escience.grnet.gr
λ
lambda.grnet.gr 2
The lambda architecture
a. A useful framework for thinking about the design of big data applications.
b. A robust framework for ingesting real-time streams of data while providing efficient stream and batch analytics.
c. Fault-tolerant against both hardware failures and human errors.
d. Serves a wide range of use cases, including those in which low-latency reads and updates are required.
The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer.
Batch layer
The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing arbitrary query functions, called batch views.
Serving layer
The serving layer indexes the batch views so that they can be queried in a low-latency, ad hoc way.
Speed layer
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
An example
[Diagram: incoming data flows to the batch layer (master dataset) and to the speed layer; the serving layer holds the batch views, the speed layer holds the real-time views, and queries read from both.]
1. Incoming data is dispatched to both the batch layer and the speed layer for processing.
2. The batch layer precomputes the batch views.
3. The serving layer indexes the batch views.
4. The speed layer deals with recent data only.
5. Any incoming query can be answered by merging results from batch views and real-time views.
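The query path in this example, where an answer is produced by merging a batch view with a real-time view, can be sketched in a few lines. This is a toy illustration only: the `(timestamp, key)` event shape, the function names, and the per-key counting query are assumptions chosen for the example, not part of the GRNET lambda service.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key): hypothetical event shape


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute per-key counts from the immutable master
    dataset, covering every event older than the cutoff timestamp."""
    return Counter(key for ts, key in master if ts < cutoff)


def realtime_view(stream: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: count only recent events, i.e. those at or after the
    cutoff that the last batch run has not absorbed yet."""
    return Counter(key for ts, key in stream if ts >= cutoff)


def query(key: str, batch: Counter, realtime: Counter) -> int:
    """Answer a query by merging the batch view with the real-time view."""
    return batch[key] + realtime[key]


events = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b")]
cutoff = 4  # the last batch run covered timestamps 1 to 3
bv = batch_view(events, cutoff)
rv = realtime_view(events, cutoff)
total_a = query("a", bv, rv)
```

Because the two views cover disjoint time ranges, adding them never double-counts an event; once the next batch run finishes, the cutoff moves forward and the real-time view can discard what the batch view now covers.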
Lambda on-demand service
– λ instances
– λ layers (speed and batch)
– λ API
Based on
Dashboard: Instances, Applications and Help
λ Instances (manage lambda instances)
Create lambda instances based on your needs. Manage them, deploy applications, and start your lambda instance.
Applications (manage your applications)
Upload your Java or Scala applications for streaming and batch jobs. Your applications are stored on the Pithos+ storage service.
Help (informational guides)
Short guides on how to:
i) deploy, run and manage your lambda instances
ii) deploy, run and manage your applications
iii) export and view your results
Use the λambda API
λ API
– lambda instances: create, manage, delete
– lambda applications: upload, manage, delete
Well documented with Swagger and MkDocs.
Use the λambda API
Lambda (λ): focuses on analysing streaming data
e-Science: focuses on existing data and offers pre-installed collaborative tools for handling data