From construction to deployment of LifeWatchGreece: The potential role of EGI - LW Competence Centre



SLIDE 1

1 4/23/2015

by Emmanouela Panteri

Contributors: Christos Arvanitidis, Nicolas Bailly, Sarah Faulwetter, Jacques Lagnel, George Perantinos, Anastasis Oulas, Panagiotis Vavilis, Kleoniki Keklikoglou, Matina Nikolopoulou, Alexandros Gougousis and about 30 data managers

From construction to deployment of LifeWatchGreece: The potential role of EGI - LW Competence Centre

SLIDE 2

LifeWatchGreece e-Infrastructure

LWG e-infrastructure:

  • Multi-server e-infrastructure currently deployed at HCMR, Crete
  • Hosts biodiversity data and applications

Applications:

  • e-services: searching datasets/data or one-shot analyses
  • vLabs: interfaces for advanced selection of datasets/data, and more elaborate suites of analyses

A series of web tools (vLabs or e-services) for the public

SLIDE 3

Accessed by the LifeWatchGreece Portal: portal.lifewatchgreece.eu

Application development in 2 steps:

  • Independent development of a web application (by the team)
  • Integration into the infrastructure / portal

Access Control

  • Landing page: list of applications
  • One-time sign-up for accessing all apps
  • A few applications require more credentials: the compute-intensive ones
  • User rights management

Graphical Interface

  • A common graphical interface frame/wrapper introducing all applications

SLIDE 4

LWG e-Infrastructure: advantages

  • Applications can be developed in any programming language (PHP, Java EE, .NET, ...)
  • Design, development and maintenance of applications are independent from each other: a common standard is needed only for data exchange (DwC, …)
  • Each application runs in an independent execution environment: the number of VMs can be scaled up if needed as more apps are added
  • Compartmentalized security: a compromised application does not affect the others
  • Core developers are involved only at the integration stage
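Since Darwin Core (DwC) is named as the common data-exchange standard, a check at the integration boundary might look like the minimal sketch below. The term names are standard DwC terms, but which ones are treated as mandatory, the function name `check_record` and the sample record are assumptions made for illustration only; the slides do not describe how LWG validates exchanged data.

```python
# Minimal sketch: sanity-check a Darwin Core style record before exchange.
# ASSUMPTION: the set of "required" terms is a hypothetical policy, not LWG's.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")

# Coordinate terms and their valid numeric ranges.
COORDINATE_RANGES = (
    ("decimalLatitude", -90.0, 90.0),
    ("decimalLongitude", -180.0, 180.0),
)


def check_record(record):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing term: {term}" for term in REQUIRED_TERMS if not record.get(term)]
    for term, low, high in COORDINATE_RANGES:
        value = record.get(term)
        if value is None:
            continue  # coordinates are optional in this sketch
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{term} is not a number: {value!r}")
            continue
        if not low <= number <= high:
            problems.append(f"{term} out of range: {number}")
    return problems


# Made-up record, for illustration only.
sample = {
    "occurrenceID": "urn:example:occ:1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Mullus barbatus",
    "eventDate": "2015-04-23",
    "decimalLatitude": "35.33",
    "decimalLongitude": "25.28",
}
print(check_record(sample))
```

Keeping the check independent of any application framework matches the idea that applications only share a data standard, not an implementation.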

SLIDE 5

LWG e-Infrastructure: advantages

  • Other integration methods: iframes, for graphically integrating commercial apps
  • Open source applications can be integrated with few adaptations at both the access level and the graphical level, especially when they follow an MVC architecture
  • Moreover, most CMSes can be easily integrated, at least at the access control level
  • Certain JavaScript and CSS frameworks are provided by default through libraries in order to enforce the consistency of the user interface throughout the portal

SLIDE 6

LWG Portal diagram

[Diagram: LWG Application Layer, Data Layer, Cluster, Communication]

SLIDE 7

LWG e-Infrastructure: What is missing

  • No user workspace
  • Currently, files are not retrievable from one session to the other, or from one tool to the other.
  • Could the EGI Competence Center provide such functionality?

Workspace development will significantly increase the storage requirements.

  • Would require some work between the LWG infrastructure and EGI-CC (e.g., space allocation after sign-up)

SLIDE 8

HPC bioinformatics platform

Mainly focused on OMICs NGS data analysis:

  • Transcriptomics (RNA-Seq)
  • Genomics (eukaryotic and bacterial)
  • Metagenomics (microbial community)
  • Metabarcoding
  • RAD-Sequencing

More than 170 bioinformatics packages covering:

  • De novo assembly of genomes and transcriptomes
  • Functional and structural gene annotation
  • Sequence similarity (parallel BLAST) and mapping
  • Population genetics
  • Phylogeny reconstruction
  • Statistics (250 R packages installed)
  • Genetic marker mining/analysis

43 users from 11 institutes in 5 countries (Greece, Italy, France, Norway, Portugal)
More than 8000 jobs submitted during the last month
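The slides do not say how the parallel BLAST listed above is implemented. One common approach, sketched here purely as an assumption, is to split the query FASTA file into chunks and submit each chunk as an independent job. The function names `parse_fasta` and `split_round_robin` and the toy sequences are hypothetical.

```python
# Minimal sketch: split FASTA queries into chunks for independent BLAST jobs.
# ASSUMPTION: this is a generic "split the queries" strategy, not necessarily
# the one used on the LWG biocluster.


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_round_robin(records, n_chunks):
    """Distribute records over at most n_chunks lists of near-equal size."""
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


# Toy input, for illustration only.
toy_fasta = """>q1
ACGT
ACGT
>q2
GGCC
>q3
TTAA
>q4
CCGG
>q5
ATAT
"""
toy_chunks = split_round_robin(parse_fasta(toy_fasta), 2)
print([len(chunk) for chunk in toy_chunks])
```

Each chunk would then be written to its own file and searched separately, with the per-chunk results concatenated afterwards.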

SLIDE 9

HPC bioinformatics platform upgrade

Hardware and system software before the upgrade:

  • 9 worker nodes
  • 108 cores
  • 784 GB RAM
  • 30 TB storage
  • 10 Gbps Ethernet network
  • Gentoo Linux
  • Resource manager: Torque/Maui
  • Storage: XFS/NFS
  • Storage user quotas

After the upgrade (~3x performance):

  • 13 worker nodes
  • 300 CPU cores
  • 2.5 TB RAM
  • 120 TB storage
  • 40 Gbps InfiniBand network
  • CentOS Linux/Debian
  • Resource manager: SLURM
  • Storage: Lustre and ZFS/NFS
  • Storage group/user quotas
  • LXC virtualization
  • User management via LDAP

Software (open source):

  • Languages: GCC, ICC/IFC, R, BioPerl, Biopython, Ruby, BioJava, ...
  • Parallelization: OpenMPI, OpenMP and pthreads
  • Database servers: MySQL, PostgreSQL, ...
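For context on the "~3x" figure, the resource ratios between the two configurations can be worked out directly from the numbers on this slide. The sketch below assumes 2.5 TB of RAM means 2500 GB; the slide does not explain how the overall performance figure was derived, so these ratios are only a rough cross-check.

```python
# Minimal sketch: ratio of each listed resource, after vs. before the upgrade.
# ASSUMPTION: 2.5 TB RAM is taken as 2500 GB (decimal units).
before = {
    "worker_nodes": 9,
    "cpu_cores": 108,
    "ram_gb": 784,
    "storage_tb": 30,
    "network_gbps": 10,
}
after = {
    "worker_nodes": 13,
    "cpu_cores": 300,
    "ram_gb": 2500,
    "storage_tb": 120,
    "network_gbps": 40,
}

ratios = {name: after[name] / before[name] for name in before}

for name, ratio in ratios.items():
    print(f"{name:>13}: x{ratio:.2f}")
```

Core count and RAM grow by roughly a factor of three, while storage and network bandwidth grow by a factor of four.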

SLIDE 10

Bioinformatic challenges

RNA-Seq data analysis => 360 Mreads
Optimised and parallelised pipeline

Step                                      Runtime on the biocluster (h)   Runtime on a PC (1 CPU)
Assembly (requires ~350 GB shared RAM)    12 (10 threads)                 >120 h
Annotation: BLAST                         96 (94 jobs)                    >> 3 months
Annotation: InterPro                      32 (48 jobs)                    1.5 months
Mapping                                   4 (12*10 threads)               >10 days
Total                                     ~6 days                         >5 months

Sequence similarity search: parallel BLAST => 10,000 queries

Nb CPUs                          1                  94
blastn (nt) speedup / runtime    1.0 / 6.1 days     105.4 / 1.4 h
blastx (nr) speedup / runtime    1.0 / 11.6 days    88.8 / 3.1 h
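The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes speedup as serial runtime divided by parallel runtime, derives parallel efficiency as speedup per CPU, and sums the biocluster pipeline steps. The helper names are illustrative; small differences from the slide's speedups are expected because the runtimes shown are rounded.

```python
# Minimal sketch: cross-check the speedup and total-runtime figures above.
HOURS_PER_DAY = 24
N_CPUS = 94


def speedup(serial_hours, parallel_hours):
    """Speedup = runtime on 1 CPU / runtime on N CPUs."""
    return serial_hours / parallel_hours


def efficiency(speedup_value, n_cpus):
    """Parallel efficiency = speedup per CPU (1.0 means ideal scaling)."""
    return speedup_value / n_cpus


# Recomputed from the rounded runtimes on the slide.
blastn_recomputed = speedup(6.1 * HOURS_PER_DAY, 1.4)
blastx_recomputed = speedup(11.6 * HOURS_PER_DAY, 3.1)

# Efficiency based on the speedups as reported on the slide.
blastn_efficiency = efficiency(105.4, N_CPUS)
blastx_efficiency = efficiency(88.8, N_CPUS)

# Assembly + BLAST annotation + InterPro annotation + mapping, in hours.
pipeline_hours = 12 + 96 + 32 + 4
pipeline_days = pipeline_hours / HOURS_PER_DAY

print(f"blastn speedup (recomputed): {blastn_recomputed:.1f}")
print(f"blastx speedup (recomputed): {blastx_recomputed:.1f}")
print(f"blastn efficiency: {blastn_efficiency:.2f}")
print(f"blastx efficiency: {blastx_efficiency:.2f}")
print(f"pipeline total: {pipeline_hours} h = {pipeline_days:.0f} days")
```

The four pipeline steps add up to 144 h, which matches the "~6 days" total. An efficiency slightly above 1 for blastn may simply reflect rounding of the reported runtimes or caching effects; the slide does not say.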

SLIDE 11

eServices and vLabs: the R-vLab

  • Uses the “R” programming language
  • Supports an integrated and optimized online R environment (data manipulation and computational speed-up)
  • Helps overcome a severe computational power deficit, e.g. the calculation of several biodiversity indices and of multivariate analyses on large matrices

How can EGI Competence Center help LWG e-infrastructure to increase its computational power?

SLIDE 12

eServices and vLabs: the R-vLab

Parallel Mantel compared to conventional Mantel: ~20-fold speed-up
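A Mantel test parallelizes naturally because its permutations are independent of each other. The sketch below is a generic, pure-Python illustration of that idea (the R-vLab itself uses R, and its implementation is not shown in the slides). The function names, the chunking scheme and the `map_fn` parameter are assumptions made for this example.

```python
# Minimal sketch: permutation Mantel test with permutations split into chunks.
# ASSUMPTION: generic illustration only, not the R-vLab implementation.
import math
import random
from functools import partial


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_extreme(seed, n_perm, dist_a, dist_b, r_obs):
    """Worker: run one chunk of permutations with its own seeded RNG."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b, list(range(n)))
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of A together
        if pearson(upper_triangle(dist_a, order), flat_b) >= r_obs:
            hits += 1
    return hits


def mantel(dist_a, dist_b, n_perm=999, n_chunks=4, map_fn=map):
    """Return (observed correlation, one-sided permutation p-value)."""
    identity = list(range(len(dist_a)))
    r_obs = pearson(upper_triangle(dist_a, identity), upper_triangle(dist_b, identity))
    # Spread the permutations as evenly as possible over the chunks.
    sizes = [n_perm // n_chunks + (1 if i < n_perm % n_chunks else 0) for i in range(n_chunks)]
    worker = partial(count_extreme, dist_a=dist_a, dist_b=dist_b, r_obs=r_obs)
    hits = sum(map_fn(worker, range(n_chunks), sizes))
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy distance matrix built from points on a line, for illustration only.
points = [0, 1, 3, 6, 10, 15]
toy_dist = [[abs(p - q) for q in points] for p in points]
print(mantel(toy_dist, toy_dist, n_perm=99))
```

With the default `map_fn=map` the chunks run serially. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` instead would run them in separate processes, which is where a speed-up would come from.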

SLIDE 13

eServices and vLabs: MicroCT

  • Micro-computed tomography
  • Non-destructive method of 3D X-ray microscopy
  • Creation of 3D models of objects from a series of X-ray projection images

MicroCT offers:

  • Collection of virtual galleries of taxa, displayed and disseminated
  • Manipulation of the 3D models through a series of online tools
  • Download of datasets for local manipulations

How can EGI Competence Center help LWG e-infrastructure with storage and image manipulation, incl. 3D models?

SLIDE 14

MicroCT: current issues

In general:

  • Potentially large increase in the number of image galleries, especially from museum specimen collections (several orders of magnitude)
  • Need for 3D metadata standards: dissemination and searching
  • Need for 3D data annotation protocols and tools
  • Need for tools to search across the dispersed catalogues of galleries (centralized or distributed)

In LWG:

  • MicroCT generates many image files: storage issue
  • Processing and manipulating images is CPU intensive: computing issue

SLIDE 15

LWG and EGI Competence Center

Processing power and storage requirements

Harvesting various other repositories such as:

  • Taxonomic: CoL and PESI (and components: FADA, EMRS, E+MP), WoRMS, EEA/EUNIS, ...
  • Occurrences: GBIF, OBIS, ...
  • Species traits: PolyTraits, FishBase, SeaLifeBase, eModNet, ...
  • Bibliography: RefBank, BHL, AnimalBase, ...
  • Citizen Science: iNaturalist, ...
  • Workflows: BioVel, ...

Install mirror websites: FishBase, RefBank, GNI

Develop Web Services for disseminating LWG data:

  • Concerns about performance due to Web services use
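One way to limit the performance cost of harvesting over web services is to cache responses so that repeated queries do not hit the remote repository again. The sketch below is only an illustration of that idea and is not something the slides propose. It builds a query URL for GBIF's public occurrence search endpoint and wraps an injectable fetch function in a small in-memory cache; the class name `CachedHarvester` and the `fetch` callable are hypothetical.

```python
# Minimal sketch: cache harvested web-service responses in memory.
# ASSUMPTION: caching is one possible mitigation, not LWG's stated design.
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_query_url(scientific_name, limit=20, offset=0):
    """Build a paged occurrence-search URL for one scientific name."""
    params = {"scientificName": scientific_name, "limit": limit, "offset": offset}
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


class CachedHarvester:
    """Wrap a fetch(url) callable so each URL is retrieved only once."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.remote_calls = 0  # how many times the remote service was hit

    def get(self, url):
        if url not in self._cache:
            self.remote_calls += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def fake_fetch(url):
    """Stand-in for a real HTTP call, so the sketch runs offline."""
    return f"response for {url}"


harvester = CachedHarvester(fake_fetch)
url = build_query_url("Mullus barbatus", limit=5)
harvester.get(url)
harvester.get(url)  # served from the cache, no second remote call
print(url, harvester.remote_calls)
```

In a real deployment `fake_fetch` would be replaced by an actual HTTP client call, and the cache would need an expiry policy so that harvested data does not go stale.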

SLIDE 16

Linked Data / Linking Open Data

LifeWatchGreece principle: make data available to everybody
A number of datasets are ready as RDF in triplestores

Diagram from http://lod-cloud.net/
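To make the "datasets as RDF" idea concrete, the sketch below serializes one occurrence record as N-Triples, using the Darwin Core terms namespace for predicates. The subject IRI under example.org, the function names and the sample record are made up for illustration; the slides do not describe the vocabularies or IRIs actually used in the LWG triplestores.

```python
# Minimal sketch: turn a flat record into N-Triples lines.
# ASSUMPTION: subject IRI and record are hypothetical examples.
DWC = "http://rs.tdwg.org/dwc/terms/"


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_iri, record):
    """Emit one triple per term, with plain string literals as objects."""
    lines = []
    for term, value in sorted(record.items()):
        literal = escape_literal(str(value))
        lines.append(f'<{subject_iri}> <{DWC}{term}> "{literal}" .')
    return "\n".join(lines)


example_record = {"scientificName": "Mullus barbatus", "country": "Greece"}
print(record_to_ntriples("http://example.org/occurrence/1", example_record))
```

Once loaded into a triplestore, such triples can be linked to other datasets that reuse the same term IRIs.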

SLIDE 17

LifeWatchGreece

LifeWatchGreece Research Infrastructure, funded by the GSRT (Greek government: structural funds), is the national effort to address the above requirement and to support relevant studies. To materialize its aim, LWG RI adheres to the central lifewatch.eu guidelines, and attempts to ally all the Greek scientific human resources working on biodiversity data and data observatories.

Coordinated by the Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC, www.imbbc.hcmr.gr) of the Hellenic Center for Marine Research (HCMR, www.hcmr.gr), LWG includes 49 partner institutions covering a wide range of scientific disciplines (terrestrial, marine and freshwater biology, zoology, botany, geography, forestry, agriculture, genetics, biotechnology, pharmacy, aquaculture, education and law).

Thank you ;)