Providing Bioinformatics Services on Cloud



SLIDE 1

Christophe Blanchet, Clément Gauthey

Infrastructure Distributed for Biology IDB-IBCP CNRS FR3302 - LYON - FRANCE http://idee-b.ibcp.fr

IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)

Providing Bioinformatics Services on Cloud

C. Blanchet and C. Gauthey

EGI CF13, Manchester, 9 April 2013


SLIDE 2

Bioinformatics Today

  • Biological data are big data
  • 1512 online databases (NAR Database Issue 2013)
  • Sanger Institute, UK, 5 PB
  • Beijing Genomics Institute, China, 4 sites, 10 PB

➡ Big data in many places

  • Analysing such data has become difficult
  • Scale-up of the analyses: from genes/proteins to complete genomes/proteomes, ...
  • Many different tools in daily use, which need to be combined in workflows
  • Usual interfaces: portals, Web services, federation, ...

➡ Datacenters that are easy to access and use

  • Distributed resources
  • Experimental platforms: NGS, imaging, ...
  • Bioinformatics platforms

➡ Federation of datacenters


SLIDE 3

Sequencing Genomes

source: www.politigenomics.com/next-generation-sequencing-informatics

Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)

source: www.genomesonline.org

SLIDE 4

Infrastructures in Biology

Many tools and web services to process and visualize large amounts of data

SLIDE 5

The scene

  • Bioinformatics service providers
    • Is it easy to deploy many (possibly incompatible) tools?
    • To connect them to public databases?
    • To limit the transfer of huge data?
    • To provide users with their own computing resources?
    • With their own isolated storage?
  • Scientists
    • Is it easy to access and use these tools?
    • To adapt them to your usage?
    • To get your own or other tools deployed in a datacenter?
    • To combine them?
    • To get your own computing/storage resources?
SLIDE 6

IDB’s Cloud

  • Cloud workbench for Biology
    • 13 turnkey bioinformatics appliances (as of Apr. 2013)
    • Running since Sept. 2011, open to the biology community
    • Lyon, FRANCE
  • Powered by
    • StratusLab
      • Compute nodes, block storage
      • 900+ cores, 4+ TB RAM, 36 TB virtual disks
      • Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM
      • Bigmem servers with 64 cores and 768 GB RAM
      • VMs from 1 to 64 cores, 512 MB to 760 GB RAM
    • OpenStack
      • Object storage (Swift)
      • 200+ TB redundant and scalable storage
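
Because a VM cannot span physical servers, each request has to fit on one node class. A minimal sketch in Python, assuming only the two server types listed above and ignoring hypervisor overhead and current load; `NodeClass` and `pick_node_class` are hypothetical names, not part of StratusLab:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeClass:
    name: str
    cores: int
    ram_gb: int


# Assumption: only the two server types from the slide, smallest first.
# Hypervisor overhead and current load are deliberately ignored.
NODE_CLASSES = [
    NodeClass("standard", cores=32, ram_gb=128),
    NodeClass("bigmem", cores=64, ram_gb=768),
]


def pick_node_class(cores: int, ram_gb: float) -> Optional[str]:
    """Return the smallest node class whose capacity covers the request, or None."""
    for node in NODE_CLASSES:
        if cores <= node.cores and ram_gb <= node.ram_gb:
            return node.name
    return None


print(pick_node_class(8, 32))    # standard
print(pick_node_class(16, 512))  # bigmem
print(pick_node_class(128, 64))  # None: no single node has 128 cores
```

A real scheduler would also track free capacity on each node; this only checks whether a request can fit at all.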
SLIDE 7

Driven through a simple web interface

SLIDE 8

Integrate Bioinformatics Tools in Cloud

[Diagram: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) are installed on Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, SuSE) to create new appliances, published in the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, …)]

  • Appliances are virtual machines
    • Small: a few GB, easy to convert to most virtualization formats
    • Installed and pre-configured with common bioinformatics tools, e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, SAMtools, etc.
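
Since appliances are ordinary disk images, converting one to another hypervisor's format is a single call to a standard tool such as `qemu-img`. A sketch in Python that only builds the command, assuming the appliance is distributed as a qcow2 image; the file name `biocompute.qcow2` and the hypervisor-to-format mapping are illustrative assumptions:

```python
import shlex
from pathlib import Path

# Assumed mapping from hypervisor to a qemu-img output format (-O).
TARGET_FORMATS = {
    "kvm": "qcow2",
    "vmware": "vmdk",
    "virtualbox": "vdi",
    "hyperv": "vhdx",
}


def convert_command(image: str, source_format: str, hypervisor: str) -> list[str]:
    """Build (but do not run) a qemu-img command converting an appliance image."""
    out_format = TARGET_FORMATS[hypervisor]
    # Output file: same name, extension replaced by the target format.
    output = str(Path(image).with_suffix("." + out_format))
    return ["qemu-img", "convert", "-f", source_format, "-O", out_format, image, output]


cmd = convert_command("biocompute.qcow2", "qcow2", "virtualbox")
print(shlex.join(cmd))
# qemu-img convert -f qcow2 -O vdi biocompute.qcow2 biocompute.vdi
```

To actually convert the image, pass the list to `subprocess.run(cmd, check=True)` on a machine where `qemu-img` is installed.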
SLIDE 9

Bioinformatics Appliances

SLIDE 10

Select your bioinformatics tools

SLIDE 11

Run Bioinformatics Cloud Instances

[Diagram: appliances from the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, …) are launched through the portal onto IBCP's cloud resources (IaaS/PaaS): single VMs with BLAST, Clustal, etc., and an ARIA master & storage VM that launches jobs over ssh on CNS worker VMs sharing a file system]
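
The PaaS pattern in the diagram (a master VM launching jobs over ssh on worker VMs that share a file system) can be sketched as follows. This is a simplified illustration, not ARIA's actual scheduler: the host names, the `/shared/aria` mount point and the round-robin policy are hypothetical, and CNS is assumed to read its input script on standard input.

```python
import shlex
from itertools import cycle

# Hypothetical worker host names and shared mount point.
WORKERS = ["worker01", "worker02", "worker03"]
SHARED_FS = "/shared/aria"


def dispatch(job_inputs: list[str], workers: list[str]) -> list[list[str]]:
    """Assign CNS input files to workers round-robin; build one ssh command per job.

    Every worker mounts the same shared file system, so the master only sends
    a path: input and output files never travel over the ssh connection.
    """
    commands = []
    for job, host in zip(job_inputs, cycle(workers)):
        output = job.rsplit(".", 1)[0] + ".out"
        remote = (
            f"cd {shlex.quote(SHARED_FS)} && "
            f"cns < {shlex.quote(job)} > {shlex.quote(output)}"
        )
        commands.append(["ssh", host, remote])
    return commands


jobs = [f"it0/refine_{i}.inp" for i in range(1, 5)]
for cmd in dispatch(jobs, WORKERS):
    print(cmd[1], "->", cmd[2])
```

Round-robin ignores how busy each worker is; it is used here only because it is the simplest policy that shows the master/worker split.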

SLIDE 12

Manage your Cloud Instances

SLIDE 13

Biological Data in Cloud

[Diagram: public data sources (UniProt, PDB, EMBL, PROSITE, genomes) are shared (NFS) with the bioinformatics cloud, where single VMs run BLAST, Clustal, etc. and an ARIA master & storage VM with a portal launches jobs over ssh on CNS worker VMs sharing a file system (IaaS/PaaS); user persistent data live on pdisk (iSCSI) volumes; users upload their data and get their results via scp or http/S3]
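
Uploading data and fetching results over scp amounts to two commands. A sketch in Python, assuming a hypothetical VM login `user@vm.example.org` and a persistent pdisk volume mounted at `/mnt/pdisk` (both placeholders); the http/S3 route is not shown:

```python
# Hypothetical VM login and mount point of the persistent (pdisk) volume.
VM = "user@vm.example.org"
PERSISTENT_DIR = "/mnt/pdisk"


def upload_command(local_path: str, vm: str = VM, remote_dir: str = PERSISTENT_DIR) -> list[str]:
    """Copy a local file or directory onto the VM's persistent volume."""
    return ["scp", "-r", local_path, f"{vm}:{remote_dir}/"]


def download_command(remote_path: str, local_dir: str = ".", vm: str = VM) -> list[str]:
    """Copy results from the VM back to a local directory."""
    return ["scp", "-r", f"{vm}:{remote_path}", local_dir]


print(" ".join(upload_command("reads/")))
print(" ".join(download_command("/mnt/pdisk/results")))
```

Writing to the persistent volume rather than the VM's system disk is what lets data outlive the instance.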

SLIDE 14

Example: ‘biocompute’ Appliance

  • Use your own instance(s)
  • With standard bioinformatics tools pre-installed
    • BLAST, FastA, SSearch, HMMER, ...
    • ClustalW2, Clustal Omega, MUSCLE, ...
    • Bowtie/Bowtie2, BWA, SAMtools, ...
    • MEME, R, etc.
  • Connected to public reference data
    • UniProt, EMBL, genomes, PDB, etc.
    • Automatically shared with the VMs
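
With the reference data already shared into the VM, a search only has to point BLAST+ at the shared copy. A sketch, assuming the public data are mounted at `/data/public` and that a protein database named `uniprot/uniprot_sprot` has been pre-formatted there (both hypothetical); the flags are standard BLAST+ `blastp` options:

```python
import posixpath

# Hypothetical mount point of the shared public reference data inside the VM.
SHARED_DB_ROOT = "/data/public"


def blastp_command(query: str, database: str, threads: int = 1) -> list[str]:
    """Build a BLAST+ blastp command searching a shared, pre-formatted database."""
    db_path = posixpath.join(SHARED_DB_ROOT, database)
    output = posixpath.splitext(query)[0] + ".blastp.tsv"
    return [
        "blastp",
        "-query", query,
        "-db", db_path,
        "-evalue", "1e-5",
        "-outfmt", "6",  # tabular output, one hit per line
        "-num_threads", str(threads),
        "-out", output,
    ]


print(" ".join(blastp_command("my_proteins.fasta", "uniprot/uniprot_sprot", threads=8)))
```

The user copies only the query file into the VM; the multi-GB database is never transferred.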

SLIDE 15

Example: Galaxy portal for NGS analyses

  • Analyse NGS data
    • The Galaxy portal is widely used in the community
    • Connected to large public data: sequences and indexes
    • Large user datasets (GBs)
  • Preserve workflows and results (persistent storage)
SLIDE 16

Example: Proteomics

  • Motivation
    • Collaboration with a mass spectrometry platform
    • Running out of space on their local resources
  • Protein identification
    • Experimental mass spectrometry data
    • Reference databases: nr, Swiss-Prot
    • Reference screening tools: OMSSA, X!Tandem
  • User interface
    • Remote display: NX
    • Reference GUIs: SearchGUI, PeptideShaker

source: PeptideShaker site
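
The principle behind database search engines such as OMSSA and X!Tandem is to digest reference proteins in silico and compare theoretical peptide masses with observed ones. The toy Python sketch below shows only that principle: it matches a single neutral peptide mass within a fixed tolerance, whereas real engines score MS/MS fragment spectra and handle charge states and modifications. The protein sequence, identifier and tolerance are made up for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (terminal H and OH)


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (classic trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match(observed: float, database: dict[str, str], tol_da: float = 0.02) -> list[tuple[str, str]]:
    """Toy search: (protein id, peptide) pairs whose mass lies within tol_da."""
    hits = []
    for prot_id, seq in database.items():
        for pep in tryptic_digest(seq):
            if abs(peptide_mass(pep) - observed) <= tol_da:
                hits.append((prot_id, pep))
    return hits


proteins = {"P1": "AKPGRGK"}  # made-up sequence, not real data
print(tryptic_digest("AKPGRGK"))     # ['AKPGR', 'GK']
print(round(peptide_mass("GK"), 5))  # 203.12698
print(match(203.127, proteins))      # [('P1', 'GK')]
```

Scaling this idea to all of nr or Swiss-Prot against thousands of spectra is what makes the computation and storage needs grow.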

SLIDE 17

Conclusion

  • Provide turnkey bioinformatics appliances
    • Standard tools and pipelines
    • Interoperability: ready to run in the cloud
    • Easier to transfer appliances than data (GB vs TB)
  • Provide a cloud infrastructure tightly connected to the existing bioinformatics infrastructure
    • IDB's public bioinformatics cloud
    • Linked to public biological databases
    • In collaboration with the French Bioinformatics Institute
  • Make the cloud easier for scientists to use
    • Usual bioinformatics gateways
    • Persistent, large and ubiquitous storage
    • Web interface for cloud management
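
The "GB vs TB" argument is simple arithmetic. A sketch assuming a sustained 1 Gbit/s link and illustrative sizes of 5 GB for an appliance and 5 TB for a dataset (neither figure comes from the slides):

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Ideal transfer time: ignores protocol overhead, latency and contention."""
    return size_bytes * 8 / bandwidth_bits_per_s


GBIT = 1e9  # assumed sustained link speed, bits per second
appliance = transfer_seconds(5e9, GBIT)  # hypothetical 5 GB appliance image
dataset = transfer_seconds(5e12, GBIT)   # hypothetical 5 TB dataset
print(f"appliance: {appliance:.0f} s, dataset: {dataset / 3600:.1f} h")
# appliance: 40 s, dataset: 11.1 h
```

The ratio, not the absolute numbers, is the point: three orders of magnitude in size become three orders of magnitude in transfer time.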
SLIDE 18

Perspectives

  • Define good practices for providing the academic community and industry with bioinformatics services
  • French Bioinformatics Institute (IFB)
    • Goal: provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
    • Aims to build a national academic cloud dedicated to bioinformatics, inspired by the model evaluated through IDB's cloud
  • European ELIXIR infrastructure
    • Goal: build a sustainable European infrastructure for biological information, supporting life science research and its translation
    • IFB will be the French representative in ELIXIR
SLIDE 19

  • Acknowledgments
    • StratusLab members
    • Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)

Questions?

http://idee-b.ibcp.fr