Examples of SAAS on cloud: dynamically scaling R, Galaxy and Matlab


  1. Examples of SAAS on cloud: dynamically scaling R, Galaxy and Matlab
     tom.visser@sara.nl
     linkedin.com/in/visservisser

  2. Our vision
     Cloud computing is not about new technology, it is about new uses of technology.
     Self-Service Dynamically Scalable Computing Facilities

  3. Our experience in a nutshell*
     People bring their existing problems and ideas
     They are creatively stimulated
     A world of new possibilities
     A proof-of-concept environment
     Very popular offer
     * See the Munich, March 2012 presentation: https://indico.egi.eu/indico/contributionDisplay.py?contribId=39&confId=679

  4. Self service, how do you support that?
     We are an e-science group, so why stop at self service... Hands-free to experiment with user communities.
     (Diagram labels: Software as a service; Platform as a service; Infrastructure as a self-service)

  5. Preconfigured VM wizard

  6. Cases for today
     ● Galaxy workflow server (slides thanks to Leon Mei & Matthias den Hollander at NBIC / NIOO)
     ● Using R, the statistical software
     ● Matlab (MDCS and Parallel Computing Toolbox)

  7. NBIC Galaxy server (screenshot of the NBIC Galaxy interface; labels: History panel, Tools panel, Control panel)

  8. Strong User Community
     Galaxy is widely used for analyzing Next Generation Sequencing data
     ● Penn State University, BSD-like license
     ● Very active user community (about 200 participants at the Galaxy Community Conference in 2012, and 150 in 2011)
     galaxy.nbic.nl
     ● Started in 2010
     ● >240 registered users
     ● Used in a number of courses and trainings

  9. NBIC Galaxy @HPC Cloud
     ● Project started in July 2012
     ● Supported by BiG Grid, SARA, NBIC, NIOO
     ● Planned launch in September 2012
     ● Aim to reach 500 users by the end of 2012
     ● Will be used as the base for other project-specific Galaxy server deployments in the HPC cloud

  10. Architecture (diagram labels: Internet with Dynamic DNS; Persistent Master Node; Dynamic Worker Node; SGE auto-scaling; NFS mounts; Shared storage (Tools, Genomes); Backup)
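
The architecture slide names SGE auto-scaling of worker nodes but does not describe the scaling policy. As a minimal sketch, assuming workers are added or removed based on the number of active (queued plus running) jobs, the decision could look like the Python below. The function names, the `slots_per_worker` parameter and the default limits are hypothetical and are not taken from the NBIC setup.

```python
import math


def desired_workers(active_jobs, slots_per_worker, min_workers=0, max_workers=8):
    """Return how many worker nodes are wanted for the given job count.

    active_jobs      -- queued plus running jobs (assumed to come from the scheduler)
    slots_per_worker -- jobs one worker node can run at the same time (assumption)
    min_workers / max_workers -- hypothetical limits set by the cloud quota
    """
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    needed = math.ceil(active_jobs / slots_per_worker)
    # Clamp to the allowed range: the cloud is not an infinite resource.
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, active_jobs, slots_per_worker,
                   min_workers=0, max_workers=8):
    """Positive result: start that many workers. Negative: stop that many."""
    target = desired_workers(active_jobs, slots_per_worker, min_workers, max_workers)
    return target - current_workers


if __name__ == "__main__":
    # Two workers running, nine active jobs, four job slots per worker.
    print(scaling_action(current_workers=2, active_jobs=9, slots_per_worker=4))
```

In this sketch the persistent master node is not counted; only the dynamic worker nodes are scaled.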

  11. Tool Installation Automation
      ● http://usecloudman.org
      ● Developed by the Penn State Galaxy team
      ● MIT license
      ● Supports Amazon EC2 and OpenStack
      ● Fabric installation scripts
        ○ Galaxy itself
        ○ Postgres
        ○ Sun Grid Engine
        ○ Common NGS tools, e.g. BWA, bowtie, samtools, etc.

  12. Data Installation Automation
      ● http://cloudbiolinux.org/
      ● Developed by a team consisting of members from Harvard Univ., the J. Craig Venter Institute and the Galaxy team
      ● MIT license
      ● Fabric installation script
        ○ Common genome builds: hg18, hg19, mm9, tair10, etc.
        ○ Tool-specific genome indexes for bowtie, BWA, etc.

  13. Using R
      ● Using R for transcriptomics. Statistical package -> http://www.r-project.org/
        ○ Existing cluster installation
        ○ Ported to cloud via an always-on headnode spawning R workers; ref: Han Rauwerda and Timo Breit
          - http://www.ebiogrid.nl/generic-infrastructure.html
          - http://www.biggrid.nl/hpc-cloud-day-4-october-2011/
      ● Using R (2) for economics analysis. RStudio project - r-studio.org
        ○ Interactive R session via web browser on a big virtual machine; work done by Lykle Voort at SARA.nl, see also: https://www.cloud.sara.nl/projects/ceff

  14. Matlab distributed computing service
      ● Preliminary investigations with MathWorks using their MDCS solution
      ● Scenario: the user has a valid client license for the Parallel Computing Toolbox and spawns x workers on the cloud cluster of the service provider (BiG Grid); licensing issue is fixed this way??
        ○ User can work in existing environment
        ○ Enormous potential because of the broad user base for Matlab
      ● Multi-tenancy / accounting and dynamic scaling still have to be solved

  15. Some observations
      ● Scientists have their own preferred tools and ways of working
      ● There are a lot of hidden programmers and technically skilled and/or ambitious people out there
      ● Labs and institutes have their own clusters and computing solutions
      ● Local ICT departments are less facilitating in experimentation -> limited capacity
      You can seduce and enable the scientific community by offering this type of infrastructure and striving for proper integration.

  16. Outlook / thoughts
      ● Coping with autoscaling mechanisms
      ● Data locality -> does a cloud solution actually fit the problem?
      ● Integration, automation
      ● Infrastructure should be there; without friction! IAAS!
      ● Offering cluster solutions to the masses, but there's no such thing as an infinite resource
      ● Computing resources cost money; can we share the investments?
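
The data-locality question in the outlook can be made concrete with a back-of-the-envelope estimate: moving the data to the cloud only pays off if transfer time plus cloud run time is shorter than the local run time. The Python below is a rough sketch under that assumption; the function names and all numbers in the example are hypothetical and are not measurements from the HPC cloud.

```python
def transfer_hours(data_gb, bandwidth_mbit_s):
    """Hours needed to move data_gb gigabytes over a link of bandwidth_mbit_s.

    Assumes decimal units (1 GB = 8000 Mbit) and a fully used link,
    so real transfers will usually be slower.
    """
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth_mbit_s must be positive")
    seconds = (data_gb * 8 * 1000) / bandwidth_mbit_s
    return seconds / 3600


def cloud_pays_off(data_gb, bandwidth_mbit_s, local_hours, cloud_hours):
    """True if upload time plus cloud run time beats the local run time."""
    total_cloud = transfer_hours(data_gb, bandwidth_mbit_s) + cloud_hours
    return total_cloud < local_hours


if __name__ == "__main__":
    # Hypothetical case: 450 GB over a 1 Gbit/s link takes about one hour to move.
    print(transfer_hours(450, 1000))
    print(cloud_pays_off(450, 1000, local_hours=10, cloud_hours=2))
```

The estimate ignores the time to copy results back and any storage cost, so it is an optimistic bound for the cloud option.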

  17. References
      Galaxy
      - http://galaxy.nbic.nl
      - https://www.cloud.sara.nl/projects/mattiasdehollander-project/wiki
      R cluster setup
      - http://www.biggrid.nl/fileadmin/documents/Han_Rauwerda_HPC_Cloud_Day_20111004.pdf
      R-studio
      - http://rstudio.org/
      BiG Grid project
      - http://www.biggrid.nl
      SARA
      - http://www.sara.nl
      Fed cloud task force
      - https://wiki.egi.eu/wiki/Fedcloud-tf:FederatedCloudsTaskForce
      EGI Tech Forum 2012 BiG Grid presentation
      - https://indico.egi.eu/indico/contributionDisplay.py?sessionId=12&contribId=50&confId=1019
      EGI Community Forum presentations on BiG Grid cloud
      - https://indico.egi.eu/indico/contributionDisplay.py?contribId=39&confId=679
      Picture courtesy
      - http://www.flickr.com/photos/wongjunhao/424826584/
      - http://www.flickr.com/photos/rummenigge/2225696954/
