
iRODS in the Cloud: SciDAS and NIH Helium Commons

Claris Castillo, RENCI, UNC Chapel Hill


  1. iRODS in the Cloud: SciDAS and NIH Helium Commons. Claris Castillo, RENCI, UNC Chapel Hill

  2. Not Scaling up Data Analysis is Not an Option
     Figure labels: 20th Century, 21st Century
     Normal veteran (giga-/terascale) and newbie (megascale) users MUST ADVANCE to the peta/exa-scale in this generation.
     Issues:
     • Limited computational skills (What is a C library?)
     • Poor use of advanced networks (We need more HDs to mail!)
     • Limited access to computational resources (awareness, $$$)
     • Unpredictable time to compute result (queue times, queue times, queue times, broken nodes, segfaults, OOM, data geography)
     • Missing skillsets (I only know Perl)
     • Data must be organized and good stuff deleted (Data policies)
     DatAPocaLypse Prediction (Genomics):
     • In 20 years, every CVS, subway, hospital, research lab, public health facility, police station, etc. will have a DNA sequencer generating Exabytes of data in aggregate each week.
     • How many bioinformaticists are on the CVS payroll?
     • How many faculty recruitments failed because campus X research computing resources are stuck in 2015?
     • How many adverse drug reactions were not predicted because of limited/broken cyberinfrastructure?
     Credits: Alex Feltus; Wisegeek.org; www.smartpractice.com

  3. Heterogeneous and Complex CI Ecosystems
     Diagram labels: community data sharing platforms (+1500 users, +100 sites); compute infrastructure; advanced networks; storage infrastructure

  4. The commoditization of cloud computing and the convergence of compute, storage, data, and network technologies enable the ‘illusion’ of a single large computer consisting of widely distributed systems.

  5. Breakdown: One Layer at A Time -- Data
     Diagram labels: SciDAS Zone (+1500 users, +100 sites); MariaDB Galera cluster
     The iRODS team connected iRODS to a MariaDB Galera Cluster to provide a multi-master, distributed iRODS catalog over the WAN.
     “Distributing the iRODS Catalog: a way forward”, M. Stealey, et al. iRODS User Group Meeting (UGM), Netherlands, 2017.

  6. Breakdown: One Layer at A Time -- Compute
     Diagram labels: +1500 users, +100 sites
     Apache Mesos: a layer of abstraction to utilize an entire data center as a single large server

  7. Breakdown: One Layer at A Time -- Scientific Tools
     Diagram labels: +1500 users, +100 sites
     Scientific applications will be available in the form of SciApps “virtual appliances” (NSF CC-ADAMANT, [works15]).
     [works15] Enabling Workflow Repeatability with Virtualization Support, Fan Jiang et al. Workshop on Workflows of Large-Scale Science, Supercomputing Conference (SC15), Austin, Texas, 2015.

  8. SciDAS: Bringing it All Together Into One System
     Diagram labels: Requester; cost-aware optimization; Orchestrator; mapping; perfSONAR; iRODS; Shim (aaS) APIs; SciDAS Middleware (+1500 users, +100 sites)
     • Network-aware placement: optimize for data locality
     • Capability-aware, resource-aware placement: GPU-capable nodes
     • Authentication and authorization infrastructure: CILogon
     [works15] Enabling Workflow Repeatability with Virtualization Support, Fan Jiang et al. Workshop on Workflows of Large-Scale Science, Supercomputing Conference (SC15), Austin, Texas, 2015.
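
The placement criteria on slide 8 (network-aware, data-locality, capability-aware, cost-aware) can be illustrated with a small scoring sketch. This is not the SciDAS middleware's algorithm: the `Site` fields, the `hour_value` weight, and the `choose_site` function are hypothetical names chosen for illustration, and `bandwidth_gbps` stands in for a throughput measurement of the kind perfSONAR could supply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical description of one compute site (not a SciDAS type)."""
    name: str
    has_gpu: bool
    holds_data: bool        # True if the input data already lives at this site
    bandwidth_gbps: float   # assumed measured throughput from the data source
    cost_per_hour: float    # assumed price of running the job for one hour


def transfer_hours(site: Site, data_gb: float) -> float:
    """Estimated hours needed to move the input data to the site."""
    if site.holds_data:
        return 0.0  # data locality: nothing to move
    # gigabytes -> gigabits (x8), divided by Gbps gives seconds, then hours
    return data_gb * 8 / site.bandwidth_gbps / 3600


def choose_site(sites: List[Site], data_gb: float, run_hours: float,
                needs_gpu: bool, hour_value: float = 1.0) -> Optional[Site]:
    """Pick the cheapest capable site, charging hour_value per transfer hour."""
    # Capability-aware filter: drop sites that lack a required GPU.
    candidates = [s for s in sites if s.has_gpu or not needs_gpu]
    if not candidates:
        return None

    def score(site: Site) -> float:
        compute_cost = site.cost_per_hour * run_hours
        waiting_cost = hour_value * transfer_hours(site, data_gb)
        return compute_cost + waiting_cost

    return min(candidates, key=score)


sites = [
    Site("a", has_gpu=False, holds_data=True, bandwidth_gbps=1.0, cost_per_hour=1.0),
    Site("b", has_gpu=True, holds_data=False, bandwidth_gbps=10.0, cost_per_hour=2.0),
    Site("c", has_gpu=True, holds_data=False, bandwidth_gbps=1.0, cost_per_hour=2.0),
]
best = choose_site(sites, data_gb=4500, run_hours=2, needs_gpu=True)
print(best.name if best else "no capable site")
```

In this sketch a GPU job goes to the GPU site with the faster link, while a job without a GPU requirement stays at the site that already holds the data. A real orchestrator would weigh more factors than this single cost sum.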

  9. Improving scientific productivity by the numbers

  10. Commons
     Diagram labels: Data / Tools Discovery; Data / Tools Enrollment; Data Commons; Scalable, Secure and Collaborative Workflow Execution; Data Commons APIs; Scientific Interoperability; Component APIs; Security & Compliance; Communities; Workspace; FAIR; Search & Indexing; Global Unique ID; Cloud-Agnostic Platform

  11. Commons
     Diagram labels:
     • JSON descriptors for data, applications, and appliances. Example values shown: data: /aws/TopMed; high-level descriptor of cloud-preference: GC; Encryption: true; Docker-image: foo; Ram: 16G; CPU; Stge: 5TB
     • Metadata to encode rich information
     • Rule engine programmed with rules to enact policies
     • Virtualization system; Data Federation
     • Jupyter apps; CWL apps; CommonsShare (KC5: portal)
     • Input; Intelligent decision; PIVOT API/Core Service (cloud aware)
     • Chronos; Marathon; Provision & deploy
     • Make results discoverable; Access/write data anywhere
     • Data sets: TopMED, MOD, GTEX, …
     • Bring-Your-Own-Data; Bring-Your-Own-Data-Service
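
The descriptor values shown on slide 11 can be turned into a deployable request. The sketch below is one possible reading, not the Commons implementation: the descriptor key names, the helper `size_to_mib`, the function `to_marathon_app`, the app id `/example-app`, and the CPU count are all assumptions (the slide gives no CPU value). The output follows the general shape of a Marathon app definition (`id`, `cpus`, `mem` and `disk` in MiB, `container.docker.image`, `labels`).

```python
import json
import re

# Multipliers from a unit letter to MiB.
_UNIT_TO_MIB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def size_to_mib(text: str) -> int:
    """Convert sizes such as '16G' or '5TB' to whole MiB (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\s*([MGT])B?", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNIT_TO_MIB[match.group(2)]


# Values taken from the slide; the key names are normalized for this sketch.
descriptor = {
    "data": "/aws/TopMed",
    "cloud_preference": "GC",
    "encryption": True,
    "docker_image": "foo",
    "ram": "16G",
    "storage": "5TB",
}


def to_marathon_app(desc: dict, app_id: str, cpus: float) -> dict:
    """Build a Marathon-style app definition from a descriptor.

    cpus is passed in explicitly because the slide does not state a value.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": size_to_mib(desc["ram"]),
        "disk": size_to_mib(desc["storage"]),
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {"image": desc["docker_image"]},
        },
        # Policy-relevant hints carried along as string labels.
        "labels": {
            "data": desc["data"],
            "cloud_preference": desc["cloud_preference"],
            "encryption": str(desc["encryption"]).lower(),
        },
    }


app = to_marathon_app(descriptor, "/example-app", cpus=2.0)
print(json.dumps(app, indent=2))
```

Carrying the cloud preference and encryption flag as labels keeps them visible to whatever component enacts policy, without assuming how the rule engine on the slide consumes them.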

  12. iRODS enables powerful data sharing models in the Commons
     • Data Federation (default): continuous virtual system while retaining control of each endpoint
     • Extended data collaboration (BYODS): seamless integration with data hosted on external data services
     • BYOD: cloud storage can be added as storage resources
     (Background: the Commons architecture diagram from slide 11.)

  13. Thank you! claris@renci.org
