HDF Group ESRF September 2019
Kita-OIO SDS Integration

Agenda
1 About OpenIO
2 OpenIO SDS + KITA
3 Demo

OpenIO ID

Founded in 2015
Quickly growing across geographies and vertical markets
• Key figures: 3, 40+, 3, 40 (Continents, Investors, Large customers, Employees)
• Deployments: from 3 nodes up to 40 Petabytes and tens of billions of objects
• Employees: mostly R&D, Support, tech people. Growing fast
• HQ: Lille
• Offices: Paris, Tokyo
• Teams across EMEA & Japan
• Recognition & awards: OpenIO selected as Cloud start-up to follow closely, March 2019

OpenIO Vision and Mission
Vision: We envision a data-centric world where OpenIO is recognized as the universal storage solution for unstructured data.
Mission: OpenIO’s mission is to deliver an open source, high performance object storage solution that meets the demanding needs of customers working with HPC, Big Data and AI.
OpenIO Cloudmark Confidential. Do not copy, repurpose, or distribute.

OpenIO SDS and HDF Kita
Jean-François Smigielski, CTO, OpenIO

Storage Landscape in HPC
• Network Attached Storage: POSIX / File, medium latency, average scalability
• Storage Area Network: block storage, low latency, high throughput
• Parallel FS: hot, mutable data; MPI-IO capable; high concurrency; very low latency; very high throughput
• Tape: offline copies, high latencies
• OIO Object Storage: cold and warm, immutable data; cost-effective, scalable; very high throughput
• Diagram labels: High $/GB, Low $/GB, High Latency, Low Latency

Why object storage to fill the gap? TCO!
1/ What is Object Storage
• Unstructured Immutable Data + Metadata
• High Parallelism
• 100% Online Dataset
• Cloud-oriented, ideal for large scale
• De facto standards: S3 (AWS), Swift (OpenStack)
2/ Meanwhile, in HPC...
• S3/Swift are not standards, HDF5 & MPI-IO are
• HDF5 was not designed to work with objects
• Huge mutable datasets
• Lower TCO would be appreciated
3/ How can it integrate?
• Hierarchically, behind a primary fast tier
• Independent tiers with data movements orchestrated
4/ KITA, the necessary middleware
• Persist mutable datasets as immutable objects!
• Independent Object Storage, with HDF5 as an orchestrator (import / export)
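The idea of persisting a mutable dataset as immutable objects can be illustrated with a small sketch. It is not Kita's implementation: `ObjectStore`, `ChunkedDataset`, the key layout and the in-memory dictionary standing in for an S3-like backend are all assumptions made for this example. The dataset is cut into fixed-size chunks, every update writes a new object, and a manifest records which object currently backs each chunk.

```python
import hashlib


class ObjectStore:
    """In-memory stand-in for an S3-like store: objects are write-once (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def has(self, key):
        return key in self._objects

    def put(self, key, data):
        if key in self._objects:
            raise KeyError(f"object {key!r} is immutable")
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def count(self):
        return len(self._objects)


class ChunkedDataset:
    """A mutable byte array whose chunks are persisted as immutable objects."""

    def __init__(self, store, name, chunk_size):
        self.store = store
        self.name = name
        self.chunk_size = chunk_size
        self.manifest = {}  # chunk index -> key of the object currently holding it

    def _read_chunk(self, idx):
        key = self.manifest.get(idx)
        # Chunks never written read back as zeros.
        return self.store.get(key) if key else bytes(self.chunk_size)

    def _store_chunk(self, idx, chunk):
        # The key embeds a content digest, so a changed chunk gets a new object.
        digest = hashlib.sha256(chunk).hexdigest()[:16]
        key = f"{self.name}/chunk-{idx}/{digest}"
        if not self.store.has(key):
            self.store.put(key, chunk)
        self.manifest[idx] = key

    def write(self, offset, data):
        pos, end = offset, offset + len(data)
        while pos < end:
            idx = pos // self.chunk_size
            lo = pos - idx * self.chunk_size
            n = min(self.chunk_size - lo, end - pos)
            chunk = bytearray(self._read_chunk(idx))
            chunk[lo:lo + n] = data[pos - offset:pos - offset + n]
            self._store_chunk(idx, bytes(chunk))  # old object is left untouched
            pos += n

    def read(self, offset, length):
        out = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            idx = pos // self.chunk_size
            lo = pos - idx * self.chunk_size
            n = min(self.chunk_size - lo, end - pos)
            out += self._read_chunk(idx)[lo:lo + n]
            pos += n
        return bytes(out)


store = ObjectStore()
ds = ChunkedDataset(store, "run1", chunk_size=4)
ds.write(0, b"abcdefgh")   # creates two objects
ds.write(2, b"XYZ")        # touches both chunks, creates two more objects
print(ds.read(0, 8), store.count())
```

Only the manifest is mutable here; superseded objects stay in the store until something garbage-collects them, which this sketch leaves out.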

Why OpenIO? We Think Different!
• ConsciousGrid™ technology: never rebalance, real-time load balancing
• Directory with indirections: track containers and not objects, for optimal data placement, more efficient than a ring-based architecture
• Grid of nodes: scale up and out in small or large increments and on any hardware that you choose, while maintaining consistent high performance
• Open Source & HW agnostic: avoid vendor lock-in and keep control of your data
– Open source guarantees the continuity of the solution and gives your engineers the opportunity to understand how the tech works
– Being hardware agnostic allows better capacity planning and improves your TCO
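A toy model may help show why a directory with indirections avoids rebalancing. This is not OpenIO's algorithm: the `Directory` class, its least-loaded placement rule and the node names are assumptions made for illustration. The point is only that once a container's location is recorded in a directory, adding a node changes where new containers go and leaves existing ones in place.

```python
class Directory:
    """Toy directory: remembers where each container lives (hypothetical model)."""

    def __init__(self, nodes):
        self.load = {node: 0 for node in nodes}  # containers per node
        self.location = {}                        # container -> node

    def add_node(self, node):
        # A new node starts empty; nothing already placed is moved.
        self.load[node] = 0

    def locate(self, container):
        if container not in self.location:
            # Place new containers on the least-loaded node (ties broken by name).
            node = min(self.load, key=lambda n: (self.load[n], n))
            self.location[container] = node
            self.load[node] += 1
        return self.location[container]


directory = Directory(["n1", "n2"])
before = {c: directory.locate(c) for c in ["a", "b", "c", "d"]}

directory.add_node("n3")
after = {c: directory.locate(c) for c in ["a", "b", "c", "d"]}

print(before == after)         # existing containers did not move
print(directory.locate("e"))   # the new container lands on the empty node
```

In a placement scheme computed purely from the key and the node list, the same expansion would remap part of the existing data, which is the rebalancing the slide refers to.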

KITA/OIO Integration Architecture

Kita's Architecture: Data Access Options
• Client SDKs for Python and C are drop-in replacements for libraries used with local files.
• No significant code change to access local and cloud based data.
• Clients do not know the details of the data or the storage system.
Diagram components: C/Fortran Applications, Python Applications, Web Applications, Command Line Tools, Browser, Community Conventions, HDF5 Lib, REST Virtual Object Layer, S3 Virtual File Driver, h5py, h5pyd, REST API, HDF Kita
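The drop-in claim for Python can be sketched as follows. `h5py.File` and `h5pyd.File` are the real entry points of the two packages; the helper names `backend_for` and `open_hdf`, the rule that an `hdf5://` prefix selects the server-backed client, and the example paths are assumptions made for this sketch. The modules are imported lazily so the selection logic runs even when neither package is installed.

```python
import importlib


def backend_for(location):
    """Pick the module name for a location.

    Assumption for this sketch: locations starting with "hdf5://" are served
    by a Kita/HSDS endpoint, anything else is a local file.
    """
    return "h5pyd" if location.startswith("hdf5://") else "h5py"


def open_hdf(location, mode="r", endpoint=None):
    """Open a local file with h5py or a server domain with h5pyd."""
    module = importlib.import_module(backend_for(location))
    if module.__name__ == "h5pyd":
        # endpoint is the service URL; when None, h5pyd falls back to its own config.
        return module.File(location, mode, endpoint=endpoint)
    return module.File(location, mode)


# The calling code is the same for both backends, for example:
#   with open_hdf("data/run1.h5") as f:
#       print(list(f.keys()))
#   with open_hdf("hdf5://home/demo/run1.h5") as f:
#       print(list(f.keys()))
print(backend_for("data/run1.h5"), backend_for("hdf5://home/demo/run1.h5"))
```

Once the file object is open, group and dataset access uses the same calls in both cases, which is what keeps application changes small.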

Kita / OIO Similar Architectures
• KITA: Client → Load Balancer → Service Nodes → Data Nodes → Persistence Backend
• OIO-SDS: Client → Load Balancer → S3 Gateways → Rawx Services, with the ConsciousGrid Service, Metadata Proxy and Directory Service
What if we just juxtapose both?

Step 1: Easy Integration
Let's configure a single load-balanced endpoint for the persistence backend
• Many redundant scalability patterns (caching, sharding, load-balancing)
• Huge bandwidth usage: any stream is repeated
• 2 autonomous clusters with deployment patterns
– Stateless, with K8s or Docker Swarm for Kita
– Stateful, with Ansible on bare-metal for OpenIO
Barely acceptable for functional validation purposes, as a first step.

Step 2: Tighter Integration
• Client → Load Balancer, registered in the ConsciousGrid as a LB, with redirection
• Deployment unit: Service Node, Data Node, S3 Gateway and Rawx Service, colocated on a local network
• Shared services: Conscience Service, Metadata Proxy, Directory Service

Kita / OIO: Benefits of a Tight Integration
Optimised TCO
• Lower HW cost & cost/giga
• Lower management cost: one cluster to manage rather than two!
• Improved management cycle
Optimised performance
• Tightly coupled architecture
• Efforts on pragmatism and parsimony
Scalability
• Scaling up increases simultaneous clients and parallelism for HDF access
OpenSource solution
• Both HDF and OIO are born opensource
• No vendor lock-in
• De facto standards

Kita / OIO: A (not so) Imaginary Use Case
With tape as secondary storage:
• A scientist visits a research facility and he/she starts a new experiment on an existing run
• The test happens, data are dumped on a fast buffer storage (Parallel FS)
• Soon after, copies are made on the secondary storage (Tape)
• Preliminary validations are performed on the data in the fast buffer
• His/Her long stay comes to an end, he/she returns with a pile of BluRay/DVD
• The data is flushed from the buffer.
With a private cloud as secondary storage:
• A scientist visits a research facility and he/she starts a new experiment on an existing run
• The test happens, data are dumped on a fast buffer storage (Parallel FS)
• Soon after, copies are made on the secondary storage (Private Cloud) and data is flushed from the buffer
• Preliminary validations are performed from the cloud
• His/Her short stay comes to an end, he/she returns with credentials to the cloud.
Much smaller and cheaper Buffer needed! Better user experience!
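The private-cloud variant of this workflow can be sketched as a two-tier model. Everything here is hypothetical: the `Tiers` class, its method names, the capacity expressed in bytes and the trivial validation check are assumptions for illustration, with plain dictionaries standing in for the parallel file system and the object store.

```python
class Tiers:
    """Toy model of a small fast buffer in front of an object store (hypothetical)."""

    def __init__(self, buffer_capacity):
        self.buffer = {}   # stands in for the parallel FS
        self.cloud = {}    # stands in for the private cloud
        self.capacity = buffer_capacity

    def dump(self, name, data):
        """The experiment writes its output to the fast buffer."""
        used = sum(len(v) for v in self.buffer.values())
        if used + len(data) > self.capacity:
            raise RuntimeError("buffer full")
        self.buffer[name] = data

    def archive_and_flush(self, name):
        """Copy a run to the cloud, then free its space in the buffer."""
        self.cloud[name] = self.buffer[name]
        del self.buffer[name]

    def validate(self, name):
        """Preliminary validation reads from the cloud, not from the buffer."""
        return len(self.cloud[name]) > 0


tiers = Tiers(buffer_capacity=10)
tiers.dump("run1", b"12345678")
tiers.archive_and_flush("run1")    # frees the buffer soon after the run
tiers.dump("run2", b"abcdefgh")    # fits only because run1 was flushed
print(tiers.validate("run1"), list(tiers.buffer))
```

Because each run leaves the buffer as soon as it is archived, the buffer only has to hold the runs in flight, which is the sizing argument the slide makes.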

Demo

Want to learn more?
Contacts
– jean-francois.smigielski@openio.io
– marielaure.retureau@openio.io
– dax.rodriguez@hdfgroup.org
Try out Kita for free in our JupyterLab environment:
– See: http://www.hdfgroup.org/hdfkitalab/
Learn more about HDF Kita: https://www.hdfgroup.org/solutions/hdf-kita/
Learn more about OpenIO SDS: https://www.openio.io/product/product-overview
