HDF Group ESRF September 2019: Kita-OIO SDS Integration



  1. HDF Group ESRF September 2019 Kita-OIO SDS Integration

  2. Agenda (September 2019)
      1. About OpenIO
      2. OpenIO SDS + KITA
      3. Demo

  3. OpenIO ID

  4. Founded in 2015. Quickly growing across geographies and vertical markets.
      • Headline figures: 3, 40+, 3 and 40, labelled Continents, Investors, Large customers and Employees
      • Deployments from 3 nodes up to 40 Petabytes and tens of billions of objects
      • HQ: Lille. Offices: Paris, Tokyo
      • Mostly R&D, Support, tech people. Growing fast. Teams across EMEA & Japan
      • Recognition & awards: OpenIO selected as Cloud start-up to follow closely, March 2019

  5. OpenIO Vision and Mission
      • Vision: We envision a data-centric world where OpenIO is recognized as the universal storage solution for unstructured data.
      • Mission: OpenIO’s mission is to deliver an open source, high performance object storage solution that meets the demanding needs of customers working with HPC, Big Data and AI.
      OpenIO Cloudmark Confidential. Do not copy, repurpose, or distribute.

  6. OpenIO SDS and HDF Kita. Jean-François Smigielski, CTO, OpenIO

  7. Storage Landscape in HPC
      • Network Attached Storage: POSIX / File; medium latency; average scalability
      • Storage Area Network: block storage; low latency; high throughput
      • OIO Object Storage: cost-effective and scalable; very high throughput
      • Parallel FS and Tape, compared on: data temperature (hot, mutable; warm, immutable; cold, offline copies), cost (high $/GB; low $/GB), latency (very low; low; high), throughput (very high), MPI-IO capability and high concurrency

  8. Why object storage to fill the gap? TCO!
      1/ What is Object Storage
      • Unstructured Immutable Data + Metadata
      • High Parallelism
      • 100% Online Dataset
      • Cloud-oriented, ideal for large scale
      • De facto standards: S3 (AWS), Swift (OpenStack)
      2/ Meanwhile, in HPC...
      • S3/Swift are not standards, HDF5 & MPI-IO are
      • HDF5 was not designed to work with objects
      • Huge mutable datasets
      • Lower TCO would be appreciated
      3/ How can it integrate?
      • Hierarchically, behind a primary fast tier
      • Independent tiers with data movements orchestrated
      4/ KITA, the necessary middleware
      • Persist mutable datasets as immutable objects!
      • Independent Object Storage, with HDF5 as an orchestrator (import / export)
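
The idea in 4/ (persisting a mutable dataset as immutable objects) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how Kita is implemented: the in-memory `ObjectStore`, the `ChunkedDataset` class, the chunk size and the key scheme are all hypothetical names and choices made for the example. Each update writes a new write-once object, and only a small manifest (chunk index to object key) changes.

```python
import hashlib


class ObjectStore:
    """Hypothetical in-memory stand-in for an S3-like store (write-once objects)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # Objects are immutable: re-putting identical content is a no-op,
        # overwriting with different content is refused.
        if key in self._objects and self._objects[key] != data:
            raise ValueError(f"object {key!r} is immutable")
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


class ChunkedDataset:
    """A mutable byte array persisted as immutable, fixed-size chunk objects."""

    def __init__(self, store, name, size, chunk_size=4):
        self.store, self.name = store, name
        self.size, self.chunk_size = size, chunk_size
        self.manifest = {}  # the only mutable state: chunk index -> object key
        n_chunks = -(-size // chunk_size)  # ceiling division
        for i in range(n_chunks):
            length = min(chunk_size, size - i * chunk_size)
            self._write_chunk(i, bytes(length))  # zero-filled chunk

    def _write_chunk(self, index, data):
        # Assumed key scheme: dataset name, chunk index, content digest.
        digest = hashlib.sha256(data).hexdigest()[:16]
        key = f"{self.name}/chunk-{index}/{digest}"
        self.store.put(key, data)
        self.manifest[index] = key

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write out of bounds")
        pos = 0
        while pos < len(data):
            index, start = divmod(offset + pos, self.chunk_size)
            chunk = bytearray(self.store.get(self.manifest[index]))
            n = min(len(chunk) - start, len(data) - pos)
            chunk[start:start + n] = data[pos:pos + n]
            self._write_chunk(index, bytes(chunk))  # new object, old one untouched
            pos += n

    def read(self):
        return b"".join(self.store.get(self.manifest[i])
                        for i in range(len(self.manifest)))
```

In this sketch, superseded chunk objects stay in the store until something garbage-collects them; a real system would need a policy for that, which is outside what the slide describes.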

  9. Why OpenIO? We Think Different!
      • ConsciousGrid™ technology: Real-time load balancing for optimal data placement
      • Directory with indirections: Track containers and not objects, which is more efficient than a ring-based architecture
      • Grid of nodes: Never rebalance. Scale up and out in small or large increments and on any hardware that you choose, while maintaining consistent high performance
      • Open Source & HW agnostic: Avoid vendor lock-in and keep control of your data. Open source guarantees the continuity of the solution and gives your engineers the opportunity to understand how the tech works. Being hardware agnostic allows better capacity planning and improves your TCO
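
The claims above (track containers rather than objects, place data by real-time score, never rebalance) can be illustrated with a toy model. This is a sketch under assumptions and not OpenIO's actual algorithm: the `Node` and `Directory` classes and the scoring rule (fraction of free capacity) are hypothetical and chosen only to show the principle.

```python
class Node:
    """Hypothetical storage node with a capacity and a running usage counter."""

    def __init__(self, name, capacity):
        self.name, self.capacity, self.used = name, capacity, 0

    def score(self):
        # Assumed scoring rule: fraction of free capacity (higher is better).
        return (self.capacity - self.used) / self.capacity


class Directory:
    """Maps each container to the node chosen when the container was created."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.locations = {}  # container name -> Node (one entry per container)

    def add_node(self, node):
        # New capacity is only registered; existing containers are never moved.
        self.nodes.append(node)

    def put(self, container, object_name, size):
        node = self.locations.get(container)
        if node is None:
            # Placement decision taken once, using the scores at that moment.
            node = max(self.nodes, key=Node.score)
            self.locations[container] = node
        node.used += size  # objects follow their container
        return node.name
```

The directory holds one entry per container however many objects are stored, and adding a node changes only where future containers land, which is the contrast with a ring-based layout that the slide draws.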

  10. KITA/OIO Integration Architecture

  11. Kita's Architecture: Data Access Options
      • HDF Kita Client SDKs for Python and C are drop-in replacements for libraries used with local files.
      • No significant code change to access local and cloud based data.
      • Clients do not know the details of the data or the storage system.
      • Diagram labels: C/Fortran Applications, Web Applications, Python Applications, Command Line Tools, Browser, Community Conventions, HDF5 Lib, REST Virtual Object Layer, S3 Virtual File Driver, REST API, h5pyd, h5py
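
The "drop-in replacement" point can be sketched in Python with h5py (local files) and h5pyd (REST access), both of which appear on the slide. This is a minimal illustration, not taken from the presentation: `open_hdf5` and `dataset_mean` are hypothetical helper names, and the file name, domain path and endpoint in the comments are placeholders.

```python
def open_hdf5(name, mode="r", endpoint=None):
    """Open a local HDF5 file with h5py, or a remote domain with h5pyd.

    `endpoint` is a placeholder for the REST endpoint of the service;
    when it is None the name is treated as a local file path.
    """
    if endpoint is None:
        import h5py  # local file access
        return h5py.File(name, mode)
    import h5pyd  # REST-based access with an h5py-like File/Dataset interface
    return h5pyd.File(name, mode, endpoint=endpoint)


def dataset_mean(f, dataset_name):
    """Mean of a 1-D dataset; the same code runs on either kind of file."""
    values = f[dataset_name][:]
    return float(sum(values)) / len(values)


# Hypothetical usage (names and endpoint are placeholders):
#   with open_hdf5("run.h5") as f:
#       print(dataset_mean(f, "temperature"))
#   with open_hdf5("/home/user/run.h5", endpoint="http://kita.example.org") as f:
#       print(dataset_mean(f, "temperature"))
```

The analysis function never learns which backend it is reading from; only the call that opens the file differs.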

  12. Kita / OIO Similar Architectures
      • KITA: Client → Load Balancer → Service Nodes → Data Nodes → Persistence Backend
      • OIO-SDS: Client → Load Balancer → S3 Gateways → Rawx Services, alongside the ConsciousGrid Service, Metadata Proxy and Directory Service
      • What if we just juxtapose both?

  13. Step 1: Easy Integration
      Let's configure a single load-balanced endpoint for the persistence backend
      • Many redundant scalability patterns (caching, sharding, load-balancing)
      • Huge bandwidth usage: any stream is repeated
      • 2 autonomous clusters with deployment patterns
          • Stateless, with K8s or Docker Swarm for Kita
          • Stateful, with Ansible on bare-metal for OpenIO
      Barely acceptable for functional validation purposes, as a first step.

  14. Step 2: Tighter Integration
      • Components: Client, Load Balancer, Conscience Service, Metadata Proxy, Directory Service
      • Deployment Unit, repeated per node: S3 Gateway, Service Node, Data Node, Rawx Service
      • Annotations: "Colocated, local network"; "Registered in the ConsciousGrid as a LB"; "Redirection"

  15. Kita / OIO: Benefits of a Tight Integration
      Optimised TCO
      • Lower HW cost & cost/giga
      • Lower management cost: one cluster to manage rather than two!
      • Improved management cycle
      Optimised performance
      • Tightly coupled architecture
      • Efforts on pragmatism and parsimony
      Scalability
      • Scaling up increases the number of simultaneous clients and the parallelism of HDF access
      OpenSource solution
      • Both HDF and OIO were born open source
      • No vendor lock-in
      • De facto standards

  16. Kita / OIO: A (not so) Imaginary Use Case
      With tape as secondary storage:
      • A scientist visits a research facility and he/she starts a new experiment on an existing run
      • The test happens, data are dumped on a fast buffer storage (Parallel FS)
      • Soon after, copies are made on the secondary storage (Tape)
      • Preliminary validations are performed on the data in the fast buffer
      • His/Her long stay comes to an end, he/she returns with a pile of BluRay/DVD
      • The data is flushed from the buffer.
      With a private cloud as secondary storage:
      • A scientist visits a research facility and he/she starts a new experiment on an existing run
      • The test happens, data are dumped on a fast buffer storage (Parallel FS)
      • Soon after, copies are made on the secondary storage (Private Cloud) and data is flushed from the buffer
      • Preliminary validations are performed from the cloud
      • His/Her short stay comes to an end, he/she returns with credentials to the cloud.
      Much smaller and cheaper Buffer needed! Better user experience!

  17. Demo

  18. Want to learn more?
      Contacts
      • jean-francois.smigielski@openio.io
      • marielaure.retureau@openio.io
      • dax.rodriguez@hdfgroup.org
      Try out Kita for free in our JupyterLab environment: http://www.hdfgroup.org/hdfkitalab/
      Learn more about HDF Kita: https://www.hdfgroup.org/solutions/hdf-kita/
      Learn more about OpenIO SDS: https://www.openio.io/product/product-overview
