CEPH DATA SERVICES IN A MULTI- AND HYBRID CLOUD WORLD
Sage Weil - Red Hat
FOSDEM - 2019.02.02
1
2
- Ceph
- Data services
- Block
- File
- Object
- Edge
- Future
OUTLINE
3
UNIFIED STORAGE PLATFORM
RGW
S3 and Swift
- Object storage
LIBRADOS
Low-level storage API
RADOS
Reliable, elastic, highly-available distributed storage layer with replication and erasure coding
RBD
Virtual block device with robust feature set
CEPHFS
Distributed network file system
OBJECT BLOCK FILE
4
RELEASE SCHEDULE
- 12.2.z Luminous - Aug 2017
- 13.2.z Mimic - May 2018 ← WE ARE HERE
- 14.2.z Nautilus - Feb 2019
- 15.2.z Octopus - Nov 2019
- Stable, named release every 9 months
- Backports for 2 releases
- Upgrade up to 2 releases at a time
- (e.g., Luminous → Nautilus, Mimic → Octopus)
5
FOUR CEPH PRIORITIES
- Usability and management
- Performance
- Container ecosystem
- Multi- and hybrid cloud
6
MOTIVATION - DATA SERVICES
7
A CLOUDY FUTURE
- IT organizations today
○ Multiple private data centers ○ Multiple public cloud services
- It’s getting cloudier
○ “On premise” → private cloud ○ Self-service IT resources, provisioned on demand by developers and business units
- Next generation of cloud-native applications will span clouds
- “Stateless microservices” are great, but real applications have state
- Managing moving or replicated state is hard
8
- Data placement and portability
○ Where should I store this data? ○ How can I move this data set to a new tier or new site? ○ Seamlessly, without interrupting applications?
- Introspection
○ What data am I storing? For whom? Where? For how long? ○ Search, metrics, insights
- Policy-driven data management
○ Lifecycle management ○ Compliance: constrain placement, retention, etc. (e.g., HIPAA, GDPR) ○ Optimize placement based on cost or performance ○ Automation
“DATA SERVICES”
9
- Data sets are tied to applications
○ When the data moves, the application often should (or must) move too
- Container platforms are key
○ Automated application (re)provisioning ○ “Operators” to manage coordinated migration of state and the applications that consume it
MORE THAN JUST DATA
10
- Multi-tier
○ Different storage for different data
- Mobility
○ Move an application and its data between sites with minimal (or no) availability interruption ○ Maybe an entire site, but usually a small piece of a site (e.g., a single app)
- Disaster recovery
○ Tolerate a complete site failure; reinstantiate data and app in a secondary site quickly ○ Point-in-time consistency with bounded latency (bounded data loss on failover)
- Stretch
○ Tolerate site outage without compromising data availability ○ Synchronous replication (no data loss) or async replication (different consistency model)
- Edge
○ Small satellite (e.g., telco POP) and/or semi-connected sites (e.g., autonomous vehicle)
DATA USE SCENARIOS
11
Synchronous replication
- Application initiates a write
- Storage writes to all replicas
- Application write completes
- Write latency may be high since we wait for all replicas
- All replicas always reflect applications’ completed writes
SYNC VS ASYNC
Asynchronous replication
- Application initiates a write
- Storage writes to one (or some) replicas
- Application write completes
- Storage writes to remaining (usually remote) replicas later
- Write latency can be kept low
- If initial replicas are lost, application write may be lost
- Remote replicas may always be somewhat stale
12
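To make the contrast above concrete, here is a toy Python sketch (not Ceph code, just an illustration of the two acknowledgement models): the synchronous path acknowledges a write only after every replica has it, while the asynchronous path acknowledges after the local replica and lets a background worker catch the remote copies up, so they may lag.

```python
import queue
import threading


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


def sync_write(replicas, key, value):
    # Completion is acknowledged only once ALL replicas hold the write,
    # so latency is bounded by the slowest (often remote) replica.
    for r in replicas:
        r.write(key, value)
    return "ack"


class AsyncReplicator:
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        # Acknowledge after the local write; remote copies catch up later,
        # so a lost local replica can mean a lost (acknowledged) write.
        self.local.write(key, value)
        self.pending.put((key, value))
        return "ack"

    def _drain(self):
        while True:
            key, value = self.pending.get()
            for r in self.remotes:
                r.write(key, value)
```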
BLOCK STORAGE
13
HOW WE USE BLOCK
- Virtual disk device
- Exclusive access by nature (with few exceptions)
- Strong consistency required
- Performance sensitive
- Basic feature set
○ Read, write, flush, maybe resize ○ Snapshots (read-only) or clones (read/write) ■ Point-in-time consistent
- Often self-service provisioning
○ via Cinder in OpenStack ○ via Persistent Volume (PV) abstraction in Kubernetes
[Diagram: applications → file system (XFS, ext4, whatever) → virtual block device]
14
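As a concrete example of this feature set, here is a minimal python-rbd sketch that provisions an image, takes a point-in-time snapshot, and clones it; the pool name, image names, and ceph.conf path are assumptions for illustration.

```python
import rados
import rbd

# Connect to the cluster and open the pool that backs our block images.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # assumption: default conf path
cluster.connect()
ioctx = cluster.open_ioctx("rbd")                        # assumption: pool name

r = rbd.RBD()
r.create(ioctx, "vol1", 10 * 1024**3)                    # 10 GiB virtual block device

with rbd.Image(ioctx, "vol1") as image:
    image.write(b"hello block world", 0)                 # basic read/write interface
    image.create_snap("before-upgrade")                  # read-only, point-in-time snapshot
    image.protect_snap("before-upgrade")                 # required before cloning

# A clone is a writable image layered on top of the snapshot.
r.clone(ioctx, "vol1", "before-upgrade", ioctx, "vol1-clone")

ioctx.close()
cluster.shutdown()
```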
RBD - TIERING WITH RADOS POOLS
[Diagram: one Ceph storage cluster with SSD 2x, HDD 3x, and SSD EC 6+3 pools; clients attach via KRBD or librbd/KVM]
✓ Multi-tier ❏ Mobility ❏ DR ❏ Stretch ❏ Edge
15
RBD - LIVE IMAGE MIGRATION
[Diagram: same cluster and pools; an image is live-migrated between pools while clients stay attached via librbd/KVM]
✓ Multi-tier ✓ Mobility ❏ DR ❏ Stretch ❏ Edge
- New in Nautilus
- librbd only
16
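The new flow is driven by three rbd commands; a hedged sketch of the sequence follows (pool and image names are made up, and since this is Nautilus-and-later, librbd-only functionality, clients need to re-open the image through librbd).

```python
import subprocess

def run(*args):
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# Prepare: create the destination image and link it to the source.
run("rbd", "migration", "prepare", "ssd-pool/app-disk", "hdd-ec-pool/app-disk")
# Execute: copy blocks in the background while the image stays usable
# through its new location.
run("rbd", "migration", "execute", "hdd-ec-pool/app-disk")
# Commit: drop the source image once everything has been copied.
run("rbd", "migration", "commit", "hdd-ec-pool/app-disk")
```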
RBD - STRETCH
[Diagram: a single stretch Ceph cluster with a stretch pool spanning Site A and Site B over a WAN link; client attaches via KRBD]
❏ Multi-tier ❏ Mobility ✓ DR ✓ Stretch ❏ Edge
- Apps can move
- Data can’t - it’s already everywhere
- Performance is usually compromised
○ Need fat and low latency pipes
17
RBD - STRETCH WITH TIERS
[Diagram: stretch Ceph cluster with a stretch pool plus site-local A and B pools, spanning Site A and Site B over a WAN link; client attaches via KRBD]
✓ Multi-tier ❏ Mobility ✓ DR ✓ Stretch ❏ Edge
- Create site-local pools for performance-sensitive apps
18
RBD - STRETCH WITH MIGRATION
[Diagram: stretch Ceph cluster with a stretch pool plus site-local A and B pools over a WAN link; images live-migrate between pools; client attaches via librbd/KVM]
✓ Multi-tier ✓ Mobility ✓ DR ✓ Stretch ❏ Edge
- Live migrate images between pools
- Maybe even live migrate your app VM?
19
- Network latency is critical
○ Want low latency for performance ○ Stretch requires nearby sites, limiting usefulness
- Bandwidth too
○ Must be able to sustain rebuild data rates
- Relatively inflexible
○ Single cluster spans all locations; maybe ok for 2 datacenters but not 10? ○ Cannot “join” existing clusters
- High level of coupling
○ Single (software) failure domain for all sites
- Proceed with caution!
STRETCH IS SKETCH
20
RBD ASYNC MIRRORING
[Diagram: primary Ceph cluster A (SSD 3x pool) asynchronously mirrored over a WAN link to backup Ceph cluster B (HDD 3x pool); client attaches via librbd/KVM]
- Asynchronously mirror all writes
- Some performance overhead at primary
○ Mitigate with SSD pool for RBD journal
- Configurable time delay for backup
- Supported since Luminous
21
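A hedged sketch of how this is wired up with the rbd CLI (pool, image, cluster, and peer client names are illustrative; an rbd-mirror daemon must be running against the backup cluster, and the pool must be mirror-enabled on both sides).

```python
import subprocess

def rbd(*args):
    subprocess.run(("rbd",) + args, check=True)

# Mirror selected images in this pool (per-image mode) rather than everything.
rbd("mirror", "pool", "enable", "ssd-pool", "image")
# Register the peer relationship (run against the backup cluster, naming the primary).
rbd("--cluster", "site-b", "mirror", "pool", "peer", "add", "ssd-pool",
    "client.rbd-mirror@site-a")
# Journaling on the image is what feeds the asynchronous mirror.
rbd("feature", "enable", "ssd-pool/app-disk", "journaling")
rbd("mirror", "image", "enable", "ssd-pool/app-disk")
```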
- On primary failure
○ Backup is point-in-time consistent ○ Lose only last few seconds of writes ○ VM/pod/whatever can restart in new site
- If primary recovers,
○ Option to resync and “fail back”
RBD ASYNC MIRRORING
[Diagram: Ceph cluster A (SSD 3x pool), now a divergent primary, and Ceph cluster B (HDD 3x pool) with asynchronous mirroring over a WAN link; client attaches via librbd/KVM]
❏ Multi-tier ❏ Mobility ✓ DR ❏ Stretch ❏ Edge
22
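A hedged sketch of the failover and fail-back steps with the rbd CLI (cluster, pool, and image names are illustrative).

```python
import subprocess

def rbd(*args):
    subprocess.run(("rbd",) + args, check=True)

# Site A is down: force-promote the backup copy so the VM/pod can restart at B.
rbd("--cluster", "site-b", "mirror", "image", "promote", "--force", "ssd-pool/app-disk")

# Site A comes back with a divergent primary: demote it and resync from B,
# then optionally demote B and promote A again to "fail back".
rbd("--cluster", "site-a", "mirror", "image", "demote", "ssd-pool/app-disk")
rbd("--cluster", "site-a", "mirror", "image", "resync", "ssd-pool/app-disk")
```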
- Ocata
○ Cinder RBD replication driver
- Queens
○ ceph-ansible deployment of rbd-mirror via TripleO
- Rocky
○ Failover and fail-back operations
- Gaps
○ Deployment and configuration tooling ○ Cannot replicate multi-attach volumes ○ Nova attachments are lost on failover
RBD MIRRORING IN OPENSTACK CINDER
23
- Hard for IaaS layer to reprovision app in new site
- Storage layer can’t solve it on its own either
- Need automated, declarative, structured specification for entire app stack...
MISSING LINK: APPLICATION ORCHESTRATION
24
FILE STORAGE
25
- Stable since Kraken
- Multi-MDS stable since Luminous
- Snapshots stable since Mimic
- Support for multiple RADOS data pools
○ Per-directory subtree policies for placement, striping, etc.
- Fast, highly scalable
- Quotas; multiple volumes and subvolumes
- Provisioning via OpenStack Manila and Kubernetes
- Fully awesome
CEPHFS STATUS
✓ Multi-tier ❏ Mobility ❏ DR ❏ Stretch ❏ Edge
26
CEPHFS
[Diagram: client host running the Ceph kernel module, sending data to the RADOS cluster and metadata to the MDS]
- or ceph-fuse, Samba, nfs-ganesha
27
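For completeness, here is a minimal sketch of a userspace client using the python cephfs binding (which wraps libcephfs, the same library ceph-fuse and nfs-ganesha build on); the conf path, directory, and file names are assumptions.

```python
import cephfs

fs = cephfs.LibCephFS(conffile="/etc/ceph/ceph.conf")   # assumption: default conf path
fs.mount()                                              # attach to the file system root
fs.mkdirs(b"/projects/demo", 0o755)
fd = fs.open(b"/projects/demo/hello.txt", "w", 0o644)   # create/truncate for writing
fs.write(fd, b"hello from libcephfs\n", 0)
fs.close(fd)
fs.unmount()
fs.shutdown()
```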
- We can stretch CephFS just like RBD pools
- It has the same limitations as RBD
○ Latency → lower performance ○ Limited by geography ○ Big (software) failure domain
- Also,
○ MDS latency is critical for file workloads ○ ceph-mds daemons will run in one site; clients in other sites will see higher latency
CEPHFS - STRETCH?
❏ Multi-tier ❏ Mobility ✓ DR ✓ Stretch ❏ Edge
28
- What can we do with CephFS across sites and clusters?
CEPHFS - FUTURE OPTIONS
29
- CephFS snapshots provide
○ point-in-time consistency ○ granularity (any directory in the system)
- CephFS rstats provide
○ rctime = recursive ctime on any directory ○ We can efficiently find changes
- rsync provides
○ efficient file transfer
- Time bounds on order of minutes
- Gaps and TODO
○ “rstat flush” coming in Nautilus ■ Xuehan Xu @ Qihoo 360 ○ rsync support for CephFS rctime ○ scripting / tooling ○ easy rollback interface
- Matches enterprise storage feature sets
CEPHFS - SNAP MIRRORING?
❏ Multi-tier ❏ Mobility ✓ DR ❏ Stretch ❏ Edge
Timeline (Site A → Site B):
- 1. A: create snap S1
- 2. rsync A→B
- 3. B: create snap S1
- 4. A: create snap S2
- 5. rsync A→B
- 6. B: create snap S2
- 7. A: create snap S3
- 8. rsync A→B
- 9. B: create snap S3
30
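Nothing here requires new machinery; below is a rough sketch of the loop, assuming site A's file system is kernel-mounted at /mnt/siteA and site B's at /mnt/siteB (paths and snapshot names are illustrative). Snapshots are taken by mkdir'ing inside the special .snap directory, and ceph.dir.rctime tells us whether anything under the tree changed since the last pass.

```python
import os
import subprocess
import time

SRC = "/mnt/siteA/projects"      # assumption: CephFS kernel mounts
DST = "/mnt/siteB/projects"

def rctime(path):
    # Recursive ctime, exposed as a virtual xattr (roughly "seconds.nanoseconds").
    return float(os.getxattr(path, "ceph.dir.rctime").decode())

last, seq = 0.0, 0
while True:
    now = rctime(SRC)
    if now > last:                                        # something changed under SRC
        seq += 1
        snap = "mirror-%06d" % seq
        os.mkdir(os.path.join(SRC, ".snap", snap))        # point-in-time snap on A
        subprocess.run(["rsync", "-a", "--delete",
                        os.path.join(SRC, ".snap", snap) + "/", DST + "/"],
                       check=True)                         # efficient transfer
        os.mkdir(os.path.join(DST, ".snap", snap))        # matching snap on B
        last = now
    time.sleep(60)                                         # time bound: order of minutes
```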
- Yes.
- Sometimes.
- Some geo-replication DR features are built on rsync...
○ Consistent view of individual files (maybe?) ○ But lack point-in-time consistency between files
- Some (many? most?) apps are not picky about cross-file consistency...
○ Content stores ○ Casual usage without cross-site modification of the same files
DO WE NEED POINT-IN-TIME FOR FILE?
31
- Idea
○ Each ceph-mds daemon generates an update log ○ Replication worker daemons replicate updates asynchronously
- Benefits
○ Generally timely replication of updates ○ Should scale reasonably well (e.g., if we allow N workers per MDS)
- Limitations
○ No point-in-time consistency
- Challenges
○ Semantics of namespace operations (e.g., directory rename) may be tricky when workers are not in sync
CEPHFS - UPDATE LOG ASYNC SYNC?
❏ Multi-tier ❏ Mobility ✓ DR ❏ Stretch ❏ Edge
32
ABOUT MIGRATION...
33
MIGRATION: STOP, MOVE, START
- App runs in site A
- Stop app in site A
- Copy data A→B
- Start app in site B
- App maintains exclusive access
- Long service disruption
❏ Multi-tier ✓ Mobility ❏ DR ❏ Stretch ❏ Edge
34
MIGRATION: PRESTAGING
- App runs in site A
- Copy most data from A→B
- Stop app in site A
- Copy last little bit A→B
- Start app in site B
- App maintains exclusive access
- Short availability blip
35
MIGRATION: TEMPORARY ACTIVE/ACTIVE
- App runs in site A
- Copy most data from A→B
- Enable bidirectional replication
- Start app in site B
- Stop app in site A
- Disable replication
- No loss of availability
- Concurrent access to same data
- Performance degradation only
during active/active period
36
ACTIVE/ACTIVE
- App runs in site A
- Copy most data from A→B
- Enable bidirectional replication
- Start app in site B
- Highly available across two sites
- Concurrent access to same data
○ Consistency model? ○ Sync or async?
37
- We don’t have general-purpose bidirectional file replication
- It is hard to resolve conflicts for any POSIX operation
○ Sites A and B both modify the same file ○ Site A renames /a → /b/a while Site B renames /b → /a/b
- But applications can only go active/active if they are cooperative
○ i.e., they carefully avoid such conflicts ○ e.g., mostly-static directory structure + last writer wins
- So we could do it if we simplify the data model...
- But wait, that sounds a bit like object storage...
CEPHFS - BIDIRECTIONAL FILE REPLICATION?
38
OBJECT STORAGE
39
WHY IS OBJECT SO GREAT?
- Based on HTTP
○ Interoperates well with web caches, proxies, CDNs, ...
- Atomic object replacement
○ PUT on a large object atomically replaces prior version ○ Trivial conflict resolution (last writer wins) ○ Lack of overwrites makes erasure coding easy
- Flat namespace
○ No multi-step traversal to find your data ○ Easy to scale horizontally
- No rename
○ Vastly simplified implementation
40
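For example, against RGW's S3-compatible API (the endpoint and credentials below are placeholders), a PUT replaces the whole object atomically, so concurrent writers trivially resolve to last-writer-wins.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8000",   # assumption: a local RGW zone
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="cat-pictures")
s3.put_object(Bucket="cat-pictures", Key="tabby.jpg", Body=b"...image bytes...")
# A second PUT to the same key replaces the object as a whole; readers see
# either the old or the new version, never a mix of the two.
s3.put_object(Bucket="cat-pictures", Key="tabby.jpg", Body=b"...newer image...")
obj = s3.get_object(Bucket="cat-pictures", Key="tabby.jpg")
print(obj["ContentLength"])
```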
- File is not going away, and will be critical
○ Half a century of legacy applications ○ It’s genuinely useful
- Block is not going away, and is also critical infrastructure
○ Well suited for exclusive-access storage users (boot devices, etc) ○ Performs better than file due to local consistency management, ordering etc.
- Most new data will land in objects
○ Cat pictures, surveillance video, vehicle telemetry, medical imaging, genome data... ○ Next generation of cloud native applications will be architected around object
THE FUTURE IS… OBJECTY
41
RGW FEDERATION MODEL TODAY
- Zone
○ Collection of RADOS pools in one Ceph cluster ○ Set of RGW daemons serving that content ○ Can have many RGW zones per Ceph cluster
- ZoneGroup
○ Collection of 2+ Zones with a replication relationship ○ Active/Passive or Active/Active
- Namespace
○ Independent naming for users and buckets ○ All Zones replicate user and bucket metadata pool ○ One Zone per Namespace serves as the leader to handle User and Bucket creations/deletions
- Failover is driven externally
○ Human (or other?) operators decide when to write off an unreachable master zone, resynchronize, etc.
[Diagram: a Namespace contains ZoneGroups, which contain Zones]
42
RGW FEDERATION TODAY
[Diagram: three Ceph clusters hosting RGW zones: standalone zones M and N, ZoneGroup X (zones X-A and X-B), and ZoneGroup Y (zones Y-A and Y-B)]
❏ Multi-tier ❏ Mobility ✓ DR ✓ Stretch ❏ Edge
- Gap: granular, per-bucket management of replication
43
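A hedged sketch of standing up the master zone of one zonegroup with radosgw-admin (realm, zonegroup, zone, and endpoint names are illustrative; the secondary zone on the other cluster would pull the realm and commit its own period).

```python
import subprocess

def rgw_admin(*args):
    subprocess.run(("radosgw-admin",) + args, check=True)

rgw_admin("realm", "create", "--rgw-realm=gold", "--default")
rgw_admin("zonegroup", "create", "--rgw-zonegroup=zg-x",
          "--endpoints=http://rgw-x-a.example.com:8000", "--master", "--default")
rgw_admin("zone", "create", "--rgw-zonegroup=zg-x", "--rgw-zone=zone-x-a",
          "--endpoints=http://rgw-x-a.example.com:8000", "--master", "--default")
rgw_admin("period", "update", "--commit")   # publish the new configuration
```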
ACTIVE/ACTIVE FILE ON OBJECT
[Diagram: Ceph cluster A with RGW zone A and Ceph cluster B with RGW zone B, replicating to each other]
- Data in replicated object zones
○ Eventually consistent, last writer wins
- Applications access RGW via NFSv4
- Today!
❏ Multi-tier ❏ Mobility ✓ DR ✓ Stretch ❏ Edge
44
- ElasticSearch (Luminous)
○ Index entire zone by object or user metadata ○ Query API
- Cloud sync (Mimic)
○ Replicate entire zone or specific buckets to external object store (e.g., S3) ○ Can remap RGW buckets into individual S3 buckets, or same S3 bucket ○ Remaps ACLs, etc
- Archive (Nautilus)
○ Replicate all writes in one zone to another zone, preserving all versions
- Pub/Sub (Nautilus)
○ Subscribe to event notifications for actions like PUT ○ Integrates with knative serverless! (See Huamin’s talk from Kubecon Seattle)
OTHER RGW REPLICATION PLUGINS
45
PUBLIC CLOUD STORAGE IN THE MESH
[Diagram: on-prem Ceph cluster (RGW zone B) federated with a small “beachhead” Ceph cluster running an RGW gateway zone in the public cloud, in front of the cloud object store]
- Mini Ceph cluster in cloud as gateway
○ Stores federation and replication state ○ Gateway for GETs and PUTs, or ○ Clients can access cloud object storage directly
- Today: replicate to cloud
- Future: replicate from cloud
46
Today: Intra-cluster
- Many RADOS pools for a single RGW zone
- Primary RADOS pool for object “heads”
○ Single (fast) pool to find object metadata and location of the tail of the object
- Each tail can go in a different pool
○ Specify bucket policy with PUT ○ Per-bucket policy as default when not specified
- Policy
○ Retention (auto-expire)
RGW TIERING
Nautilus
- Lifecycle policy
○ Automated tiering between RADOS pools based on age, ...
Future
- Tier objects to an external store
○ Initially something like S3 ○ Later: tape backup, other backends…
- Encrypt data in external tier
- Compression
- (Maybe) cryptographically shard across multiple backend tiers
✓ Multi-tier ❏ Mobility ❏ DR ❏ Stretch ❏ Edge
47
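The retention and lifecycle pieces are expressed through the standard S3 lifecycle API that RGW implements; a hedged boto3 sketch follows (bucket, prefixes, endpoint, and the "ARCHIVE" storage-class name are assumptions, and pool-to-pool transitions only arrive with Nautilus).

```python
import boto3

s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000",
                  aws_access_key_id="ACCESS_KEY",
                  aws_secret_access_key="SECRET_KEY")

s3.put_bucket_lifecycle_configuration(
    Bucket="telemetry",
    LifecycleConfiguration={
        "Rules": [
            {   # auto-expire: delete raw uploads after 90 days
                "ID": "expire-raw",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            },
            {   # tier colder data to a slower pool / storage class after 30 days
                "ID": "tier-processed",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "ARCHIVE"}],
            },
        ]
    },
)
```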
Today
- RGW as gateway to a RADOS cluster
○ With some nifty geo-replication features
- RGW redirects clients to the correct zone
○ via HTTP Location: redirect ○ Dynamic DNS can provide right zone IPs
- RGW replicates at zone granularity
○ Well suited for disaster recovery
RGW - THE BIG PICTURE
Future
- RGW as a gateway to a mesh of sites
○ With great on-site performance
- RGW may redirect or proxy to right zone
○ Single point of access for application ○ Proxying enables coherent local caching
- RGW may replicate at bucket granularity
○ Individual applications set durability needs ○ Enable granular application mobility
48
CEPH AT THE EDGE
49
CEPH AT THE EDGE
- A few edge examples
○ Telco POPs: ¼ - ½ rack of OpenStack ○ Autonomous vehicles: cars or drones ○ Retail ○ Backpack infrastructure
- Scale down cluster size
○ Hyperconverge storage and compute ○ Nautilus brings better memory control
- Multi-architecture support
○ aarch64 (ARM) builds upstream ○ POWER builds at OSU / OSL
- Hands-off operation
○ Operator-based provisioning (Rook) ○ Ongoing usability work
- Possibly unreliable WAN links
[Diagram: central site with control plane, compute, and storage; edge sites 1-3 with compute nodes]
50
- Block: async mirror edge volumes to central site
○ For DR purposes
- Data producers
○ Write generated data into objects in local RGW zone ○ Upload to central site when connectivity allows ○ Perhaps with some local pre-processing first
- Data consumers
○ Access to global data set via RGW (as a “mesh gateway”) ○ Local caching of a subset of the data
- We’re most interested in object-based edge scenarios
DATA AT THE EDGE
❏ Multi-tier ❏ Mobility ❏ DR ❏ Stretch ✓ Edge
51
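A hedged sketch of the data-producer pattern above: writes always land in the local edge zone, and a best-effort catch-up pass pushes them to the central zone whenever the WAN link happens to be up (endpoints, credentials, and bucket layout are assumptions; a real implementation would track what has already been pushed instead of re-copying everything).

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

edge = boto3.client("s3", endpoint_url="http://rgw.edge.local:8000",
                    aws_access_key_id="EDGE_KEY", aws_secret_access_key="EDGE_SECRET")
central = boto3.client("s3", endpoint_url="https://rgw.central.example.com",
                       aws_access_key_id="CENTRAL_KEY", aws_secret_access_key="CENTRAL_SECRET")

def ingest(key, payload):
    # Always lands locally, even while the WAN link is down.
    edge.put_object(Bucket="telemetry", Key=key, Body=payload)

def push_to_central():
    # Naive catch-up pass: copy everything; give up quietly until the next
    # attempt if the central site is unreachable.
    try:
        for page in edge.get_paginator("list_objects_v2").paginate(Bucket="telemetry"):
            for obj in page.get("Contents", []):
                body = edge.get_object(Bucket="telemetry", Key=obj["Key"])["Body"].read()
                central.put_object(Bucket="telemetry", Key=obj["Key"], Body=body)
    except (EndpointConnectionError, ClientError):
        pass
```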
KUBERNETES
52
WHY ALL THE KUBERNETES TALK?
- True mobility is a partnership between orchestrator and storage
- Kubernetes is emerging leader in application orchestration
- Persistent Volumes
○ Basic Ceph drivers in Kubernetes, ceph-csi on the way ○ Rook for automating Ceph cluster deployment and operation
- Object
○ Trivial provisioning of Ceph via Rook ○ Coming soon: on-demand, dynamic provisioning of Object Buckets and Users (via Rook) ○ Consistent developer experience across different object backends (RGW, S3, minio, etc.)
53
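A sketch of that developer experience with the Kubernetes Python client: the app requests an RBD-backed volume simply by claiming against a Ceph StorageClass (the "rook-ceph-block" class name here follows Rook's examples and is an assumption; any Ceph-backed StorageClass works the same way).

```python
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],             # RWO: block-style exclusive access
        storage_class_name="rook-ceph-block",       # assumption: Rook-provisioned RBD class
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```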
BRINGING IT ALL TOGETHER...
54
SUMMARY
- Data services: mobility, introspection, policy
- Need a partnership between storage layer and application orchestrator
- Ceph already has several key multi-cluster capabilities…
○ Block mirroring ○ Object federation, replication, cloud sync, pub/sub; cloud tiering coming ○ Introspection (elasticsearch) and policy for object
- ...and gaps
○ Object multi-site leveraging external clouds, granular management ○ Multi-site file mirroring ○ Orchestration of multi-site capabilities via Kubernetes
55
- Defining Kubernetes-based multi-cluster use-cases
○ RWO (block) PV DR, migration ○ RWX (file) PV DR, migration, active/active (CephFS or RGW-backed) ○ Dynamic bucket provisioning ○ Bucket policy, placement
- Extending RGW object capabilities
○ Bucket-granularity policy for multisite replication ○ Leveraging external cloud object stores with “thin” RGW zones
- Planning/designing CephFS multi-cluster modes
○ Snapshot-based mirroring (DR) ○ Loosely consistent mirroring (DR) ○ Multi-directional async mirroring (Mobility and Stretch)
KEY EFFORTS
56
BOTTOM LINE
Traditional view: Unified storage system
- Object, block, file
- Software-defined storage
- Hardware agnostic
Emerging view: Multi-cloud data services platform
- Multi-cluster federation
- Sync and async replication
- Policy-driven management
57