  1. CEPH DATA SERVICES IN A MULTI- AND HYBRID CLOUD WORLD
     Sage Weil - Red Hat
     FOSDEM - 2019.02.02

  2. OUTLINE
     ● Ceph
     ● Data services
     ● Block
     ● File
     ● Object
     ● Edge
     ● Future

  3. UNIFIED STORAGE PLATFORM
     ● OBJECT: RGW - S3 and Swift object storage with a robust feature set
     ● BLOCK: RBD - Virtual block device
     ● FILE: CEPHFS - Distributed network file system
     ● LIBRADOS - Low-level storage API
     ● RADOS - Reliable, elastic, highly-available distributed storage layer with replication and erasure coding
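As a rough illustration of the LIBRADOS layer described above, here is a minimal sketch using the python-rados bindings; the pool name and object name are invented for the example and error handling is omitted.

```python
import rados

# Connect to the cluster using the local ceph.conf and default keyring.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# An IO context is bound to one RADOS pool ('data' is a placeholder name).
ioctx = cluster.open_ioctx('data')
try:
    # Everything above RADOS (RGW, RBD, CephFS) ultimately reduces to
    # object reads and writes like these.
    ioctx.write_full('greeting', b'hello from librados')
    print(ioctx.read('greeting'))
finally:
    ioctx.close()
    cluster.shutdown()
```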

  4. RELEASE SCHEDULE
     Luminous   Aug 2017   12.2.z
     Mimic      May 2018   13.2.z
     Nautilus   Feb 2019   14.2.z   ← WE ARE HERE
     Octopus    Nov 2019   15.2.z
     ● Stable, named release every 9 months
     ● Backports for 2 releases
     ● Upgrade up to 2 releases at a time (e.g., Luminous → Nautilus, Mimic → Octopus)

  5. FOUR CEPH PRIORITIES
     ● Usability and management
     ● Container ecosystem
     ● Performance
     ● Multi- and hybrid cloud

  6. MOTIVATION - DATA SERVICES

  7. A CLOUDY FUTURE
     ● IT organizations today
        ○ Multiple private data centers
        ○ Multiple public cloud services
     ● It’s getting cloudier
        ○ “On premise” → private cloud
        ○ Self-service IT resources, provisioned on demand by developers and business units
     ● Next generation of cloud-native applications will span clouds
     ● “Stateless microservices” are great, but real applications have state
     ● Managing moving or replicated state is hard

  8. “DATA SERVICES”
     ● Data placement and portability
        ○ Where should I store this data?
        ○ How can I move this data set to a new tier or new site?
        ○ Seamlessly, without interrupting applications?
     ● Introspection
        ○ What data am I storing? For whom? Where? For how long?
        ○ Search, metrics, insights
     ● Policy-driven data management
        ○ Lifecycle management
        ○ Compliance: constrain placement, retention, etc. (e.g., HIPAA, GDPR)
        ○ Optimize placement based on cost or performance
        ○ Automation

  9. MORE THAN JUST DATA
     ● Data sets are tied to applications
        ○ When the data moves, the application often should (or must) move too
     ● Container platforms are key
        ○ Automated application (re)provisioning
        ○ “Operators” to manage coordinated migration of state and the applications that consume it

  10. DATA USE SCENARIOS
     ● Multi-tier
        ○ Different storage for different data
     ● Mobility
        ○ Move an application and its data between sites with minimal (or no) availability interruption
        ○ Maybe an entire site, but usually a small piece of a site (e.g., a single app)
     ● Disaster recovery
        ○ Tolerate a complete site failure; reinstantiate data and app in a secondary site quickly
        ○ Point-in-time consistency with bounded latency (bounded data loss on failover)
     ● Stretch
        ○ Tolerate site outage without compromising data availability
        ○ Synchronous replication (no data loss) or async replication (different consistency model)
     ● Edge
        ○ Small satellite (e.g., telco POP) and/or semi-connected sites (e.g., autonomous vehicle)

  11. SYNC VS ASYNC
     ● Synchronous replication
        ○ Application initiates a write
        ○ Storage writes to all replicas
        ○ Application write completes
        ○ Write latency may be high since we wait for all replicas
        ○ All replicas always reflect applications’ completed writes
     ● Asynchronous replication
        ○ Application initiates a write
        ○ Storage writes to one (or some) replicas
        ○ Application write completes
        ○ Storage writes to remaining (usually remote) replicas later
        ○ Write latency can be kept low
        ○ If initial replicas are lost, the application write may be lost
        ○ Remote replicas may always be somewhat stale
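The difference in acknowledgement semantics can be sketched with a toy model (plain Python threads, not Ceph code): the synchronous path acknowledges only after every replica has persisted the write, while the asynchronous path acknowledges after the first replica and lets the others catch up later.

```python
import queue
import threading
import time

def write_replica(name, done):
    # Pretend remote replicas are slower than the local one.
    time.sleep(0.05 if name == 'local' else 0.3)
    done.put(name)

def replicated_write(wait_for_all):
    replicas = ('local', 'remote-1', 'remote-2')
    done = queue.Queue()
    for name in replicas:
        threading.Thread(target=write_replica, args=(name, done)).start()
    # Sync: ack only after all replicas; async: ack after the first one.
    acks_needed = len(replicas) if wait_for_all else 1
    for _ in range(acks_needed):
        done.get()
    return 'acknowledged to application'

replicated_write(wait_for_all=True)   # higher latency, no stale replicas
replicated_write(wait_for_all=False)  # low latency, remote copies may lag
```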

  12. BLOCK STORAGE

  13. HOW WE USE BLOCK
     ● Virtual disk device
     ● Exclusive access by nature (with few exceptions)
     ● Strong consistency required
     ● Performance sensitive
     ● Basic feature set
        ○ Read, write, flush, maybe resize
        ○ Snapshots (read-only) or clones (read/write)
           ■ Point-in-time consistent
     ● Often self-service provisioning
        ○ via Cinder in OpenStack
        ○ via Persistent Volume (PV) abstraction in Kubernetes
     [Diagram: applications → filesystem (XFS, ext4, whatever) → block device]
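The “basic feature set” above maps directly onto the RBD API. A minimal sketch with the python-rbd bindings (the pool and image names are invented for the example):

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')          # 'rbd' pool name is an assumption

# Provision a 10 GiB virtual disk, the same thing Cinder or a Kubernetes
# PV provisioner would do on the user's behalf.
rbd.RBD().create(ioctx, 'vol1', 10 * 1024**3)

with rbd.Image(ioctx, 'vol1') as image:
    image.write(b'some guest data', 0)     # read/write/flush...
    image.flush()
    image.resize(20 * 1024**3)             # ...maybe resize
    image.create_snap('before-upgrade')    # point-in-time, read-only snapshot

ioctx.close()
cluster.shutdown()
```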

  14. RBD - TIERING WITH RADOS POOLS
     Scenarios: Multi-tier ✓  Mobility ❏  DR ❏  Stretch ❏  Edge ❏
     [Diagram: KVM guests with filesystems on krbd and librbd, backed by SSD 2x, HDD 3x, and SSD EC 6+3 pools in one Ceph storage cluster]
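From a client’s perspective, the tiering in the diagram is simply a matter of which RADOS pool an image lives in. A sketch with python-rbd; the pool names ('ssd-2x', 'hdd-3x', 'ssd-ec63-data') are assumptions, and the pools themselves would be created ahead of time with the appropriate replication or erasure-coding settings.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ssd = cluster.open_ioctx('ssd-2x')   # fast, 2x replicated pool (assumed name)
hdd = cluster.open_ioctx('hdd-3x')   # cheaper, 3x replicated pool (assumed name)

# Latency-sensitive image goes to the SSD pool...
rbd.RBD().create(ssd, 'db-volume', 50 * 1024**3)
# ...bulk image goes to the HDD pool.
rbd.RBD().create(hdd, 'archive-volume', 500 * 1024**3)

# An erasure-coded tier is typically used as a *data* pool, with image
# metadata staying in a replicated pool (data_pool support appeared
# around Luminous; assumed available here).
rbd.RBD().create(ssd, 'ec-backed-volume', 100 * 1024**3,
                 data_pool='ssd-ec63-data')

for ioctx in (ssd, hdd):
    ioctx.close()
cluster.shutdown()
```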

  15. RBD - LIVE IMAGE MIGRATION
     Scenarios: Multi-tier ✓  Mobility ✓  DR ❏  Stretch ❏  Edge ❏
     ● New in Nautilus
     ● librbd only
     [Diagram: KVM guests with filesystems on krbd and librbd, backed by SSD 2x, HDD 3x, and SSD EC 6+3 pools in one Ceph storage cluster]
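The Nautilus live migration flow is a prepare / execute / commit sequence. The sketch below assumes the python-rbd bindings expose it with names matching the `rbd migration` CLI steps; the exact call names and the pool/image names are assumptions, so check the bindings shipped with your release.

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
src = cluster.open_ioctx('hdd-3x')     # source pool (assumed name)
dst = cluster.open_ioctx('ssd-2x')     # destination pool (assumed name)

r = rbd.RBD()
# 1. Prepare: link a destination image to the source; librbd clients can
#    keep using the image while data is copied in the background.
r.migration_prepare(src, 'db-volume', dst, 'db-volume')
# 2. Execute: copy the blocks over (the long-running part).
r.migration_execute(src, 'db-volume')
# 3. Commit: tear down the source once everything has moved.
r.migration_commit(src, 'db-volume')

src.close()
dst.close()
cluster.shutdown()
```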

  16. RBD - STRETCH
     Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● Apps can move
     ● Data can’t - it’s already everywhere
     ● Performance is usually compromised
        ○ Need fat and low-latency pipes
     [Diagram: a client filesystem on krbd writing to a STRETCH POOL that spans SITE A and SITE B over a WAN link, in a single stretch Ceph storage cluster]
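A stretch pool is an ordinary RADOS pool whose CRUSH rule spreads replicas across sites instead of hosts. A hedged sketch using the mon command interface from python-rados, assuming the CRUSH map already contains `datacenter` buckets for the two sites; the rule and pool names are invented.

```python
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def mon_cmd(**kwargs):
    # mon_command takes a JSON-encoded command; returns (rc, out, status).
    rc, out, status = cluster.mon_command(json.dumps(kwargs), b'')
    assert rc == 0, status
    return out

# CRUSH rule that chooses replicas across 'datacenter' buckets
# (SITE A / SITE B must already exist in the CRUSH hierarchy).
mon_cmd(prefix='osd crush rule create-replicated',
        name='stretch-rule', root='default', type='datacenter')

# Create the stretch pool and point it at that rule.
mon_cmd(prefix='osd pool create', pool='stretch-pool', pg_num=128)
mon_cmd(prefix='osd pool set', pool='stretch-pool',
        var='crush_rule', val='stretch-rule')

cluster.shutdown()
```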

  17. RBD - STRETCH WITH TIERS
     Scenarios: Multi-tier ✓  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● Create site-local pools for performance-sensitive apps
     [Diagram: a client filesystem on krbd; site-local pool A, a stretch pool, and site-local pool B across SITE A and SITE B over a WAN link, in a single stretch Ceph storage cluster]

  18. RBD - STRETCH WITH MIGRATION
     Scenarios: Multi-tier ✓  Mobility ✓  DR ✓  Stretch ✓  Edge ❏
     ● Live migrate images between pools
     ● Maybe even live migrate your app VM?
     [Diagram: KVM guest with filesystem on librbd; site-local pool A, a stretch pool, and site-local pool B across SITE A and SITE B over a WAN link, in a single stretch Ceph storage cluster]

  19. STRETCH IS SKETCH
     ● Network latency is critical
        ○ Want low latency for performance
        ○ Stretch requires nearby sites, limiting usefulness
     ● Bandwidth too
        ○ Must be able to sustain rebuild data rates
     ● Relatively inflexible
        ○ Single cluster spans all locations; maybe OK for 2 datacenters, but not 10?
        ○ Cannot “join” existing clusters
     ● High level of coupling
        ○ Single (software) failure domain for all sites
     ● Proceed with caution!

  20. RBD ASYNC MIRRORING
     ● Asynchronously mirror all writes
     ● Some performance overhead at primary
        ○ Mitigate with SSD pool for RBD journal
     ● Configurable time delay for backup
     ● Supported since Luminous
     [Diagram: KVM guest with filesystem on librbd writing to the PRIMARY (SSD 3x pool, CEPH CLUSTER A); asynchronous mirroring over a WAN link to the BACKUP (HDD 3x pool, CEPH CLUSTER B)]
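Journal-based mirroring is enabled per pool or per image; the rbd-mirror daemon on the backup cluster then replays the journal. A minimal sketch with python-rbd (pool and image names are assumptions; peer setup and rbd-mirror deployment are out of scope here):

```python
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # primary cluster
cluster.connect()
ioctx = cluster.open_ioctx('ssd-3x')                    # assumed pool name

# Mirror individual images rather than the whole pool.
rbd.RBD().mirror_mode_set(ioctx, rbd.RBD_MIRROR_MODE_IMAGE)

with rbd.Image(ioctx, 'db-volume') as image:
    # Journaling is what makes every write replayable on the backup site.
    image.update_features(rbd.RBD_FEATURE_JOURNALING, True)
    image.mirror_image_enable()

ioctx.close()
cluster.shutdown()
```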

  21. RBD ASYNC MIRRORING
     Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ❏  Edge ❏
     ● On primary failure
        ○ Backup is point-in-time consistent
        ○ Lose only last few seconds of writes
        ○ VM/pod/whatever can restart in new site
     ● If primary recovers,
        ○ Option to resync and “fail back”
     [Diagram: after failover, the divergent former primary (SSD 3x pool, CEPH CLUSTER A) and the promoted copy serving the KVM guest via librbd (HDD 3x pool, CEPH CLUSTER B) remain linked by asynchronous mirroring over the WAN]
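Failover and fail-back boil down to promote / demote / resync operations on the mirrored image. A hedged sketch of the failover side with python-rbd, run against the backup cluster; the config path, pool, and image names are assumptions.

```python
import rados
import rbd

# Connect to the *backup* cluster (cluster B in the diagram).
backup = rados.Rados(conffile='/etc/ceph/ceph-backup.conf')  # assumed conf path
backup.connect()
ioctx = backup.open_ioctx('hdd-3x')                          # assumed pool name

with rbd.Image(ioctx, 'db-volume') as image:
    # Primary has failed: force-promote the backup copy so VMs/pods can
    # restart against it. Only the last few seconds of writes are lost.
    image.mirror_image_promote(True)

# Later, if the old primary comes back, it is demoted and resynced from the
# new primary before optionally failing back:
#   old_image.mirror_image_demote()
#   old_image.mirror_image_resync()

ioctx.close()
backup.shutdown()
```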

  22. RBD MIRRORING IN OPENSTACK CINDER
     ● Ocata
        ○ Cinder RBD replication driver
     ● Queens
        ○ ceph-ansible deployment of rbd-mirror via TripleO
     ● Rocky
        ○ Failover and fail-back operations
     ● Gaps
        ○ Deployment and configuration tooling
        ○ Cannot replicate multi-attach volumes
        ○ Nova attachments are lost on failover

  23. MISSING LINK: APPLICATION ORCHESTRATION
     ● Hard for IaaS layer to reprovision app in new site
     ● Storage layer can’t solve it on its own either
     ● Need automated, declarative, structured specification for entire app stack...

  24. FILE STORAGE

  25. CEPHFS STATUS
     Scenarios: Multi-tier ✓  Mobility ❏  DR ❏  Stretch ❏  Edge ❏
     ● Stable since Kraken
     ● Multi-MDS stable since Luminous
     ● Snapshots stable since Mimic
     ● Support for multiple RADOS data pools
        ○ Per-directory subtree policies for placement, striping, etc.
     ● Fast, highly scalable
     ● Quota, multi-volumes, multi-subvolume
     ● Provisioning via OpenStack Manila and Kubernetes
     ● Fully awesome
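The per-directory data pool policies and quotas mentioned above are exposed to clients as virtual extended attributes on a mounted CephFS. A small sketch; the mount point, directory, and pool names are assumptions.

```python
import os

MNT = '/mnt/cephfs'                       # assumed CephFS mount point
project = os.path.join(MNT, 'fast-project')

# Direct new files under this subtree to a faster RADOS data pool.
os.setxattr(project, 'ceph.dir.layout.pool', b'cephfs-ssd-data')

# Cap the subtree at 100 GiB using CephFS quotas.
os.setxattr(project, 'ceph.quota.max_bytes', str(100 * 1024**3).encode())

# rstats: recursive ctime tells us when anything under the tree last changed.
print(os.getxattr(project, 'ceph.dir.rctime'))
```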

  26. CEPHFS
     [Diagram: a CephFS client on a host (Ceph kernel module, or ceph-fuse / Samba / nfs-ganesha) sending metadata and data operations to the RADOS cluster]

  27. CEPHFS - STRETCH?
     Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● We can stretch CephFS just like RBD pools
     ● It has the same limitations as RBD
        ○ Latency → lower performance
        ○ Limited by geography
        ○ Big (software) failure domain
     ● Also,
        ○ MDS latency is critical for file workloads
        ○ ceph-mds daemons will run in one site; clients in other sites will see higher latency

  28. CEPHFS - FUTURE OPTIONS
     ● What can we do with CephFS across sites and clusters?

  29. CEPHFS - SNAP MIRRORING?
     Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ❏  Edge ❏
     ● CephFS snapshots provide
        ○ point-in-time consistency
        ○ granularity (any directory in the system)
     ● CephFS rstats provide
        ○ rctime = recursive ctime on any directory
        ○ We can efficiently find changes
     ● rsync provides
        ○ efficient file transfer
     ● Time bounds on order of minutes
     ● Gaps and TODO
        ○ “rstat flush” coming in Nautilus
           ■ Xuehan Xu @ Qihoo 360
        ○ rsync support for CephFS rctime
        ○ scripting / tooling
        ○ easy rollback interface
     ● Matches enterprise storage feature sets
     [Timeline, SITE A → SITE B: 1. A: create snap S1; 2. rsync A→B; 3. B: create snap S1; 4. A: create snap S2; 5. rsync A→B; 6. B: create snap S2; 7. A: create snap S3; 8. rsync A→B; 9. B: create snap S3]
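The snapshot-plus-rsync loop in the timeline can be scripted today with nothing more than mkdir on the hidden .snap directory and plain rsync. A rough sketch, assuming site A's filesystem is mounted locally and site B is reachable over ssh; the host name and mount paths are made up for the example.

```python
import os
import subprocess
import time

SRC = '/mnt/cephfs-siteA/projects'        # assumed local CephFS mount
DST_HOST = 'siteB'                        # hypothetical remote host
DST_PATH = '/mnt/cephfs-siteB/projects'   # assumed remote CephFS mount

def mirror_once(tag):
    # 1. A: create snap - CephFS snapshots are just mkdirs under .snap.
    snap_dir = os.path.join(SRC, '.snap', tag)
    os.mkdir(snap_dir)

    # 2. rsync A→B - copy the point-in-time snapshot, not the live tree.
    subprocess.run(['rsync', '-a', '--delete', snap_dir + '/',
                    f'{DST_HOST}:{DST_PATH}/'], check=True)

    # 3. B: create snap - freeze the same state on the remote copy.
    subprocess.run(['ssh', DST_HOST, 'mkdir',
                    os.path.join(DST_PATH, '.snap', tag)], check=True)

while True:
    mirror_once(time.strftime('S%Y%m%d-%H%M%S'))
    time.sleep(300)   # time bounds on the order of minutes
```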
