Ceph Data Services in a Multi- and Hybrid Cloud World

  1. CEPH DATA SERVICES IN A MULTI- AND HYBRID CLOUD WORLD
     Sage Weil - Red Hat
     OpenStack Summit - 2018.11.15

  2. OUTLINE
     ● Ceph
     ● Data services
     ● Block
     ● File
     ● Object
     ● Edge
     ● Future

  3. UNIFIED STORAGE PLATFORM
     ● OBJECT - RGW: S3 and Swift object storage
     ● BLOCK - RBD: Virtual block device with robust feature set
     ● FILE - CEPHFS: Distributed network file system
     ● LIBRADOS: Low-level storage API
     ● RADOS: Reliable, elastic, highly-available distributed storage layer with replication and erasure coding

  4. RELEASE SCHEDULE
     ● Luminous (Aug 2017, 12.2.z) → Mimic (May 2018, 13.2.z) → WE ARE HERE → Nautilus (Feb 2019, 14.2.z) → Octopus (Nov 2019, 15.2.z)
     ● Stable, named release every 9 months
     ● Backports for 2 releases
     ● Upgrade up to 2 releases at a time (e.g., Luminous → Nautilus, Mimic → Octopus)

  5. FOUR CEPH PRIORITIES
     ● Usability and management
     ● Container platforms
     ● Performance
     ● Multi- and hybrid cloud

  6. MOTIVATION - DATA SERVICES

  7. A CLOUDY FUTURE
     ● IT organizations today
       ○ Multiple private data centers
       ○ Multiple public cloud services
     ● It’s getting cloudier
       ○ “On premise” → private cloud
       ○ Self-service IT resources, provisioned on demand by developers and business units
     ● The next generation of cloud-native applications will span clouds
     ● “Stateless microservices” are great, but real applications have state.

  8. DATA SERVICES
     ● Data placement and portability
       ○ Where should I store this data?
       ○ How can I move this data set to a new tier or new site?
       ○ Seamlessly, without interrupting applications?
     ● Introspection
       ○ What data am I storing? For whom? Where? For how long?
       ○ Search, metrics, insights
     ● Policy-driven data management
       ○ Lifecycle management
       ○ Conformance: constrain placement, retention, etc. (e.g., HIPAA, GDPR)
       ○ Optimize placement based on cost or performance
       ○ Automation

  9. MORE THAN JUST DATA
     ● Data sets are tied to applications
       ○ When the data moves, the application often should (or must) move too
     ● Container platforms are key
       ○ Automated application (re)provisioning
       ○ “Operators” to manage coordinated migration of state and the applications that consume it

  10. DATA USE SCENARIOS
     ● Multi-tier
       ○ Different storage for different data
     ● Mobility
       ○ Move an application and its data between sites with minimal (or no) availability interruption
       ○ Maybe an entire site, but usually a small piece of a site
     ● Disaster recovery
       ○ Tolerate a site-wide failure; reinstantiate data and app in a new site quickly
       ○ Point-in-time consistency with bounded latency (bounded data loss)
     ● Stretch
       ○ Tolerate a site outage without compromising data availability
       ○ Synchronous replication (no data loss) or async replication (different consistency model)
     ● Edge
       ○ Small (e.g., telco POP) and/or semi-connected sites (e.g., autonomous vehicles)

  11. BLOCK STORAGE

  12. HOW WE USE BLOCK
     ● Virtual disk device
     ● Exclusive access by nature (with few exceptions)
     ● Strong consistency required
     ● Performance sensitive
     ● Basic feature set
       ○ Read, write, flush, maybe resize
       ○ Snapshots (read-only) or clones (read/write)
         ■ Point-in-time consistent
     ● Often self-service provisioning
       ○ via Cinder in OpenStack
       ○ via Persistent Volume (PV) abstraction in Kubernetes
     (Diagram: applications on top of XFS, ext4, whatever, on top of a block device)
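
Underneath Cinder volumes and Kubernetes PVs, this feature set maps onto the librbd API. A minimal sketch using the rados/rbd Python bindings, assuming a reachable cluster with a pool named rbd; the image and snapshot names are made up for illustration:

```python
import rados
import rbd

# Connect to the cluster (assumes a local ceph.conf and client keyring).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')  # pool name is an assumption

try:
    # Provision a 10 GiB virtual disk.
    rbd.RBD().create(ioctx, 'app-disk', 10 * 1024**3)

    # Point-in-time snapshot, then a writable clone of that snapshot.
    with rbd.Image(ioctx, 'app-disk') as image:
        image.create_snap('before-upgrade')
        image.protect_snap('before-upgrade')  # clones require a protected snap
    rbd.RBD().clone(ioctx, 'app-disk', 'before-upgrade', ioctx, 'app-disk-test')
finally:
    ioctx.close()
    cluster.shutdown()
```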

  13. RBD - TIERING WITH RADOS POOLS
     ● Scenarios: Multi-tier ✓  Mobility ❏  DR ❏  Stretch ❏  Edge ❏
     (Diagram: KVM/librbd and KRBD clients, each with a local file system, backed by SSD 2x, HDD 3x, and SSD EC 6+3 pools in a single Ceph storage cluster)
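
The tiers shown here are just RADOS pools with different CRUSH rules and data protection schemes. A rough sketch of the cluster-side setup, driven through the ceph CLI from Python; rule and pool names are hypothetical, and PG counts would need sizing for a real cluster:

```python
import subprocess

def ceph(*args):
    """Run a ceph CLI command and fail loudly if it errors."""
    subprocess.run(["ceph", *args], check=True)

# CRUSH rules pinned to device classes (assumes OSDs expose 'ssd' and 'hdd' classes).
ceph("osd", "crush", "rule", "create-replicated", "fast-ssd", "default", "host", "ssd")
ceph("osd", "crush", "rule", "create-replicated", "big-hdd", "default", "host", "hdd")

# SSD 2x and HDD 3x replicated pools.
ceph("osd", "pool", "create", "rbd-ssd", "128", "128", "replicated", "fast-ssd")
ceph("osd", "pool", "set", "rbd-ssd", "size", "2")
ceph("osd", "pool", "create", "rbd-hdd", "256", "256", "replicated", "big-hdd")
ceph("osd", "pool", "set", "rbd-hdd", "size", "3")

# SSD EC 6+3 pool, usable as an RBD data pool.
ceph("osd", "erasure-code-profile", "set", "ec63", "k=6", "m=3", "crush-device-class=ssd")
ceph("osd", "pool", "create", "rbd-ec", "128", "128", "erasure", "ec63")
ceph("osd", "pool", "set", "rbd-ec", "allow_ec_overwrites", "true")

# Tag the pools for RBD use.
for pool in ("rbd-ssd", "rbd-hdd", "rbd-ec"):
    ceph("osd", "pool", "application", "enable", pool, "rbd")
```

An image can then be created on a given tier, for example keeping metadata on the replicated SSD pool with data in the EC pool: rbd create --size 1T --data-pool rbd-ec rbd-ssd/bigimage (names again hypothetical).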

  14. RBD - LIVE IMAGE MIGRATION
     ● Scenarios: Multi-tier ✓  Mobility ✓  DR ❏  Stretch ❏  Edge ❏
     ● New in Nautilus
     (Diagram: same KVM/librbd and KRBD clients and SSD 2x / HDD 3x / SSD EC 6+3 pools; images can now move between pools while in use)
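
On Nautilus and later, live migration is driven by a prepare/execute/commit sequence in the rbd CLI. A hedged sketch with hypothetical pool and image names; clients are expected to reopen the image against the destination after the prepare step:

```python
import subprocess

def rbd(*args):
    subprocess.run(["rbd", *args], check=True)

SRC = "rbd-hdd/app-disk"   # hypothetical source pool/image
DST = "rbd-ssd/app-disk"   # hypothetical destination pool/image

# Link the source to a new target image; clients then (re)open the target.
rbd("migration", "prepare", SRC, DST)

# Copy block data in the background while the target image stays usable.
rbd("migration", "execute", DST)

# Finalize once the copy is complete and drop the link to the source.
rbd("migration", "commit", DST)
```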

  15. RBD - STRETCH
     ● Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● Apps can move
     ● Data can’t - it’s everywhere
     ● Performance is compromised
       ○ Need fat and low-latency pipes
     (Diagram: a single stretch Ceph storage cluster with a stretch pool spanning SITE A and SITE B over a WAN link)

  16. RBD - STRETCH WITH TIERS
     ● Scenarios: Multi-tier ✓  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● Create site-local pools for performance-sensitive apps
     (Diagram: stretch Ceph storage cluster with a site-local A POOL, a site-local B POOL, and a STRETCH POOL spanning SITE A and SITE B over a WAN link)
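
One way to express the stretch and site-local pools is with CRUSH rules that either spread replicas across datacenter buckets or stay rooted under a single site. A sketch assuming the CRUSH map already contains datacenter buckets named site-a and site-b (all names hypothetical):

```python
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Stretch rule: place replicas in distinct datacenters under the default root.
ceph("osd", "crush", "rule", "create-replicated", "stretch-rule", "default", "datacenter")
ceph("osd", "pool", "create", "rbd-stretch", "256", "256", "replicated", "stretch-rule")

# Site-local rules: root placement at each site's bucket, spread over hosts.
ceph("osd", "crush", "rule", "create-replicated", "site-a-rule", "site-a", "host")
ceph("osd", "crush", "rule", "create-replicated", "site-b-rule", "site-b", "host")
ceph("osd", "pool", "create", "rbd-site-a", "128", "128", "replicated", "site-a-rule")
ceph("osd", "pool", "create", "rbd-site-b", "128", "128", "replicated", "site-b-rule")
```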

  17. RBD - STRETCH WITH MIGRATION
     ● Scenarios: Multi-tier ✓  Mobility ✓  DR ✓  Stretch ✓  Edge ❏
     ● Live migrate images between pools
     ● Maybe even live migrate your app VM?
     (Diagram: same stretch cluster with site-local A and B pools and a stretch pool across SITE A and SITE B)

  18. STRETCH IS SKETCH
     ● Network latency is critical
       ○ Low latency for performance
       ○ Requires nearby sites, limiting usefulness
     ● Bandwidth too
       ○ Must be able to sustain rebuild data rates
     ● Relatively inflexible
       ○ Single cluster spans all locations
       ○ Cannot “join” existing clusters
     ● High level of coupling
       ○ Single (software) failure domain for all sites

  19. RBD ASYNC MIRRORING
     ● Asynchronously mirror writes
     ● Small performance overhead at primary
       ○ Mitigate with SSD pool for RBD journal
     ● Configurable time delay for backup
     (Diagram: KVM/librbd writes to a PRIMARY SSD 3x pool in CEPH CLUSTER A, asynchronously mirrored over a WAN link to a BACKUP HDD 3x pool in CEPH CLUSTER B)
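
Journal-based mirroring is enabled per pool or per image, plus a peer registration for the other cluster and an rbd-mirror daemon running on the backup side. A hedged sketch of the Mimic/Nautilus-era CLI with hypothetical pool, image, and peer names:

```python
import subprocess

def rbd(*args):
    subprocess.run(["rbd", *args], check=True)

POOL, IMAGE = "volumes", "app-disk"   # hypothetical names

# Journaling is what the rbd-mirror daemon replays on the backup cluster.
rbd("feature", "enable", f"{POOL}/{IMAGE}", "journaling")

# Mirror only explicitly enabled images in this pool ("pool" mode mirrors all).
rbd("mirror", "pool", "enable", POOL, "image")
rbd("mirror", "image", "enable", f"{POOL}/{IMAGE}")

# Register the remote cluster as a peer (run on each side with the other's name).
rbd("mirror", "pool", "peer", "add", POOL, "client.rbd-mirror@site-b")
```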

  20. RBD ASYNC MIRRORING
     ● Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ❏  Edge ❏
     ● On primary failure
       ○ Backup is point-in-time consistent
       ○ Lose only the last few seconds of writes
       ○ VM can restart in the new site
     ● If primary recovers
       ○ Option to resync and “fail back”
     (Diagram: the old primary in CEPH CLUSTER A is now DIVERGENT; asynchronous mirroring over the WAN link between the SSD 3x pool in CLUSTER A and the HDD 3x pool in CEPH CLUSTER B)
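
Failover and failback come down to promote/demote operations on the mirrored image. A sketch continuing the hypothetical names above; --force is needed when the old primary cannot be reached cleanly:

```python
import subprocess

def rbd(*args):
    subprocess.run(["rbd", *args], check=True)

SPEC = "volumes/app-disk"   # hypothetical pool/image

# Site A is down: promote the backup copy so VMs can restart against it.
rbd("mirror", "image", "promote", "--force", SPEC)   # run against cluster B

# Later, when site A returns with a divergent image:
rbd("mirror", "image", "demote", SPEC)               # run against cluster A
rbd("mirror", "image", "resync", SPEC)               # discard divergent data, re-copy
# Once resync completes, demote on B and promote on A to "fail back".
```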

  21. RBD MIRRORING IN CINDER
     ● Ocata
       ○ Cinder RBD replication driver
     ● Queens
       ○ ceph-ansible deployment of rbd-mirror via TripleO
     ● Rocky
       ○ Failover and fail-back operations
     ● Gaps
       ○ Deployment and configuration tooling
       ○ Cannot replicate multi-attach volumes
       ○ Nova attachments are lost on failover

  22. MISSING LINK: APPLICATION ORCHESTRATION
     ● Hard for the IaaS layer to reprovision the app in a new site
     ● Storage layer can’t solve it on its own either
     ● Need an automated, declarative, structured specification for the entire app stack...

  23. FILE STORAGE

  24. CEPHFS STATUS
     ● Scenarios: Multi-tier ✓  Mobility ❏  DR ❏  Stretch ❏  Edge ❏
     ● Stable since Kraken
     ● Multi-MDS stable since Luminous
     ● Snapshots stable since Mimic
     ● Support for multiple RADOS data pools
     ● Provisioning via OpenStack Manila and Kubernetes
     ● Fully awesome
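
“Support for multiple RADOS data pools” means a directory subtree can be placed in its own pool by attaching the pool to the file system and setting the layout vxattr. A sketch assuming a file system named cephfs mounted at /mnt/cephfs and a pre-created pool; all names are hypothetical:

```python
import os
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Attach an extra data pool to the file system (the pool must already exist).
ceph("fs", "add_data_pool", "cephfs", "cephfs-archive")

# New files under this directory land in the archive pool;
# existing files keep their old layout.
archive_dir = "/mnt/cephfs/archive"
os.makedirs(archive_dir, exist_ok=True)
os.setxattr(archive_dir, "ceph.dir.layout.pool", b"cephfs-archive")
```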

  25. CEPHFS - STRETCH?
     ● Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ✓  Edge ❏
     ● We can stretch CephFS just like RBD pools
     ● It has the same limitations as RBD
       ○ Latency → lower performance
       ○ Limited by geography
       ○ Big (software) failure domain
     ● Also,
       ○ MDS latency is critical for file workloads
       ○ ceph-mds daemons must be running in one site or the other
     ● What can we do with CephFS across multiple clusters?

  26. CEPHFS - SNAP MIRRORING
     ● Scenarios: Multi-tier ❏  Mobility ❏  DR ✓  Stretch ❏  Edge ❏
     ● CephFS snapshots provide
       ○ point-in-time consistency
       ○ granularity (any directory in the system)
     ● CephFS rstats provide
       ○ rctime to efficiently find changes
     ● rsync provides
       ○ efficient file transfer
     ● Time bounds on the order of minutes
     ● Gaps and TODO
       ○ “rstat flush” coming in Nautilus
         ■ Xuehan Xu @ Qihoo 360
       ○ rsync support for CephFS rstats
       ○ scripting / tooling
     (Timeline diagram, SITE A → SITE B: 1. A: create snap S1; 2. rsync A→B; 3. B: create snap S1; 4. A: create snap S2; 5. rsync A→B; 6. B: create S2; 7. A: create snap S3; 8. rsync A→B; 9. B: create S3)
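
One round of the loop in this diagram can be scripted directly from CephFS primitives: a snapshot is just a mkdir inside the magic .snap directory, and rsync then copies a frozen point-in-time view. A rough sketch assuming CephFS is mounted at /mnt/cephfs on both sites and site B is reachable over ssh (host and paths hypothetical); real tooling would also consult the ceph.dir.rctime rstat to skip unchanged subtrees:

```python
import os
import subprocess
import time

SRC = "/mnt/cephfs/projects"          # directory to mirror (hypothetical)
DST = "siteb:/mnt/cephfs/projects"    # rsync-over-ssh target (hypothetical)

snap = time.strftime("mirror-%Y%m%dT%H%M%S")

# 1. Site A: create a point-in-time snapshot (mkdir in the .snap directory).
os.mkdir(os.path.join(SRC, ".snap", snap))

# 2. Copy the frozen snapshot contents to site B.
subprocess.run(
    ["rsync", "-a", "--delete", os.path.join(SRC, ".snap", snap) + "/", DST + "/"],
    check=True,
)

# 3. Site B: create a matching snapshot so it has the same named consistent point.
subprocess.run(["ssh", "siteb", f"mkdir /mnt/cephfs/projects/.snap/{snap}"], check=True)
```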

  27. DO WE NEED POINT-IN-TIME FOR FILE?
     ● Yes.
     ● Sometimes.
     ● Some geo-replication DR features are built on rsync...
       ○ Consistent view of individual files,
       ○ Lack point-in-time consistency between files
     ● Some (many?) applications are not picky about cross-file consistency...
       ○ Content stores
       ○ Casual usage without multi-site modification of the same files

  28. CASE IN POINT: HUMANS
     ● Many humans love Dropbox / NextCloud / etc.
       ○ Ad hoc replication of directories to any computer
       ○ Archive of past revisions of every file
       ○ Offline access to files is extremely convenient and fast
     ● Disconnected operation and asynchronous replication lead to conflicts
       ○ Usually a pop-up in the GUI
     ● Automated conflict resolution is usually good enough
       ○ e.g., newest timestamp wins
       ○ Humans are happy if they can roll back to archived revisions when necessary
     ● A possible future direction:
       ○ Focus less on avoiding/preventing conflicts…
       ○ Focus instead on the ability to roll back to past revisions…

  29. BACK TO APPLICATIONS
     ● Do we need point-in-time consistency for file systems?
     ● Where does the consistency requirement come in?

  30. MIGRATION: STOP, MOVE, START
     ● Scenarios: Multi-tier ❏  Mobility ✓  DR ❏  Stretch ❏  Edge ❏
     ● App runs in site A
     ● Stop app in site A
     ● Copy data A→B
     ● Start app in site B
     ● App maintains exclusive access
     ● Long service disruption
     (Timeline diagram: the steps above proceed over time from SITE A to SITE B)
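
The procedure on this slide is essentially three steps with a long outage in the middle, which is why the service disruption is long. A trivial sketch with every host, path, and service name hypothetical (assumes ssh access between the hosts):

```python
import subprocess

def sh(host, cmd):
    subprocess.run(["ssh", host, cmd], check=True)

# Stop the app in site A so the data stops changing.
sh("app-a.example.com", "systemctl stop myapp")

# Copy the data set A→B; the service stays down for the whole transfer.
sh("app-a.example.com", "rsync -a /srv/myapp/ app-b.example.com:/srv/myapp/")

# Start the app in site B.
sh("app-b.example.com", "systemctl start myapp")
```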
