
SLIDE 1

Geo replication and disaster recovery for cloud object storage with Ceph rados gateway

Orit Wasserman, Senior Software Engineer

  • owasserm@redhat.com

Vault 2017

SLIDE 2

AGENDA

  • What is Ceph?
  • Rados Gateway (radosgw) architecture
  • Geo replication in radosgw
  • Questions
SLIDE 3

Ceph architecture

SLIDE 4

Ceph

  • Open source
  • Software-defined storage
  • Distributed
  • No single point of failure
  • Massively scalable
  • Self healing
  • Unified storage: object, block and file
  • IRC: OFTC #ceph,#ceph-devel
  • Mailing lists: ceph-users@ceph.com and ceph-devel@ceph.com
SLIDE 5

Ceph

RGW

A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS

A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS

A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD

A reliable, fully- distributed block device with cloud platform integration

CEPHFS

A distributed file system with POSIX semantics and scale-out metadata management


SLIDE 6

Rados

  • Reliable Distributed Object Storage
  • Replication
  • Erasure coding
  • Flat object namespace within each pool
  • Different placement rules
  • Strong consistency (CP system)
  • Infrastructure aware, dynamic topology
  • Hash-based placement (CRUSH)
  • Direct client to server data path
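The hash-based placement idea can be sketched in a few lines of Python. This is a toy stand-in, not CRUSH itself: real CRUSH walks a weighted, infrastructure-aware hierarchy map, while this sketch only shows why no lookup table is needed, since placement is a pure computation any client can repeat.

```python
import hashlib

def place_object(obj_name, pg_count, osds, replicas=3):
    # Toy stand-in for CRUSH: hash the object name to a placement group
    # (PG), then derive the replica OSDs deterministically from the PG id.
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    pg = h % pg_count
    # pick `replicas` distinct OSDs by rotating through the OSD list
    return pg, [osds[(pg + i) % len(osds)] for i in range(replicas)]

osds = [f"osd.{i}" for i in range(8)]
pg, placement = place_object("foo", pg_count=128, osds=osds)
# any client computes the identical mapping, enabling a direct
# client-to-server data path
assert (pg, placement) == place_object("foo", pg_count=128, osds=osds)
```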
SLIDE 7

OSD node

  • 10s to 1000s in a cluster
  • One per disk (or one per SSD, RAID group…)
  • Serve stored objects to clients
  • Intelligently peer for replication & recovery

SLIDE 8

Monitor node

  • Maintain cluster membership and state
  • Provide consensus for distributed decision-making
  • Small, odd number
  • Do not serve stored objects to clients
SLIDE 9

Librados API

  • Efficient key/value storage inside an object
  • Atomic single-object transactions
  • update data, attr, keys together
  • atomic compare-and-swap
  • Object-granularity snapshot infrastructure
  • Partial overwrite of existing data
  • Single-object compound atomic operations
  • RADOS classes (stored procedures)
  • Watch/Notify on an object
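The single-object compound atomic operations can be illustrated with a toy Python model. The class and function names are invented for this sketch and are not the librados API; the point is the semantics: data, xattrs and omap keys change together or not at all, and a failed compare-and-swap leaves the object untouched.

```python
import copy

class TinyObject:
    """Toy model of a RADOS object: data, xattrs and an omap key/value
    store. Shows compound atomic semantics: all steps apply, or none."""
    def __init__(self):
        self.state = {"data": b"", "xattrs": {}, "omap": {}}

    def atomic_op(self, steps):
        staged = copy.deepcopy(self.state)   # work on a staged copy
        try:
            for step in steps:
                step(staged)
        except ValueError:
            return False                     # nothing becomes visible
        self.state = staged                  # publish all changes at once
        return True

def cas(expected):
    """Atomic compare-and-swap guard: abort the op if data differs."""
    def step(s):
        if s["data"] != expected:
            raise ValueError("compare failed")
    return step

obj = TinyObject()
# update data, attr and omap keys together
assert obj.atomic_op([
    lambda s: s.update(data=b"payload"),
    lambda s: s["xattrs"].update(acl="private"),
    lambda s: s["omap"].update(idx="1"),
])
# a failed CAS leaves the object untouched
assert not obj.atomic_op([cas(b"other"), lambda s: s.update(data=b"new")])
assert obj.state["data"] == b"payload"
```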
SLIDE 10

Rados Gateway

SLIDE 11

Rados Gateway

[Ceph architecture diagram repeated from slide 5, with RGW highlighted: a web services gateway for object storage, compatible with S3 and Swift]

SLIDE 12

Rados Gateway

[Diagram: applications speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS (over a socket) to reach the RADOS cluster and its monitors (M)]

SLIDE 13

RESTful OBJECT STORAGE

  • Data
  • Users
  • Buckets
  • Objects
  • ACLs
  • Authentication
  • APIs
  • S3
  • Swift
  • Librgw (used for NFS)

[Diagram: applications reach RADOSGW via S3 or Swift REST; RADOSGW uses LIBRADOS to talk to the RADOS cluster]

SLIDE 14

RGW vs RADOS object

  • RADOS
  • Limited object sizes
  • Mutable objects
  • Not indexed
  • No per-object ACLs
  • RGW
  • Large objects (Up to a few TB per object)
  • Immutable objects
  • Sorted bucket listing
  • Permissions
SLIDE 15

RGW objects

[Diagram: an RGW object split into a head and a tail]

  • Head
  • Single rados object
  • Object metadata (acls, user attributes, manifest)
  • Optional start of data
  • Tail
  • Striped data
  • 0 or more rados objects
SLIDE 16

RGW Objects

[Diagram: object foo in bucket boo (bucket ID 123) maps to head object 123_foo plus striped tail objects such as 123_28faPd3Z.1 and 123_28faPd3Z.2]
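The head/tail naming for an RGW object can be sketched in Python. The 4 MiB stripe size and the random 8-character tail prefix are assumptions for illustration; radosgw chooses its own values.

```python
import random
import string

STRIPE_SIZE = 4 * 2**20   # assumed 4 MiB stripe unit, for illustration

def rgw_to_rados_objects(bucket_id, obj_name, size):
    """Map one RGW object to its backing rados objects using the naming
    from the diagram: head `<bucket_id>_<name>`, tails
    `<bucket_id>_<random prefix>.<n>`. Prefix length is an assumption."""
    head = f"{bucket_id}_{obj_name}"
    prefix = "".join(random.choices(string.ascii_letters + string.digits, k=8))
    tail_bytes = max(0, size - STRIPE_SIZE)     # head holds the first stripe
    n_tails = -(-tail_bytes // STRIPE_SIZE)     # ceiling division
    tails = [f"{bucket_id}_{prefix}.{i}" for i in range(1, n_tails + 1)]
    return head, tails

head, tails = rgw_to_rados_objects("123", "foo", size=10 * 2**20)
assert head == "123_foo"
assert len(tails) == 2    # 10 MiB: 4 MiB in the head plus two tail stripes
```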

SLIDE 17

RGW bucket index

[Diagram: the bucket index split into shards, each holding a sorted set of entries, e.g. shard 1: aaa, abc, def (v1, v2), zzz; shard 2: aab, bbb, eee, fff, zzz]
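A minimal sketch of a sharded bucket index, assuming hash-based shard selection (the real radosgw hash differs) and a sorted listing built by merging the per-shard entry sets:

```python
import hashlib

def shard_of(obj_name, num_shards):
    # assumed shard selection: hash the name to a bucket index shard
    # (radosgw uses its own hash function; this is only a sketch)
    return int(hashlib.md5(obj_name.encode()).hexdigest(), 16) % num_shards

def list_bucket(shards):
    # a sorted bucket listing merges the sorted per-shard entry sets
    return sorted(name for shard in shards for name in shard)

shards = [set() for _ in range(2)]
for name in ["aaa", "abc", "def", "zzz", "aab", "bbb", "eee", "fff"]:
    shards[shard_of(name, 2)].add(name)

assert list_bucket(shards) == ["aaa", "aab", "abc", "bbb", "def",
                               "eee", "fff", "zzz"]
```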

SLIDE 18

RGW object creation

  • When creating a new object we need to:
  • Update bucket index
  • Create head object
  • Create tail objects
  • All those operations need to be consistent
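A toy Python version of the prepare/complete protocol radosgw uses for this, with invented names, showing why a listing stays consistent even if a writer crashes mid-upload:

```python
class BucketShard:
    """Toy bucket index shard: an entry is first marked pending, the
    object data is written, then the entry is completed."""
    def __init__(self):
        self.entries = {}   # object name -> "pending" | "complete"

    def prepare(self, name):
        self.entries[name] = "pending"

    def complete(self, name):
        self.entries[name] = "complete"

    def listing(self):
        # listings skip pending entries left behind by crashed writers
        return sorted(n for n, st in self.entries.items() if st == "complete")

def put_object(shard, store, name, head, tails):
    shard.prepare(name)            # 1. prepare the bucket index entry
    store[name] = (head, tails)    # 2. write head then tail objects
    shard.complete(name)           # 3. complete the index entry

shard, store = BucketShard(), {}
put_object(shard, store, "foo", b"head", [b"tail1", b"tail2"])
shard.prepare("crashed")           # a writer that died mid-upload
assert shard.listing() == ["foo"]  # the half-written object is invisible
```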
SLIDE 19

RGW object creation

[Diagram: the bucket index shard receives a prepare entry, the head and tail objects are written, then the index entry is marked complete]

SLIDE 20

Geo replication

SLIDE 21

Geo replication

  • Data is replicated on different physical locations
  • High and unpredictable latency between those locations
SLIDE 22

Why do we need Geo replication?

  • Disaster recovery
  • Distribute data across geographical locations

SLIDE 23

Geo replication

[Diagram: world map with zones in us-east, us-west, europe, brazil, singapore and aus, marking primary sites and their DR backups]

SLIDE 24

Sync agent (old implementation)

[Diagram: a SYNC AGENT sits between the CEPH OBJECT GATEWAY (RGW) on CEPH STORAGE CLUSTER (US-EAST-1) and the RGW on CEPH STORAGE CLUSTER (US-EAST-2)]

SLIDE 25

Sync agent (old implementation)

  • External python implementation
  • No Active/Active support
  • Hard to configure
  • Complicated failover mechanism
  • No clear sync status indication
  • A single bucket synchronization could dominate the entire sync process
  • Configuration updates require restart of the gateways
SLIDE 26

New implementation

  • Part of the radosgw (written in C++)
  • Active/active support for data replication
  • Simpler configuration
  • Simplified failover/failback
  • Dynamic reconfiguration
  • Backward compatibility with the sync agent
SLIDE 27

Multisite configuration

  • Realm
  • Namespace
  • Contains the multisite configuration and status
  • Allows running different configurations in the same cluster
  • Zonegroup
  • Group of zones
  • Used to be called region in old multisite
  • Each realm has a single master zonegroup
  • Zone
  • One or more radosgw instances, all running on the same Rados cluster
  • Each zonegroup has a single master zone
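The hierarchy can be modeled with plain dictionaries; the names below come from the disaster recovery example slide (realm gold, zonegroup us, zones us-east/us-west), and the helper function is invented for this sketch:

```python
# plain-dict model of the realm / zonegroup / zone hierarchy
zonegroup_us = {"name": "us", "master_zone": "us-east",
                "zones": ["us-east", "us-west"]}
realm_gold = {"name": "gold", "master_zonegroup": "us",
              "zonegroups": [zonegroup_us]}

def meta_master_zone(realm):
    """Resolve the meta master: the master zone of the master zonegroup."""
    zg = next(z for z in realm["zonegroups"]
              if z["name"] == realm["master_zonegroup"])
    return zg["master_zone"]

assert meta_master_zone(realm_gold) == "us-east"
```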
SLIDE 28

Disaster recovery example

[Diagram: Realm gold, zonegroup us (master), with master zone us-east on CEPH STORAGE CLUSTER (US-EAST) and secondary zone us-west on CEPH STORAGE CLUSTER (US-WEST), each fronted by a RADOSGW]

SLIDE 29

Multisite environment example

[Diagram: Realm gold spanning three clusters: zonegroup us (master) with master zone us-east (US-EAST) and secondary zone us-west (US-WEST), plus zonegroup eu (secondary) with master zone eu-west (EU-WEST), each fronted by a RADOSGW]

SLIDE 30

Configuration change

  • Period:
  • Each period has a unique id
  • Contains all the realm configuration, an epoch and its predecessor period id (except for the first period)
  • Every realm has an associated current period and a chronological list of periods
  • Git-like mechanism:
  • User configuration changes are stored locally
  • Configuration updates are stored in a staging period (using the radosgw-admin period update command)
  • Changes are applied only when the period is committed (using the radosgw-admin period commit command)
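The staging/commit flow can be sketched as a toy Python class, assuming no master zone change (class, method and config-key names are invented for illustration):

```python
class RealmConfig:
    """Toy model of the git-like period mechanism: updates accumulate
    in a staging period; nothing takes effect until commit."""
    def __init__(self):
        self.current = {"id": "period-1", "epoch": 1, "config": {}}
        self.staging = {}

    def period_update(self, key, value):   # cf. radosgw-admin period update
        self.staging[key] = value

    def period_commit(self):               # cf. radosgw-admin period commit
        # without a master change, only the epoch is incremented
        self.current["config"].update(self.staging)
        self.current["epoch"] += 1
        self.staging = {}

realm = RealmConfig()
realm.period_update("endpoints", ["http://rgw1:8000"])  # hypothetical value
assert realm.current["config"] == {}       # staged, not yet applied
realm.period_commit()
assert realm.current["epoch"] == 2
assert "endpoints" in realm.current["config"]
```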

SLIDE 31

Configuration change – no master change

  • Each zone can pull the period information (using the radosgw-admin period pull command)
  • Period commit will only increment the period epoch
  • The new period information will be pushed to all other zones
  • We use watch/notify on the realm Rados object to detect changes on the local radosgw

SLIDE 32

Changing master zone

  • Period commit will result in the following actions:
  • A new period is generated with a new period id and an epoch of 1
  • Realm's current period is updated to point to the newly generated period id
  • Realm's epoch is incremented
  • New period is pushed to all other zones by the new master
  • We use watch/notify on the realm rados object to detect changes and apply them on the local radosgw
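The commit actions listed above can be sketched as follows; the dictionary fields are illustrative, not radosgw's stored format:

```python
import uuid

def commit_with_master_change(realm):
    """Apply the period-commit actions for a master zone change."""
    new_period = {
        "id": str(uuid.uuid4()),                    # new period id
        "epoch": 1,                                 # epoch restarts at 1
        "predecessor": realm["current_period"]["id"],
    }
    realm["current_period"] = new_period            # realm points at it
    realm["epoch"] += 1                             # realm epoch incremented
    # the new master then pushes new_period to all other zones
    return new_period

realm = {"current_period": {"id": "p1", "epoch": 5}, "epoch": 3}
p2 = commit_with_master_change(realm)
assert p2["epoch"] == 1 and p2["predecessor"] == "p1"
assert realm["epoch"] == 4
```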

SLIDE 33

Sync process

  • Metadata change operations:
  • Bucket ops (create, delete, enable/disable versioning, change ACLs…)
  • User ops
  • Data change operations: create/delete objects
  • Metadata changes have a wide system effect
  • Metadata changes are rare
  • Data changes are large
SLIDE 34

Metadata sync

  • Metadata changes are replicated synchronously across the realm
  • Log for metadata changes, ordered chronologically, stored locally in each Ceph cluster
  • Each realm has a single meta master: the master zone in the master zonegroup
  • Only the meta master can execute metadata changes
  • If the meta master is down, users cannot perform metadata updates until a new meta master is assigned (cannot create/delete buckets or users, but can read/write objects)

SLIDE 35

Metadata sync

  • Metadata update:
  • Update metadata log on pending operation
  • Execute
  • Update metadata log on operation completion
  • Push the metadata log changes to all the remote zones
  • Each zone will pull the updated metadata log and apply changes locally
  • Updates to metadata originating from a different zone are forwarded to the meta master
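The update and replay flow above can be sketched as a toy Python model (function names invented):

```python
def meta_update(log, op):
    """Log the pending op, execute it, then log completion, mirroring
    the three metadata-log steps above."""
    entry = {"op": op, "state": "pending"}
    log.append(entry)            # 1. log the pending operation
    # 2. ... execute the metadata change on the meta master ...
    entry["state"] = "complete"  # 3. log completion

def zone_apply(log, local_state, position):
    """A zone pulls the log from its last position and replays completed
    entries locally; returns the new position."""
    for entry in log[position:]:
        if entry["state"] == "complete":
            local_state.append(entry["op"])
    return len(log)

master_log, zone_state, pos = [], [], 0
meta_update(master_log, "create bucket b1")
meta_update(master_log, "create user alice")
pos = zone_apply(master_log, zone_state, pos)
assert zone_state == ["create bucket b1", "create user alice"]
assert pos == 2
```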

SLIDE 36

Data sync

  • Data changes are replicated asynchronously (eventual consistency)
  • Default replication is Active/Active
  • User can configure a zone to be read-only for Active/Passive
  • Data changes are executed locally and logged chronologically to a data log
  • We first complete a full sync and then continue doing an incremental sync

SLIDE 37

Data sync

  • Data sync runs periodically
  • Init phase: fetch the list of all the bucket instances
  • Sync phase:
  • For each bucket:
  • If the bucket does not exist, fetch bucket and bucket instance metadata from the meta master zone and create the new bucket
  • Sync bucket:
  • List bucket index
  • Check each object against the remote radosgw
  • Fetch object data from the remote
  • Check whether updates need to be sent to other zones
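The per-bucket comparison step can be sketched as follows, using an integer version per index entry as a simplification of the real bucket index entries:

```python
def sync_bucket(local, remote):
    """One pass of the per-bucket sync: list the remote index and fetch
    any object the local zone is missing or holds an older version of."""
    fetched = []
    for name, version in sorted(remote["index"].items()):
        if local["index"].get(name, 0) < version:
            local["data"][name] = remote["data"][name]   # fetch object data
            local["index"][name] = version
            fetched.append(name)
    return fetched

remote = {"index": {"a": 2, "b": 1}, "data": {"a": b"A2", "b": b"B1"}}
local = {"index": {"a": 1}, "data": {"a": b"A1"}}
assert sync_bucket(local, remote) == ["a", "b"]
assert local["data"]["a"] == b"A2"   # stale local copy replaced
```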
SLIDE 38

Data sync

  • Incremental sync keeps a bucket index position to continue from
  • Each bucket instance within each zone has a unique incremented version id that is used to keep track of changes on that specific bucket
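A minimal sketch of marker-based incremental sync, with an integer position standing in for the real bucket index marker:

```python
def incremental_sync(local, change_log, marker):
    """Resume from the stored position (`marker`) in the peer's change
    log and apply only the entries logged after it."""
    for name, version in change_log[marker:]:
        local[name] = version
    return len(change_log)           # new position to persist

change_log = [("a", 1), ("b", 1), ("a", 2)]
local, marker = {}, 0
marker = incremental_sync(local, change_log, marker)
change_log.append(("c", 1))
marker = incremental_sync(local, change_log, marker)  # replays only ("c", 1)
assert local == {"a": 2, "b": 1, "c": 1}
assert marker == 4
```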

SLIDE 39

Sync status

  • Each zone keeps the metadata sync state against the meta master
  • Each zone keeps the data sync state: where it is synced with regard to all its peers
  • Users can query the sync status using an admin command
SLIDE 40

Sync status command

radosgw-admin sync status

          realm f94ab897-4c8e-4654-a699-f72dfd4774df (gold)
      zonegroup 9bcecc3c-0334-4163-8fbb-5b8db0371b39 (us)
           zone 153a268f-dd61-4465-819c-e5b04ec4e701 (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 018cad1e-ab7d-4553-acc4-de402cfddd19 (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

SLIDE 41

Sync status command

radosgw-admin sync status

          realm 1c60c863-689d-441f-b370-62390562e2aa (earth)
      zonegroup 540c9b3f-5eb7-4a67-a581-54bc704ce827 (us)
           zone d48cb942-a5fa-4597-89fd-0bab3bb9c5a3 (us-2)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is behind on 1 shards
                oldest incremental change not applied: 2016-06-23 09:57:35.0.097857s
      data sync source: 505a3a8e-19cf-4295-a43d-559e763891f6 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 8 shards
                        oldest incremental change not applied: 2016-06-29 07:34:15.0.194232s

SLIDE 42

A little bit of the Implementation

  • We use co-routines for asynchronous execution, based on boost::asio::coroutine with our own stack class
  • See code here: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_coroutine.h
  • Logs are Rados omap objects, sorted by time stamp (nanosecond granularity)
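The timestamp-keyed omap log can be imitated in Python: omap returns keys in sorted order, so zero-padded nanosecond-timestamp keys replay chronologically. The sequence suffix is an added tiebreaker and an assumption of this sketch.

```python
import time

class OmapLog:
    """Toy model of a log stored in a Rados omap: iterating the keys in
    sorted order yields the entries in chronological order."""
    def __init__(self):
        self.kv, self.seq = {}, 0

    def append(self, entry):
        # zero-padded nanosecond timestamp plus a sequence tiebreaker
        key = f"{time.time_ns():020d}.{self.seq:06d}"
        self.kv[key] = entry
        self.seq += 1

    def replay(self):
        return [self.kv[k] for k in sorted(self.kv)]

log = OmapLog()
for e in ["create bucket b1", "put object foo", "delete object foo"]:
    log.append(e)
assert log.replay() == ["create bucket b1", "put object foo",
                        "delete object foo"]
```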

SLIDE 43

What's next

SLIDE 44

WHAT'S NEXT

  • Log trimming – clean old logs and old bucket index objects
  • Sync modules – framework that allows forwarding data (and metadata) to external tiers. This will allow external metadata search (via elasticsearch)
  • Sync for indexless buckets
SLIDE 45

THANK YOU!

  • owasserm@redhat.com

@oritwas