Ceph: A Scalable, High-Performance Distributed File System
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long
Presenter: Md Rajib Hossen
Ceph: a single, open, and unified platform
- Horizontally scalable, with interoperability
- No single point of failure
- Workloads include tens of thousands of clients concurrently reading from and writing to the same file or directory
- Handles allocation and mapping with a dynamic algorithm, CRUSH
- Enhances the local disk with object storage devices (OSDs)
- MDS (metadata server): performs file operations (open, rename), manages the namespace, and ensures consistency, security, and safety
- OSD (object storage device): stores file data, maintains replication, and handles update serialization and recovery
- Client: supports three different client types: object, block, and POSIX file system
- Monitors: keep track of active and failed cluster nodes
- Files are stored as objects at the storage level, striped into several objects; object size, stripe width, and stripe count are configurable (see the striping sketch after this list)
- CRUSH: removes allocation tables, dynamically maps objects (stripes of files) to storage devices, retrieves object locations, and load-balances across nodes
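A small sketch of the configurable striping mentioned above, assuming a simple round-robin layout; the 4 MB object size, 1 MB stripe unit, and 4-way stripe count are invented example values:

    # Assumed example values: 4 MB objects, 1 MB stripe unit, 4-way striping.
    OBJECT_SIZE, STRIPE_UNIT, STRIPE_COUNT = 4 * 2**20, 1 * 2**20, 4
    UNITS_PER_OBJECT = OBJECT_SIZE // STRIPE_UNIT

    def locate(offset: int) -> tuple[int, int]:
        """Map a file byte offset to (object number, offset within that object)."""
        unit = offset // STRIPE_UNIT                # global stripe-unit index
        set_size = STRIPE_COUNT * UNITS_PER_OBJECT  # stripe units per object set
        ono = (unit // set_size) * STRIPE_COUNT + unit % STRIPE_COUNT
        obj_off = (unit % set_size // STRIPE_COUNT) * STRIPE_UNIT + offset % STRIPE_UNIT
        return ono, obj_off

    print(locate(5 * 2**20))  # 5 MB into the file -> (1, 1048576) with these values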
Ceph provides scalability as well as high performance, reliability, and availability. To achieve these, Ceph has three design features:
Decoupled Data and Metadata
Metadata operations (open, rename) are managed by the MDS, while OSDs perform file I/O. Moreover, CRUSH distributes file objects to storage devices algorithmically.
Dynamic Distributed Metadata Management
Uses dynamic subtree partitioning to distribute responsibility among several MDS nodes. The dynamic hierarchical partition also preserves locality, and the distribution is based on access patterns.
Reliable Autonomic Distributed Object Storage
Delegates responsibility to the OSDs and gives them the intelligence to utilize the memory and CPU of the storage nodes.
Q1. "...performance, reliability and availability through three fundamental design features:..." What are Ceph's design features? Compare Figure 1 with "Figure 1: GFS Architecture" in the GFS paper, read Section 2, and indicate the fundamental differences between them. [Hint: "...Figure 1: GFS Architecture", "Ceph utilizes a novel metadata cluster architecture...", "Ceph delegates responsibility for data migration, replication, failure detection, and failure recovery to the cluster of OSDs..."]
- GFS has a single master to coordinate and manage all work, whereas Ceph has a metadata cluster
- Ceph distributes replication and failure detection to the OSDs, whereas the GFS master manages these tasks
- GFS uses fixed-size chunks (64 MB), whereas Ceph has variable object sizes
- GFS uses a file mapping table kept in the master's memory, whereas Ceph uses CRUSH
- GFS depends on the local file system, whereas Ceph builds intelligent OSDs on top of a local or customized file system
- Ceph doesn't require metadata locks or leases for clients, whereas GFS does
- Replaces the traditional hard disk with an intelligent object storage device (OSD)
- Clients can read and write continuously to an OSD, which isn't possible with a traditional HDD; they can perform continuous reads and writes of large, variable-sized objects
- Object sizes are configurable (2 MB, 4 MB, etc.)
- OSDs distribute low-level block allocation decisions to the devices themselves
- Moreover, reliance on traditional file system principles, i.e., allocation lists and inode tables, limits scalability and performance
- The intelligence present in OSDs can utilize the CPU and memory in the storage nodes
Q2. "...write byte ranges to much larger (and often variably sized) named objects, distributing low-level block allocation decisions to the devices themselves." What are the major differences between an OSD (object storage device) and a conventional hard disk?
Ceph delegates some responsibility to the OSDs and reduces the dependency on the MDS. The MDS manages the file system namespace and file operations; the OSDs perform data access, update serialization, replication, and reliability. Ceph also removes the allocation table, providing the CRUSH algorithm for dynamic mapping between objects and storage devices.
On the other hand, GFS keeps file mapping information in the memory of the master. The master maintains the file system metadata, i.e., namespaces, file-to-chunk mappings, and the locations of replicas. It also performs chunk lease management, garbage collection, and chunk replication and migration.
Q3. "Ceph decouples data and metadata operations by eliminating file allocation tables and replacing them with generating functions. This allows Ceph to leverage the intelligence present in OSDs to distribute the complexity surrounding data access, update serialization, replication and reliability, failure detection, and recovery." Does GFS have a file allocation table? Who is responsible for managing "data access, update serialization, replication and reliability, failure detection, and recovery" in GFS?
To store a named object in a pool, the client (a sketch of this calculation follows the list):
- takes the object name and hashes it
- calculates the hash modulo the number of PGs (e.g., 58) to get the PG number
- gets the pool ID for the given pool name (e.g., "juventus" = 4)
- prepends the pool ID to get the full PG ID (e.g., 4.58)
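A minimal Python sketch of this client-side calculation. The hash function, pool table, and PG count below are invented stand-ins; Ceph itself uses its own hash function and reads pool IDs and PG counts from the cluster maps:

    import hashlib

    # Invented example values; real Ceph reads these from its cluster maps.
    POOLS = {"juventus": 4}   # pool name -> pool ID
    PG_NUM = 128              # number of PGs in the pool

    def object_to_pg_id(pool_name: str, object_name: str) -> str:
        # Hash the object name, then take it modulo the number of PGs.
        h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], "little")
        pg = h % PG_NUM
        # Prepend the pool ID to form the full PG ID, e.g. "4.58".
        return f"{POOLS[pool_name]}.{pg:x}"

    print(object_to_pg_id("juventus", "my-object"))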
- First, the file is striped into several objects
- Objects are mapped into PGs using a hash function and an adjustable bit mask to control the number of PGs
- Each OSD holds on the order of 100 PGs to balance OSD utilization
- PGs are then mapped to OSDs via CRUSH
- To locate an object, CRUSH requires only the PG ID and the cluster map (see the sketch after the question below)
Q4. "Figure 3: Files are striped across many objects, grouped into placement groups (PGs), and distributed to OSDs via CRUSH, a specialized replica placement function." Describe how to find the data associated with an inode and an in-file object number ("ino, ono").
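A hedged end-to-end sketch of the (ino, ono) lookup path: the object ID combines the inode and in-file object numbers, the PG ID is the hash of the object ID masked down to the number of PGs, and CRUSH maps the PG to an ordered list of OSDs. The hash and the crush() placement policy here are simplified placeholders, not the real algorithms:

    import hashlib

    def oid(ino: int, ono: int) -> str:
        # The object ID is just the inode number plus the in-file object number.
        return f"{ino:x}.{ono:08x}"

    def pgid(oid_str: str, pg_mask: int) -> int:
        # Hash the oid and apply the adjustable bit mask (pg_num - 1,
        # assuming a power-of-two number of PGs).
        h = int.from_bytes(hashlib.md5(oid_str.encode()).digest()[:4], "little")
        return h & pg_mask

    def crush(pg: int, osds: list[str], replicas: int = 3) -> list[str]:
        # Placeholder for CRUSH: deterministically pick `replicas` distinct
        # OSDs for the PG. Real CRUSH walks the failure-domain hierarchy.
        return [osds[(pg + i) % len(osds)] for i in range(replicas)]

    cluster = [f"osd.{i}" for i in range(10)]
    print(crush(pgid(oid(0x1234, 7), pg_mask=127), cluster))  # primary OSD listed first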
CRUSH was introduced to remove the mapping table, which requires significant memory and overhead to keep consistent. Moreover, any entity can calculate an object's location, and the map needs to be updated only infrequently. Mapping that relies on block or object list metadata has several drawbacks:
- distribution-related metadata must be exchanged between nodes
- upon removal of a node, the block/object list must be made consistent again, which requires many changes
With CRUSH, the PG is simply remapped to a new OSD, and the PG ID is computed dynamically. The same approach also helps with data rebalancing and with adding new OSD nodes, and it removes the dependency on the underlying storage nodes (see the sketch below).
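A small standalone sketch of that consequence, using the same kind of placeholder crush() policy as above; the OSD names and the removed node are invented:

    def crush(pg: int, osds: list[str], replicas: int = 2) -> list[str]:
        # Same placeholder placement policy as the earlier sketch.
        return [osds[(pg + i) % len(osds)] for i in range(replicas)]

    old_map = [f"osd.{i}" for i in range(10)]
    new_map = [o for o in old_map if o != "osd.3"]   # osd.3 fails and is removed

    # Every client recomputes the PG -> OSD mapping from the new cluster
    # map; no per-object table has to be patched or redistributed.
    pg = 42
    print(crush(pg, old_map))  # placement before the failure
    print(crush(pg, new_map))  # placement after, recomputed locally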
Q5. Does a mapping method (from an object number to its hosting storage server) relying on "block or object list metadata" (a table listing all object-server mappings) work as well? What's its drawback?
A PG aggregates a series of objects into a group and maps the group onto a series of OSDs. Tracking per-object placement and metadata is prohibitively expensive; PGs reduce the number of processes and the amount of per-object metadata to track when storing and retrieving data. There are other advantages to having logical placement groups on top of the OSD cluster:
- placement rules can be applied to specific PGs belonging to a pool
- it is easy to express distribution policies such as SSD group vs. HDD group, or same rack vs. different racks
- an OSD can self-report and monitor peers within its own PGs, which reduces load on the master
Mapping an oid directly to OSDs would lose these benefits (a back-of-the-envelope comparison follows the question).
Q6. Why are placement groups (PGs) introduced? Can we construct a hash function mapping an object ("oid") directly to a list of OSDs?
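A back-of-the-envelope sketch of the metadata savings; the ~100 PGs per OSD figure is from the slide above, while the object and OSD counts are assumed round numbers:

    # Assumed round numbers for illustration.
    num_objects = 10**9           # objects stored in the cluster
    num_osds = 1_000              # storage devices
    pgs_per_osd = 100             # target from above: ~100 PGs per OSD

    num_pgs = num_osds * pgs_per_osd     # 100,000 PGs
    print(f"per-object tracking: {num_objects:,} entries")
    print(f"per-PG tracking:     {num_pgs:,} entries "
          f"({num_objects // num_pgs:,}x fewer)")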
CRUSH determines how to store and retrieve data by computing data storage locations. The CRUSH mapping takes a placement group, the cluster map, and placement rules as input, and produces the list of OSDs onto which each PG is mapped. The CRUSH map also considers placement constraints, i.e., placing each PG on OSDs so as to reduce inter-row replication traffic and minimize exposure to power or switch failures.
Q7. What are the inputs of the CRUSH hash function? What can be included in an OSD cluster map? [Hint: read the last paragraph of Section 5.1 for the second question.]
The cluster map contains the cluster's full physical composition. Five maps make up the cluster map (a sketch follows the list):
- The monitor map: contains the current epoch, the cluster fsid, the creation time, and the name, address, and port of each monitor
- The OSD map: the list of pools, replica sizes, PG numbers, and the list of OSDs and their status
- The PG map: the PG version, timestamp, last OSD map epoch, details of each PG ID, and data usage statistics for each pool
- The CRUSH map: the list of storage devices, the failure domain hierarchy (device, host, rack, room, etc.), and the rules for placing data on that hierarchy
- The MDS map: the current MDS map epoch, the metadata pool, and the list of metadata servers and their status
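A minimal sketch of how these five maps might be represented; the field names and values are assumed for illustration and are not Ceph's actual structures:

    # Assumed field names and values, loosely following the list above.
    cluster_map = {
        "monitor_map": {"epoch": 12, "fsid": "a1b2c3d4", "creation_time": "2006-01-01",
                        "monitors": [("mon.a", "10.0.0.1:6789")]},
        "osd_map": {"pools": {"juventus": {"id": 4, "size": 3, "pg_num": 128}},
                    "osds": {f"osd.{i}": "up" for i in range(10)}},
        "pg_map": {"version": 3301, "last_osd_map_epoch": 12,
                   "pg_stats": {}, "pool_usage": {}},
        "crush_map": {"devices": [f"osd.{i}" for i in range(10)],
                      "hierarchy": ["device", "host", "rack", "room"],
                      "rules": ["spread replicas across racks"]},
        "mds_map": {"epoch": 5, "metadata_pool": 1, "mds": {"mds.a": "active"}},
    }

    # CRUSH consumes a PG, this map, and the placement rules to produce
    # the ordered list of OSDs for that PG (see the crush() sketch above).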