SLIDE 1

CEPHALOPODS AND SAMBA

IRA COOPER - SambaXP 2016.05.12

SLIDE 2

AGENDA

  • CEPH Architecture.
      – Why CEPH?
      – RADOS
      – RGW
      – CEPHFS

  • Current Samba integration with CEPH.
  • Future directions.
  • Maybe a demo?
SLIDE 3

CEPH MOTIVATING PRINCIPLES

  • All components must scale horizontally.
  • There can be no single point of failure.
  • The solution must be hardware agnostic.
  • Should use commodity hardware.
  • Self-manage whenever possible.
  • Open source.
SLIDE 4

ARCHITECTURAL COMPONENTS

RGW

A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS

A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS

A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD

A reliable, fully- distributed block device with cloud platform integration

CEPHFS

A distributed file system with POSIX semantics and scale-out metadata management

(Diagram consumers: APP, HOST/VM, CLIENT)

SLIDE 5

ARCHITECTURAL COMPONENTS

(Same architecture overview as SLIDE 4, repeated: RGW, LIBRADOS, RADOS, RBD, CEPHFS.)

SLIDE 6

RADOS

  • Flat object namespace within each pool
  • Rich object API (librados)

      – Bytes, attributes, key/value data
      – Partial overwrite of existing data
      – Single-object compound operations
      – RADOS classes (stored procedures)

  • Strong consistency (CP system)
  • Infrastructure aware, dynamic topology
  • Hash-based placement (CRUSH)
  • Direct client to server data path
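
Hash-based placement is what makes the direct client-to-server data path possible: any client that knows the cluster map can compute where an object lives, with no lookup service in between. A toy sketch of the idea — this is not real CRUSH, which also handles device weights, failure domains, and topology:

```python
import hashlib

def place(object_name, pg_count, osds, replicas=3):
    """Toy stand-in for CRUSH-style placement: hash the object name to a
    placement group (PG), then deterministically rank OSDs for that PG
    and take the first `replicas`. Real CRUSH also weighs devices and
    respects failure domains."""
    pg = int(hashlib.md5(object_name.encode()).hexdigest(), 16) % pg_count
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.md5(f"{pg}:{osd}".encode()).hexdigest(),
    )
    return ranked[:replicas]

# Any client with the same inputs computes the same OSD set --
# no central lookup service sits between client and storage.
print(place("myobject", pg_count=128, osds=list(range(8))))
```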
SLIDE 7

RADOS CLUSTER

(Diagram: an APPLICATION talks directly to the RADOS CLUSTER; M = monitor nodes.)

SLIDE 8

OBJECT STORAGE DAEMONS

(Diagram: each OSD runs atop a local filesystem — xfs, btrfs, or ext4 — on its own disk; M = monitor nodes.)

SLIDE 9

ARCHITECTURAL COMPONENTS

(Same architecture overview as SLIDE 4, repeated: RGW, LIBRADOS, RADOS, RBD, CEPHFS.)

SLIDE 10

RADOSGW MAKES RADOS WEBBY

RADOSGW:

  • REST-based object storage proxy
  • Uses RADOS to store objects
  • Stripes large RESTful objects across many RADOS objects

  • API supports buckets, accounts
  • Usage accounting for billing
  • Compatible with S3 and Swift applications
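
The striping bullet is easy to picture: one large RESTful object becomes many fixed-size RADOS objects, so reads and writes spread across OSDs. A minimal sketch, with the chunk naming invented for illustration (RGW's real manifest layout differs):

```python
def stripe(key, data, chunk_size=4 * 1024 * 1024):
    """Split one RESTful object into fixed-size chunks, one per RADOS
    object. The "<key>.<n>" names are illustrative, not RGW's layout."""
    return [(f"{key}.{i}", data[off:off + chunk_size])
            for i, off in enumerate(range(0, len(data), chunk_size))]

# A 9 MiB upload becomes three RADOS objects (4 MiB + 4 MiB + 1 MiB).
chunks = stripe("bucket/video.mp4", b"x" * (9 * 1024 * 1024))
print([(name, len(part)) for name, part in chunks])
```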
SLIDE 11

THE RADOS GATEWAY

(Diagram: APPLICATIONs speak REST to RADOSGW instances; each RADOSGW uses LIBRADOS over a socket to reach the RADOS CLUSTER; M = monitor nodes.)

SLIDE 12

MULTI-SITE OBJECT STORAGE

(Diagram: two sites — US-EAST and EU-WEST — each with a WEB APPLICATION, APP SERVER, CEPH OBJECT GATEWAY (RGW), and its own CEPH STORAGE CLUSTER.)

SLIDE 13

FEDERATED RGW

  • Zones and regions
      – Topologies similar to S3 and others
      – Global bucket and user/account namespace
  • Cross data center synchronization
      – Asynchronously replicate buckets between regions
  • Read affinity
      – Serve local data from local DC
      – Dynamic DNS to send clients to closest DC

SLIDE 14

ARCHITECTURAL COMPONENTS

(Same architecture overview as SLIDE 4, repeated: RGW, LIBRADOS, RADOS, RBD, CEPHFS.)

SLIDE 15

SEPARATE METADATA SERVER

(Diagram: a LINUX HOST's KERNEL MODULE sends file data directly to the RADOS CLUSTER while metadata goes through a separate metadata server; M = monitor nodes.)

SLIDE 16

SCALABLE METADATA SERVERS

METADATA SERVER

  • Manages metadata for a POSIX-compliant shared filesystem
      – Directory hierarchy
      – File metadata (owner, timestamps, mode, etc.)
  • Clients stripe file data in RADOS
  • MDS not in data path
  • MDS stores metadata in RADOS
      – Key/value objects
  • Dynamic cluster scales to 10s or 100s of MDS daemons
  • Only required for the shared filesystem
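
Because the MDS sits outside the data path, clients compute RADOS object names for file data themselves from the inode and byte offset. A sketch modelled on CephFS's "&lt;inode hex&gt;.&lt;block index&gt;" naming convention; the real scheme also folds in per-file striping parameters, so treat the exact format as an assumption:

```python
def data_object_name(inode, offset, object_size=4 * 1024 * 1024):
    """Name the RADOS object holding byte `offset` of file `inode`,
    assuming a default 4 MiB object size; format modelled on CephFS's
    hex "<inode>.<block>" layout, not guaranteed exact."""
    return f"{inode:x}.{offset // object_size:08x}"

# Every client derives the same object for the same byte range,
# so file I/O never has to consult the MDS.
print(data_object_name(0x10000000000, 10 * 1024 * 1024))
```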
SLIDE 17

SAMBA - TODAY

SLIDE 18

ARCHITECTURAL COMPONENTS

(Same architecture overview as SLIDE 4, now with SAMBA shown alongside the CLIENT atop CEPHFS.)

SLIDE 19

SAMBA INTEGRATION

  • vfs_ceph
      – In Samba since 2013.
      – Used as the outline for vfs_glusterfs.
      – Has been in testing in teuthology for a while now.
      – But not clustered :(.
  • ACL integration?
      – Patchset from Zheng Yan, still needs more work.
      – Work on RichACLs is ongoing.
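
For reference, wiring vfs_ceph into a share looks roughly like this; the share name, path, and cephx user are illustrative, and `kernel share modes = no` reflects the module talking to the cluster directly rather than through local kernel files:

```ini
[cephfs]
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
    # vfs_ceph bypasses the local kernel, so kernel-based
    # share modes cannot apply to these files
    kernel share modes = no
```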

SLIDE 20

CTDB INTEGRATION

  • fcntl locks
      – Does any filesystem get this right at the start? 0/2 so far.
      – Ceph's have been fixed; they work for CTDB, if you tweak the timeouts.
      – But these tweaks aren't production ready!
  • Both kernel and FUSE clients have been tested
      – The Ceph team recommends ceph-fuse for now.
      – That's what the demo uses...
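
The CTDB side of the demo boils down to pointing the recovery lock at a file on the shared CephFS mount (via ceph-fuse, per the recommendation above), so all nodes contend on the same fcntl lock. A minimal sketch; the paths are illustrative and, as noted, the timeout tweaks aren't production ready:

```shell
# /etc/sysconfig/ctdb (paths illustrative)
# The recovery lock lives on the shared CephFS mount, so every
# node contends on the same fcntl lock.
CTDB_RECOVERY_LOCK=/mnt/cephfs/.ctdb/reclock
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
```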

SLIDE 21

DEMO

SLIDE 22

FUTURE DIRECTIONS

  • CTDB “fcntl lock” dependency removal.
  • etcd
      – Battle tested.
      – Push other config info into etcd? (nodes, public_addresses)
      – I've already started on this. Expect more info at SDC!
  • Zookeeper: much the same as etcd.
      – Not working on it now.
  • S3 style object stores.
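
The shape of the etcd idea: instead of an fcntl lock on shared storage, CTDB's recovery lock can be delegated to an external mutex helper, and that helper can take the lock in etcd. A sketch of the wiring, with the "!" helper syntax and the helper path both assumptions here:

```shell
# CTDB config: "!" hands the recovery lock to an external mutex
# helper instead of an fcntl lock. The etcd helper path is hypothetical.
CTDB_RECOVERY_LOCK="!/usr/local/bin/ctdb_etcd_lock"
```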
SLIDE 23

FUTURE DIRECTIONS

  • RGW
      – Export object data as files.
      – Export files as object data? Not today in Ceph.
  • Integrate where?
      – S3
      – RADOS
      – RBD
  • With SMB Direct, who knows?

SLIDE 24

QUESTIONS?

SLIDE 25

THANK YOU!

Ira Cooper

SAMBA TEAM

ira@wakeful.net