SLIDE 1

Scaling Your Storage Using Ceph

Wido den Hollander

#CCCEU

SLIDE 2

Who am I?

  • Wido den Hollander (1986)

– CTO at PCextreme B.V.

  • Dutch Hosting provider

– Ceph trainer and consultant at 42on B.V.
– Part of the Ceph community since late 2010

  • Wrote PHP and Java bindings
  • Wrote the CloudStack integration

– Including libvirt storage pool support

SLIDE 3

CloudStack primary storage

SLIDE 4

CloudStack primary storage

  • A set of hypervisors with storage

– NFS, iSCSI or Fibre Channel
– Usually one NAS or SAN per cluster

  • Local network for low latency and high bandwidth

SLIDE 5

CloudStack primary storage

SLIDE 6

CloudStack primary storage

  • Scaling is a problem however:

– Number of disks
– Network connections/bandwidth
– CPU power
– Protocols

  • NFS and iSCSI do not scale

SLIDE 7

Scaling NFS or iSCSI

  • NFS and iSCSI expect the server to always be available

– Vendors implement all kinds of tricks

  • Virtual IPs
  • ARP spoofing

– This is a fundamental problem when it comes to large scale

SLIDE 8

Black boxes

  • Black boxes:

– EMC, EqualLogic, NetApp: they provide you a black box
– Vendor lock-in
– End-of-Life determined by the vendor

SLIDE 9

Ceph

SLIDE 10

What is Ceph?

  • Ceph is a distributed object store and file system designed to provide:

– excellent performance
– reliability
– scalability

SLIDE 11

Design principles

  • Data is replicated in the Ceph cluster

– User specified: 2x, 3x (recommended), 4x, etc.

  • Hardware failure is the rule

– Not the exception!

  • Software defined storage

– Fully hardware agnostic

  • Consistency takes precedence over availability

SLIDE 12

What is Ceph?

SLIDE 13

How does it work?

  • Clients are aware of cluster status

– The client calculates where objects are
– They connect directly to the nodes using TCP (see the sketch after this list)

  • Ceph nodes are intelligent

– They take care of replication and recovery

  • Block Devices are striped over 4MB objects

– These objects are replicated by the nodes
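To make the access model concrete, here is a minimal Python sketch using the python-rados bindings that ship with Ceph. The config path, pool name and object name are example values, not anything from this talk. The client only contacts the monitors to fetch the cluster map, computes placement itself and then reads and writes against the OSDs directly:

    import rados

    # Fetch the cluster map via the monitors listed in ceph.conf (example path)
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on an existing pool ('rbd' is assumed here)
    ioctx = cluster.open_ioctx('rbd')

    # The client maps object name -> placement group -> OSDs locally with CRUSH,
    # then sends the write straight to the primary OSD over TCP; the OSDs
    # replicate it to the other copies
    ioctx.write_full('hello-object', b'Stored and replicated by the OSDs')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()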

SLIDE 14

How does it perform?

  • Ceph performs great with parallel I/O

– Cloud workloads are parallel
– Do not expect 10k IOps for a single disk

  • Each node adds I/O, RAM and CPU

– Thus adds performance

  • Latency is mainly influenced by the network

– The lower the latency, the better the performance

SLIDE 15

How does it perform?

  • Network latency is key

– Difference between 10GbE and 1GbE is big

  • 8k packet round trip:

– 1GbE: 0.8ms
– 10GbE: 0.2ms (see the measurement sketch below)
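As an illustration of how such numbers can be measured, here is a small Python sketch, not the tool behind the figures above: a plain TCP echo server and a client that times an 8 KiB round trip. The host, port and iteration count are arbitrary example values.

    import socket, sys, time

    HOST, PORT, SIZE, ROUNDS = '192.0.2.10', 5000, 8192, 100  # example values

    def server():
        # Echo every received chunk straight back to the client
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(('', PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        while True:
            data = conn.recv(SIZE)
            if not data:
                break
            conn.sendall(data)

    def client():
        sock = socket.create_connection((HOST, PORT))
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        payload = b'x' * SIZE
        start = time.perf_counter()
        for _ in range(ROUNDS):
            sock.sendall(payload)
            received = 0
            while received < SIZE:              # wait for the full 8 KiB echo
                chunk = sock.recv(SIZE - received)
                if not chunk:
                    raise ConnectionError('echo server closed the connection')
                received += len(chunk)
        elapsed = time.perf_counter() - start
        print('average round trip: %.3f ms' % (elapsed / ROUNDS * 1000))

    # Run "server" on one host and the client on another
    server() if sys.argv[1:] == ['server'] else client()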

SLIDE 16

How does it perform?

SLIDE 17

Failure is the rule!

  • Ceph is designed for failure!
  • Take out any machine you want

– Kernel upgrades
– Firmware upgrades
– Replacement of hardware

  • 1000 day uptimes are no longer cool

– Upgrade and reboot those machines!

SLIDE 18

Failure is the rule!

SLIDE 19

Failure is the rule!

SLIDE 20

Failure is the rule!

SLIDE 21

Scaling Ceph

  • Ceph is designed to scale

– Start with 10TB and scale to 10PB

  • No downtime or manual data migration is required

– Never have to rsync or scp data manually

  • Migration is proportional to the change

– Expand by 10% and about 10% of the data migrates (see the sketch after this list)
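To illustrate why the movement stays proportional to the change, here is a small Python simulation. It uses rendezvous (highest-random-weight) hashing as a stand-in for CRUSH, not Ceph's actual placement algorithm; the object and node names are made up.

    import hashlib

    def weight(obj, node):
        # Deterministic pseudo-random score for an (object, node) pair
        digest = hashlib.md5(('%s-%s' % (obj, node)).encode()).digest()
        return int.from_bytes(digest[:8], 'big')

    def place(obj, nodes):
        # Rendezvous hashing: the node with the highest score stores the object
        return max(nodes, key=lambda node: weight(obj, node))

    objects = ['obj-%d' % i for i in range(20000)]
    old_nodes = ['node-%d' % i for i in range(10)]
    new_nodes = old_nodes + ['node-10']          # expand the cluster by ~10%

    moved = sum(1 for o in objects if place(o, old_nodes) != place(o, new_nodes))
    print('moved: %.1f%% of all objects' % (100.0 * moved / len(objects)))
    # Prints roughly 9%: about 1/11 of the objects map to the new node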

SLIDE 22

Scaling Ceph

  • During expansion data migrates automatically to the new nodes

  • New nodes add additional performance

  • Mix different types of hardware

– 2TB and 4TB drives for example

SLIDE 23

Designing a Ceph cluster

  • Use small(er) nodes

– 2U machines with 8 drives

  • More nodes means less impact when a node fails

– 'Failure' could be maintenance!
– With 10 nodes a single failure affects roughly 10% of the capacity; with 40 nodes only about 2.5%

  • Start with at least 10 nodes

SLIDE 24

Designing a Ceph cluster

SLIDE 25

Ceph as Primary Storage

SLIDE 26

Ceph as Primary Storage

  • Ceph block devices can be used as primary storage

  • KVM is currently the only supported hypervisor

– Ubuntu works best
– CentOS works with a patched libvirt

  • All operations are supported

– Templates, snapshots, resizing (see the sketch after this list)
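As an illustration of what those operations look like at the RBD level, here is a minimal Python sketch using the python-rbd bindings. The config path, pool name, image name and sizes are example values; in a CloudStack deployment the management server and libvirt drive these operations for you.

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # example config path
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                       # example pool name

    # Create a 10 GiB block device, e.g. for a new VM volume
    rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024**3)

    image = rbd.Image(ioctx, 'vm-disk-1')
    image.create_snap('before-upgrade')      # snapshot the volume
    image.resize(20 * 1024**3)               # grow it to 20 GiB
    print([snap['name'] for snap in image.list_snaps()])

    image.close()
    ioctx.close()
    cluster.shutdown()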

SLIDE 27

Future plans

  • Incremental backups to another Ceph cluster

– Is on Ceph's roadmap

  • Snapshots without copying to Secondary Storage

  • Xen support

SLIDE 28

Ceph at PCextreme

SLIDE 29

Ceph at PCextreme

  • We use Ceph as Primary Storage behind CloudStack

– KVM hypervisor on Ubuntu

  • We have two Service Offerings:

– Agile: Local Storage on SSD
– Stamina: Ceph RBD storage

  • Only available in Amsterdam

SLIDE 30

Ceph at PCextreme

  • 500TB of storage

– 39 hosts
– 3 racks

  • Replicas spread out over racks

– 258 disks

  • 96 SSDs
  • 162 HDDs

SLIDE 31

Ceph at PCextreme

  • 20,000 IOps on average
  • SuperMicro hardware

– Mix of Samsung, Intel, Seagate and Western Digital SSDs/HDDs

  • Running IPv6-only (see the example config after this list)

– There is NO IPv4 in the Ceph cluster
– Publicly routed IPv6 (with a firewall)
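A minimal ceph.conf illustration of an IPv6-only setup; the addresses use the IPv6 documentation prefix and this is a sketch, not PCextreme's actual configuration:

    [global]
    # Bind the Ceph daemons to IPv6 instead of IPv4
    ms bind ipv6 = true
    # Monitors reachable over publicly routed IPv6 (example addresses)
    mon host = [2001:db8::10]:6789, [2001:db8::11]:6789, [2001:db8::12]:6789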

SLIDE 32

Ceph at PCextreme

SLIDE 33

Ceph at PCextreme

SLIDE 34

Ceph at PCextreme

SLIDE 35

Ceph at PCextreme

HEALTH_WARN

SLIDE 36

Ceph at PCextreme

  • We are updating the whole Ceph cluster

– Using bcache
– Updating to Ubuntu 14.04
– Updating Ceph

SLIDE 37

Ceph at PCextreme


During office hours :-)

SLIDE 38

Ceph at PCextreme

  • We don't do Ceph maintenance at night anymore

  • That's how Ceph should be used

SLIDE 39

Thanks!

#CCCEU

Find me @widodh on Skype and Twitter, or email wido@42on.com