Distributed Storage Systems




SLIDE 1

Distributed Storage Systems

John Leach john@brightbox.com twitter @johnleach Brightbox Cloud http://brightbox.com

SLIDE 2

Our requirements

  • Brightbox has multiple zones (data centres)
  • Should tolerate a zone failure
  • Scale smoothly as data size grows
  • Should use exciting unproven technology
  • Libre software license
SLIDE 3

Brief history of file access

SLIDE 4

Scaling NFS: One disk

[Diagram: clients, filesystem server, disk]

SLIDE 5

Scaling NFS: RAID

[Diagram: clients, filesystem server, Redundant Array of Inexpensive Disks]

SLIDE 6

Scaling NFS: SAN

[Diagram: clients, filesystem server, Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

SLIDE 7

Scaling NFS: Shared disk fs

[Diagram: clients; six filesystem servers running GFS or OCFS or ...; Redundant Array of Inexpensive Disks in a NRSES (not redundant singular expensive SAN)]

SLIDE 8

Shared disk fs: Replication

[Diagram: clients; six filesystem servers running GFS or OCFS or ...; clustered LVM with mirroring; two Redundant Arrays of Inexpensive Disks, each in a not redundant singular expensive SAN]

SLIDE 9

Shared disk fs: Replication

[Diagram: clients; six filesystem servers running GFS or OCFS or ...; Redundant Array of Inexpensive Disks in a not redundant singular more expensive SAN]

SLIDE 10

Shared disk fs: Replication

[Diagram: clients; six filesystem servers running GFS or OCFS or ...; clustered LVM with mirroring; two Redundant Arrays of Inexpensive Disks, each in a not redundant singular more expensive SAN]

SLIDE 11

Old techniques

  • Hot or warm standby servers
  • Expensive SAN hardware
  • Shared block devices
  • Moving IP addresses
  • Server side replication
  • Scales mostly vertically
  • Manual partitioning to scale horizontally
SLIDE 12

New techniques

  • Shared nothing
  • Clever clients
  • Automatic partitioning
  • Automatic replication
  • Clever stuff: DHT, vector clocks, Paxos, MapReduce, Merkle trees, unicorn hooves

  • POSIX
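To make "automatic partitioning" and "DHT" from the list above a little more concrete, here is a minimal sketch of a consistent-hash ring, one common way to build a DHT. The class name `HashRing`, the node names, and the `vnodes` parameter are illustrative assumptions, not taken from any system in this talk.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (stable across runs)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def replicas(self, key, n=2):
        """Return the first n distinct nodes clockwise from the key's hash."""
        start = bisect.bisect_right(self._points, _hash(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.replicas("photo.jpg", n=2))
```

Any client holding the same node list computes the same answer, which is the "clever clients" idea: no central lookup server is needed. Adding a node only moves the keys that the new node takes over.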
SLIDE 13

New Problems

  • Locating your data
  • Ensuring consistency
  • Something has to give
SLIDE 14

Brewer's CAP theorem

  • Consistency
  • Availability
  • Partition tolerance
SLIDE 15

GlusterFS

[Diagram: clients, storage cluster]

SLIDE 16

Hadoop File System

[Diagram: clients, name node, storage cluster]

SLIDE 17

Hadoop File System

  • Hot failover patches in Feb
  • Batch processing, not interactive
  • High throughput, not low latency
  • Map Reduce
  • Namenode SPOF
  • Multi-data centre
  • Consistent
SLIDE 18

MongoDB

  • Document store, dynamic schema
  • Async replication
  • Primary server for writes
  • Automatic sharding
  • Map Reduce
  • GridFS for large files
  • Multi-datacentre, but not partition tolerant
  • Mostly consistent
SLIDE 19

MongoDB

SLIDE 20

OpenStack Swift

[Diagram: clients, proxies, storage cluster]

SLIDE 21

OpenStack Swift

SLIDE 22

Cassandra

  • P2P, DHT, Gossip, Hinted Handoff
  • Column orientated. Data ordered.
  • Design schema for types of queries
  • Very fast highly available writing
  • Per request consistency. Multi-data centre
  • Thrift API
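"Per request consistency" can be illustrated with the usual quorum-overlap rule: a read is guaranteed to see the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor. The sketch below uses Cassandra's level names ONE, QUORUM and ALL; the helper functions themselves are hypothetical and not part of any Cassandra client API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and write set must share at least one replica."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, 3))
```

With a replication factor of 3, QUORUM writes with QUORUM reads overlap, while ONE with ONE does not: the caller trades consistency for latency and availability on each request.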
SLIDE 23

Riak

  • Key value store.
  • DHT, Gossip, Vector Clocks
  • Map reduce
  • Luwak for large files
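The vector clocks mentioned above let a store tell whether one version of a value supersedes another or whether the two were written concurrently and need reconciling. A minimal sketch, assuming clocks are plain dictionaries from node name to counter; the function names are illustrative and are not Riak's API.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"
    if b_ge_a:
        return "before"
    return "concurrent"


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled value: the per-node maximum of both clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


base = increment({}, "node-a")
left = increment(base, "node-b")    # one client updates via node-b
right = increment(base, "node-c")   # another updates via node-c, unaware of left
print(compare(left, base), compare(left, right), merge(left, right))
```

Two versions that compare as concurrent are siblings: neither can be safely discarded, so the application (or a merge rule) has to resolve them.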
SLIDE 24

ZooKeeper

  • Paxos-like consensus protocol
  • Reads scale up with more servers
  • Writes slow down with more servers
  • Always consistent
  • In-memory
  • Strict ordering
  • Small data
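One way to see why writes slow down as servers are added: a write commits only once a majority of the ensemble has acknowledged it, so every extra pair of servers adds one more acknowledgement to wait for. A small sketch of that arithmetic, with hypothetical function names:

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority can still be formed."""
    return ensemble_size - majority(ensemble_size)


for size in (3, 4, 5, 7):
    print(size, majority(size), tolerated_failures(size))
```

Note that an even-sized ensemble tolerates no more failures than the odd size below it (4 servers tolerate one failure, the same as 3), which is why odd ensemble sizes are the usual choice.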
SLIDE 25

Ceph

  • Object store
  • Full POSIX file system on top
  • PAXOS for cluster state
  • CRUSH rather than DHT
  • Multi-datacenter.
  • Strongly consistent, not partition tolerant
  • RBD, S3-alike, plus POSIX
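"CRUSH rather than DHT" means clients compute where an object lives from the cluster map instead of looking it up. The sketch below is not CRUSH itself: it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in for the same idea of deterministic, computed placement, and it ignores the failure-domain hierarchy and weights that CRUSH takes into account. The function names and OSD names are assumptions for illustration.

```python
import hashlib


def _score(obj: str, osd: str) -> int:
    """Pseudo-random but repeatable score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj}:{osd}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, osds: list, replicas: int = 2) -> list:
    """Pick the devices with the highest scores for this object.

    Stand-in for computed placement; NOT the CRUSH algorithm.
    """
    return sorted(osds, key=lambda osd: _score(obj, osd), reverse=True)[:replicas]


cluster = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
print(place("volume-42/block-7", cluster))
```

Every client with the same device list arrives at the same placement, and removing a device that does not hold an object leaves that object's placement unchanged.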
SLIDE 26

Ceph

[Diagram: clients, monitor cluster, metadata cluster, storage cluster]

SLIDE 27

Distributed Storage Systems

John Leach john@brightbox.com twitter @johnleach Brightbox Cloud http://brightbox.com