A Scalable, Content-Addressable Network



SLIDE 1

A Scalable, Content-Addressable Network

Sylvia Ratnasamy (1,2), Paul Francis (3), Mark Handley (1), Richard Karp (1,2), Scott Shenker (1)

(1) ACIRI   (2) U.C. Berkeley   (3) Tahoe Networks

SLIDE 2

Outline

  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work
SLIDE 3

Internet-scale hash tables

  • Hash tables

– essential building block in software systems

  • Internet-scale distributed hash tables

– equally valuable to large-scale distributed systems?

SLIDE 4

Internet-scale hash tables

  • Hash tables

– essential building block in software systems

  • Internet-scale distributed hash tables

– equally valuable to large-scale distributed systems?

  • peer-to-peer systems

– Napster, Gnutella, Groove, FreeNet, MojoNation…

  • large-scale storage management systems

– Publius, OceanStore, PAST, Farsite, CFS ...

  • mirroring on the Web

SLIDE 5

Content-Addressable Network (CAN)

  • CAN: Internet-scale hash table
  • Interface

– insert(key,value)
– value = retrieve(key)

SLIDE 6

Content-Addressable Network (CAN)

  • CAN: Internet-scale hash table
  • Interface

– insert(key,value)
– value = retrieve(key)

  • Properties

– scalable
– operationally simple
– good performance (w/ improvement)

SLIDE 7

Content-Addressable Network (CAN)

  • CAN: Internet-scale hash table
  • Interface

– insert(key,value)
– value = retrieve(key)

  • Properties

– scalable
– operationally simple
– good performance

  • Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...

SLIDE 8

Problem Scope

Design a system that provides the interface

  • scalability
  • robustness
  • performance
  • security
  • Application-specific, higher-level primitives
  • keyword searching
  • mutable content
  • anonymity
SLIDE 9

Outline

  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work
SLIDE 10

CAN: basic idea

[Figure: a network of nodes, each holding a table of (K, V) pairs]

SLIDE 11

CAN: basic idea

insert(K1,V1)

[Figure: a network of nodes, each holding a table of (K, V) pairs]

SLIDE 12

CAN: basic idea

insert(K1,V1)

[Figure: a network of nodes, each holding a table of (K, V) pairs]

SLIDE 13

CAN: basic idea

[Figure: (K1,V1) stored at one node in a network of nodes, each holding a table of (K, V) pairs]

SLIDE 14

CAN: basic idea

retrieve(K1)

[Figure: a network of nodes, each holding a table of (K, V) pairs]

SLIDE 15

CAN: solution

  • virtual Cartesian coordinate space
  • entire space is partitioned amongst all the nodes

– every node “owns” a zone in the overall space

  • abstraction

– can store data at “points” in the space
– can route from one “point” to another

  • point = node that owns the enclosing zone
SLIDE 16

CAN: simple example

[Figure: coordinate space with node 1]

SLIDE 17

CAN: simple example

[Figure: coordinate space with nodes 1, 2]

SLIDE 18

CAN: simple example

[Figure: coordinate space with nodes 1, 2, 3]

SLIDE 19

CAN: simple example

[Figure: coordinate space with nodes 1, 2, 3, 4]

SLIDE 20

CAN: simple example

SLIDE 21

CAN: simple example

[Figure: node I in the coordinate space]

SLIDE 22

CAN: simple example

node I::insert(K,V)

[Figure: node I in the coordinate space]

SLIDE 23

CAN: simple example

node I::insert(K,V)

(1) a = hx(K)

[Figure: node I in the coordinate space, with x = a marked]

SLIDE 24

CAN: simple example

node I::insert(K,V)

(1) a = hx(K), b = hy(K)

[Figure: node I in the coordinate space, with x = a and y = b marked]

SLIDE 25

CAN: simple example

node I::insert(K,V)

(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)

[Figure: node I in the coordinate space]

SLIDE 26

CAN: simple example

node I::insert(K,V)

(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)
(3) (a,b) stores (K,V)

[Figure: node I and the stored pair (K,V) in the coordinate space]

SLIDE 27

CAN: simple example

node J::retrieve(K)

(1) a = hx(K), b = hy(K)
(2) route “retrieve(K)” to (a,b)

[Figure: node J and the stored pair (K,V) in the coordinate space]
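The insert and retrieve steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the salted SHA-256 stand-ins for hx and hy, the `Zone` class, and the linear-scan `owner` lookup (which replaces real routing) are all assumptions made for the example.

```python
import hashlib


def hash_to_unit(key: str, salt: str) -> float:
    """Map a key to [0, 1). The salted SHA-256 is an assumed stand-in for hx / hy."""
    digest = hashlib.sha256((salt + ":" + key).encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always strictly below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def key_to_point(key: str) -> tuple[float, float]:
    """Step (1): a = hx(K), b = hy(K)."""
    return hash_to_unit(key, "x"), hash_to_unit(key, "y")


class Zone:
    """Hypothetical zone: a rectangle of the unit square plus the entries it stores."""

    def __init__(self, x0: float, x1: float, y0: float, y1: float):
        self.x0, self.x1, self.y0, self.y1 = x0, x1, y0, y1
        self.store = {}

    def contains(self, point: tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def owner(zones: list, point: tuple[float, float]) -> Zone:
    """Step (2), simplified: a linear scan stands in for routing to (a,b)."""
    for zone in zones:
        if zone.contains(point):
            return zone
    raise LookupError("no zone covers the point")


def insert(zones: list, key: str, value) -> None:
    """Step (3): the node owning (a,b) stores (K,V)."""
    owner(zones, key_to_point(key)).store[key] = value


def retrieve(zones: list, key: str):
    """Recompute (a,b) from the key alone and ask the owning node."""
    return owner(zones, key_to_point(key)).store.get(key)
```

Because the point is derived only from the key, any node can locate the entry without knowing which machine holds it.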

SLIDE 28

Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)

CAN

SLIDE 29

CAN: routing table

SLIDE 30

CAN: routing

[Figure: route from the node at (x,y) to the point (a,b)]

SLIDE 31

CAN: routing

A node only maintains state for its immediate neighboring nodes
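A rough sketch of neighbor-only routing, under simplifying assumptions that are not taken from the slides: zones form a uniform k x k grid that wraps around at the edges, each node knows only its four adjacent cells, and a message is always forwarded to the neighbor closest to the destination. All function names are hypothetical.

```python
def torus_distance(a: tuple, b: tuple, k: int) -> int:
    """Sum of per-axis wraparound distances between grid cells a and b."""
    return sum(min(abs(p - q), k - abs(p - q)) for p, q in zip(a, b))


def neighbors(cell: tuple, k: int) -> list:
    """The four adjacent cells: the only state a node keeps in this sketch."""
    x, y = cell
    return [((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)]


def route(src: tuple, dst: tuple, k: int) -> list:
    """Greedy forwarding: each hop moves to the neighbor nearest the destination."""
    path = [src]
    current = src
    while current != dst:
        current = min(neighbors(current, k), key=lambda c: torus_distance(c, dst, k))
        path.append(current)
    return path
```

For example, `route((0, 0), (3, 2), 8)` visits six cells (five hops), and no node along the way needs a global view of the space.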

SLIDE 32

CAN: node insertion

1) Discover some node “I” already in CAN

[Figure: new node contacting a bootstrap node]

SLIDE 33

CAN: node insertion

1) Discover some node “I” already in CAN

[Figure: node I and the new node]

SLIDE 34

CAN: node insertion

2) Pick random point in space

[Figure: node I, the new node, and the point (p,q)]

SLIDE 35

CAN: node insertion

3) I routes to (p,q), discovers node J

[Figure: node I, node J, the new node, and the point (p,q)]

SLIDE 36

CAN: node insertion

4) Split J’s zone in half; the new node owns one half

[Figure: node J and the new node sharing J’s former zone]

SLIDE 37

CAN: node insertion

Inserting a new node affects only a single other node and its immediate neighbors
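Step 4 can be sketched as follows. The slides only say that J's zone is split in half; splitting along the longest side and handing over the entries whose points fall in the new half are assumptions made for this example, and `Zone` and `split` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    lo: tuple                                   # lower corner of the zone
    hi: tuple                                   # upper corner (exclusive)
    store: dict = field(default_factory=dict)   # key -> (point, value)

    def contains(self, point: tuple) -> bool:
        return all(l <= c < h for l, c, h in zip(self.lo, point, self.hi))


def split(zone: Zone) -> Zone:
    """Halve `zone` in place and return the new node's half.

    Assumption: split along the longest side; the new node takes the upper half.
    """
    sides = [h - l for l, h in zip(zone.lo, zone.hi)]
    axis = sides.index(max(sides))
    mid = (zone.lo[axis] + zone.hi[axis]) / 2
    new_lo = tuple(mid if i == axis else l for i, l in enumerate(zone.lo))
    new_zone = Zone(new_lo, zone.hi)
    zone.hi = tuple(mid if i == axis else h for i, h in enumerate(zone.hi))
    # Hand over the entries whose points now belong to the new node.
    for key in list(zone.store):
        point, _value = zone.store[key]
        if new_zone.contains(point):
            new_zone.store[key] = zone.store.pop(key)
    return new_zone
```

Only J and the new node exchange data here, which matches the point that a join is a local operation.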

SLIDE 38

CAN: node failures

  • Need to repair the space

– recover database (weak point)

  • soft-state updates
  • use replication, rebuild database from replicas

– repair routing

  • takeover algorithm
SLIDE 39

CAN: takeover algorithm

  • Simple failures

– know your neighbor’s neighbors
– when a node fails, one of its neighbors takes over its zone

  • More complex failure modes

– simultaneous failure of multiple adjacent nodes
– scoped flooding to discover neighbors
– hopefully, a rare event
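A minimal sketch of the simple-failure case. The slide does not say which neighbor takes over; choosing the neighbor with the smallest zone volume (ties broken by node name) is an assumption made here, and `volume` and `choose_takeover` are hypothetical helpers.

```python
def volume(zone: tuple) -> float:
    """Volume of a zone given as (lower_corner, upper_corner)."""
    lo, hi = zone
    result = 1.0
    for low, high in zip(lo, hi):
        result *= high - low
    return result


def choose_takeover(failed: str, neighbors: dict, zones: dict) -> str:
    """Pick which neighbor of `failed` takes over its zone.

    Assumption: smallest current zone volume wins, ties broken by name.
    """
    return min(neighbors[failed], key=lambda node: (volume(zones[node]), node))
```

The sketch covers only the choice of successor; merging zones and rebuilding the failed node's data from replicas are left out.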

SLIDE 40

CAN: node failures

Only the failed node’s immediate neighbors are required for recovery

SLIDE 41

Design recap

  • Basic CAN

– completely distributed
– self-organizing
– nodes only maintain state for their immediate neighbors

  • Additional design features

– multiple, independent spaces (realities)
– background load balancing algorithm
– simple heuristics to improve performance

SLIDE 42

Outline

  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work
SLIDE 43

Evaluation

  • Scalability
  • Low-latency
  • Load balancing
  • Robustness
SLIDE 44

CAN: scalability

  • For a uniformly partitioned space with n nodes and d dimensions

– per node, number of neighbors is 2d
– average routing path is (d · n^(1/d)) / 4 hops
– simulations show that the above results hold in practice

  • Can scale the network without increasing per-node state

  • Chord/Plaxton/Tapestry/Buzz

– log(n) nbrs with log(n) hops
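To get a feel for these formulas, the short sketch below evaluates them for a given n and d. The function names are made up for the example, and taking log(n) as base 2 is an assumption, since the slide does not state the base.

```python
import math


def can_neighbors(d: int) -> int:
    """Per-node neighbor count in a uniformly partitioned d-dimensional CAN."""
    return 2 * d


def can_avg_hops(n: int, d: int) -> float:
    """Average routing path length: (d * n^(1/d)) / 4 hops."""
    return (d / 4) * n ** (1 / d)


def log_hops(n: int) -> float:
    """Hop count for the log(n) systems, assuming base 2."""
    return math.log2(n)
```

With n = 65,536, two dimensions give 4 neighbors and about 128 hops, while four dimensions give 8 neighbors and about 16 hops, the same as log2(n). Per-node state depends only on d, not on n.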

SLIDE 45

CAN: low-latency

  • Problem

– latency stretch = (CAN routing delay) / (IP routing delay)
– application-level routing may lead to high stretch

  • Solution

– increase dimensions, realities (reduce the path length)
– heuristics (reduce the per-CAN-hop latency)

  • RTT-weighted routing
  • multiple nodes per zone (peer nodes)
  • deterministically replicate entries
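One way to read "RTT-weighted routing" is sketched below: among the neighbors that move a message closer to its destination, forward to the one with the best progress per millisecond of round-trip time. The slide names the heuristic but not its exact rule, so this ratio, the Euclidean distance, and the function names are assumptions.

```python
import math


def latency_stretch(can_delay: float, ip_delay: float) -> float:
    """Latency stretch = (CAN routing delay) / (IP routing delay)."""
    return can_delay / ip_delay


def rtt_weighted_next_hop(current: tuple, dest: tuple, neighbors: list):
    """Pick the next hop from `neighbors`, a list of (point, rtt_ms) pairs.

    Assumed rule: maximize (distance gained toward dest) / RTT, considering
    only neighbors that actually make progress. Returns None if none do.
    """
    here = math.dist(current, dest)
    best, best_ratio = None, 0.0
    for point, rtt_ms in neighbors:
        progress = here - math.dist(point, dest)
        if progress > 0 and progress / rtt_ms > best_ratio:
            best, best_ratio = point, progress / rtt_ms
    return best
```

A nearby neighbor that makes modest progress can beat a distant one that makes more, which reduces the latency of each CAN hop rather than the number of hops.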
SLIDE 46

CAN: low-latency

[Plot: latency stretch vs. number of nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics; # dimensions = 2]

SLIDE 47

CAN: low-latency

[Plot: latency stretch vs. number of nodes (16K, 32K, 65K, 131K), w/o heuristics and w/ heuristics; # dimensions = 10]

SLIDE 48

CAN: load balancing

  • Two pieces

– Dealing with hot-spots

  • popular (key,value) pairs
  • nodes cache recently requested entries
  • overloaded node replicates popular entries at neighbors

– Uniform coordinate space partitioning

  • uniformly spread (key,value) entries
  • uniformly spread out routing load
SLIDE 49

Uniform Partitioning

  • Added check

– at join time, pick a zone
– check neighboring zones
– pick the largest zone and split that one
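The added check can be sketched as a single comparison. The data layout (a neighbor map and a per-zone volume map) and the tie-breaking in favor of the originally picked zone are assumptions for this example.

```python
def zone_to_split(picked: str, neighbors: dict, volume: dict) -> str:
    """Return the zone a joining node should split.

    Compares the zone picked at join time with its neighbors and returns
    the largest; on a tie the originally picked zone wins (an assumption).
    """
    candidates = [picked] + list(neighbors[picked])
    return max(candidates, key=lambda zone: volume[zone])
```

Steering each join toward the largest nearby zone keeps zone volumes closer to uniform than splitting whichever zone the random point happens to land in.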

SLIDE 50

Uniform Partitioning

[Plot: percentage of nodes vs. zone volume (V/16, V/8, V/4, V/2, V, 2V, 4V, 8V), w/o check and w/ check; V = total volume / n; 65,000 nodes, 3 dimensions]

SLIDE 51

CAN: Robustness

  • Completely distributed

– no single point of failure (not applicable to pieces of database when node failure happens)

  • Not exploring database recovery (in case there are multiple copies of database)

  • Resilience of routing

– can route around trouble

SLIDE 52

Outline

  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work
SLIDE 53

Strengths

  • More resilient than flooding broadcast networks
  • Efficient at locating information
  • Fault-tolerant routing
  • Node & Data High Availability (w/ improvement)
  • Manageable routing table size & network traffic

SLIDE 54

Weaknesses

  • Impossible to perform a fuzzy search
  • Susceptible to malicious activity
  • Maintain coherence of all the indexed data (Network overhead, Efficient distribution)
  • Still relatively higher routing latency
  • Poor performance w/o improvement
SLIDE 55

Suggestions

  • Catalog and Meta indexes to perform search function
  • Extension to handle mutable content efficiently for web-hosting
  • Security mechanism to defend against attacks

SLIDE 56

Outline

  • Introduction
  • Design
  • Evaluation
  • Strengths & Weaknesses
  • Ongoing Work
SLIDE 57

Ongoing Work

  • Topologically-sensitive CAN construction

– distributed binning

SLIDE 58

Distributed Binning

  • Goal

– bin nodes such that co-located nodes land in same bin

  • Idea

– well-known set of landmark machines
– each CAN node measures its RTT to each landmark
– orders the landmarks in order of increasing RTT

  • CAN construction

– place nodes from the same bin close together on the CAN
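The binning idea can be sketched as below, treating a node's bin as its ordering of the landmarks by increasing RTT. The landmark names, the RTT values in the usage note, and the tie-breaking by name are illustrative assumptions, not data from the talk.

```python
from collections import defaultdict


def landmark_ordering(rtts: dict) -> tuple:
    """Order landmark names by increasing measured RTT (ties broken by name)."""
    return tuple(sorted(rtts, key=lambda name: (rtts[name], name)))


def bin_nodes(measurements: dict) -> dict:
    """Group nodes whose landmark orderings match into the same bin."""
    bins = defaultdict(list)
    for node, rtts in measurements.items():
        bins[landmark_ordering(rtts)].append(node)
    return dict(bins)
```

For instance, two nodes that measure landmark RTTs of (10, 40, 90) ms and (12, 35, 80) ms share the ordering (L1, L2, L3) and land in the same bin, while a node measuring (95, 30, 8) ms gets (L3, L2, L1). Nodes in one bin are then candidates for adjacent zones on the CAN.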

SLIDE 59

Distributed Binning

– 4 landmarks (placed 5 hops away from each other)
– naïve partitioning

[Plots: latency stretch vs. number of nodes (256, 1K, 4K), w/o binning and w/ binning; one panel for # dimensions = 2, one for # dimensions = 4]

SLIDE 60

Ongoing Work (cont’d)

  • Topologically-sensitive CAN construction

– distributed binning

  • CAN Security (Petros Maniatis - Stanford)

– spectrum of attacks
– appropriate counter-measures

SLIDE 61

Ongoing Work (cont’d)

  • CAN Usage

– Application-level Multicast (NGC 2001)
– Grass-Roots Content Distribution
– Distributed Databases using CANs (J. Hellerstein, S. Ratnasamy, S. Shenker, I. Stoica, S. Zhuang)

SLIDE 62

Summary

  • CAN

– an Internet-scale hash table
– potential building block in Internet applications

  • Scalability

– O(d) per-node state

  • Low-latency routing

– simple heuristics help a lot

  • Robust

– decentralized, can route around trouble