SLIDE 1 A Scalable, Content-Addressable Network
Sylvia Ratnasamy (1,2), Paul Francis (3), Mark Handley (1), Richard Karp (1,2), Scott Shenker (1)
(1) ACIRI, (2) U.C. Berkeley, (3) Tahoe Networks
SLIDE 2 Outline
- Introduction
- Design
- Evaluation
- Strengths & Weaknesses
- Ongoing Work
SLIDE 3 Internet-scale hash tables
- Hash tables
– essential building block in software systems
- Internet-scale distributed hash tables
– equally valuable to large-scale distributed systems?
SLIDE 4 Internet-scale hash tables
- Hash tables
– essential building block in software systems
- Internet-scale distributed hash tables
– equally valuable to large-scale distributed systems?
- peer-to-peer systems
– Napster, Gnutella, Groove, FreeNet, MojoNation…
- large-scale storage management systems
– Publius, OceanStore, PAST, Farsite, CFS ...
SLIDE 5 Content-Addressable Network (CAN)
- CAN: Internet-scale hash table
- Interface
– insert(key,value)
– value = retrieve(key)
SLIDE 6 Content-Addressable Network (CAN)
- CAN: Internet-scale hash table
- Interface
– insert(key,value)
– value = retrieve(key)
- Properties
– scalable
– operationally simple
– good performance (w/ improvement)
SLIDE 7 Content-Addressable Network (CAN)
- CAN: Internet-scale hash table
- Interface
– insert(key,value)
– value = retrieve(key)
- Properties
– scalable
– operationally simple
– good performance
- Related systems: Chord/Pastry/Tapestry/Buzz/Plaxton ...
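The two-call interface above can be pictured with a minimal Python sketch. The class name `LocalHashTable` is hypothetical, and the class is a single-process stand-in rather than a CAN. It only illustrates the contract that `retrieve(key)` returns whatever `insert(key, value)` stored.

```python
from typing import Dict, Optional


class LocalHashTable:
    """Single-process stand-in for the CAN interface (hypothetical name)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def insert(self, key: str, value: bytes) -> None:
        # In a CAN this call would be routed to the node that owns the key.
        self._store[key] = value

    def retrieve(self, key: str) -> Optional[bytes]:
        # Returns None when the key was never inserted.
        return self._store.get(key)


table = LocalHashTable()
table.insert("song.mp3", b"payload")
```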
SLIDE 8 Problem Scope
- Design a system that provides the interface
– scalability
– robustness
– performance
– security
- Application-specific, higher-level primitives
– keyword searching
– mutable content
– anonymity
SLIDE 9 Outline
- Introduction
- Design
- Evaluation
- Strengths & Weaknesses
- Ongoing Work
SLIDE 10 CAN: basic idea
[Figure: a set of nodes, each holding (K,V) pairs]
SLIDE 11 CAN: basic idea
[Figure: nodes holding (K,V) pairs; label: insert(K1,V1)]
SLIDE 12 CAN: basic idea
[Figure: nodes holding (K,V) pairs; label: insert(K1,V1)]
SLIDE 13 CAN: basic idea
[Figure: nodes holding (K,V) pairs; label: (K1,V1)]
SLIDE 14 CAN: basic idea
[Figure: nodes holding (K,V) pairs; label: retrieve(K1)]
SLIDE 15 CAN: solution
- virtual Cartesian coordinate space
- entire space is partitioned amongst all the nodes
– every node “owns” a zone in the overall space
– can store data at “points” in the space
– can route from one “point” to another
- point = node that owns the enclosing zone
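The "point = node that owns the enclosing zone" rule can be sketched as follows. This is an illustration under stated assumptions: the `Zone` class, the half-open box convention, the unit-square layout and the global `zones` dictionary are all hypothetical (a real node would not hold a global view).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Zone:
    """Axis-aligned box [lo, hi) in a d-dimensional coordinate space."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def owner_of(p: Point, zones: Dict[str, Zone]) -> Optional[str]:
    """Return the node whose zone encloses point p (global view, for illustration)."""
    for node, zone in zones.items():
        if zone.contains(p):
            return node
    return None


# Two nodes splitting the unit square down the middle.
zones = {
    "node1": Zone((0.0, 0.0), (0.5, 1.0)),
    "node2": Zone((0.5, 0.0), (1.0, 1.0)),
}
```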
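SLIDE 16 CAN: simple example
[Figure: coordinate space owned by node 1]
SLIDE 17 CAN: simple example
[Figure: coordinate space partitioned between nodes 1 and 2]
SLIDE 18 CAN: simple example
[Figure: coordinate space partitioned among nodes 1, 2 and 3]
SLIDE 19 CAN: simple example
[Figure: coordinate space partitioned among nodes 1, 2, 3 and 4]
SLIDE 20 CAN: simple example
[Figure]
SLIDE 21 CAN: simple example
[Figure: node I]
SLIDE 22 CAN: simple example
node I::insert(K,V)
[Figure: node I]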
SLIDE 23 CAN: simple example
node I::insert(K,V)
(1) a = h_x(K)
[Figure: node I; x = a marked in the space]
SLIDE 24 CAN: simple example
node I::insert(K,V)
(1) a = h_x(K), b = h_y(K)
[Figure: node I; x = a and y = b marked in the space]
SLIDE 25 CAN: simple example
node I::insert(K,V)
(1) a = h_x(K), b = h_y(K)
(2) route(K,V) -> (a,b)
[Figure: node I]
SLIDE 26 CAN: simple example
node I::insert(K,V)
(1) a = h_x(K), b = h_y(K)
(2) route(K,V) -> (a,b)
(3) (a,b) stores (K,V)
[Figure: node I; (K,V) at point (a,b)]
SLIDE 27 CAN: simple example
node J::retrieve(K)
(1) a = h_x(K), b = h_y(K)
(2) route “retrieve(K)” to (a,b)
[Figure: node J; (K,V) at point (a,b)]
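The insert and retrieve steps above can be sketched as follows. Everything concrete here is an assumption made for illustration: the slides do not name the hash functions (salted SHA-256 stands in for h_x and h_y), the four-quadrant `owner` layout is hypothetical, and routing is replaced by a direct lookup.

```python
import hashlib
from typing import Dict, Tuple

Point = Tuple[float, float]


def h(key: str, axis: str) -> float:
    """Hash a key to a coordinate in [0, 1); `axis` salts the hash per dimension."""
    digest = hashlib.sha256(f"{axis}:{key}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def key_to_point(key: str) -> Point:
    return (h(key, "x"), h(key, "y"))  # (a, b) = (h_x(K), h_y(K))


def owner(p: Point) -> str:
    # Hypothetical 4-node layout: each node owns one quadrant of the unit square.
    col = 1 if p[0] >= 0.5 else 0
    row = 1 if p[1] >= 0.5 else 0
    return f"node{2 * row + col + 1}"


stores: Dict[str, Dict[str, str]] = {f"node{i}": {} for i in range(1, 5)}


def insert(key: str, value: str) -> None:
    # Steps (1)-(3): hash, find the owner of (a, b), store there.
    stores[owner(key_to_point(key))][key] = value


def retrieve(key: str) -> str:
    # Any node computes the same (a, b), so it reaches the same owner.
    return stores[owner(key_to_point(key))][key]


insert("K", "V")
```

Because both calls derive the point from the key alone, node J can find what node I stored without knowing where I is.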
SLIDE 28 CAN
Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address)
SLIDE 29 CAN: routing table
[Figure]
SLIDE 30 CAN: routing
[Figure: routing path between points (a,b) and (x,y)]
SLIDE 31 CAN: routing
A node only maintains state for its immediate neighboring nodes
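Routing with neighbor-only state can be sketched as greedy forwarding. The setup is assumed for illustration: a hypothetical 4x4 grid of equal zones over the unit square, no wraparound at the edges, and "closest" measured from each neighbor's zone center. The slides do not specify these details.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]
GRID = 4  # hypothetical 4x4 layout of equal zones


def neighbors(c: Cell) -> List[Cell]:
    """Immediate neighbors only: the whole routing table of the node at cell c."""
    x, y = c
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(i, j) for i, j in cand if 0 <= i < GRID and 0 <= j < GRID]


def center(c: Cell) -> Point:
    return ((c[0] + 0.5) / GRID, (c[1] + 0.5) / GRID)


def owns(c: Cell, p: Point) -> bool:
    return c == (int(p[0] * GRID), int(p[1] * GRID))


def dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def route(start: Cell, dest: Point) -> List[Cell]:
    """Greedy forwarding: hand the message to the neighbor closest to dest."""
    path = [start]
    while not owns(path[-1], dest):
        here = path[-1]
        path.append(min(neighbors(here), key=lambda n: dist2(center(n), dest)))
    return path


path = route((0, 0), (0.9, 0.6))
```

Each hop consults only the current node's neighbor list, so per-node state does not grow with the number of nodes.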
SLIDE 32 CAN: node insertion
1) discover some node “I” already in CAN
[Figure: bootstrap node and new node]
SLIDE 33 CAN: node insertion
1) discover some node “I” already in CAN
[Figure: node I and new node]
SLIDE 34 CAN: node insertion
2) pick random point (p,q) in space
[Figure: node I, point (p,q) and new node]
SLIDE 35 CAN: node insertion
3) I routes to (p,q), discovers node J
[Figure: nodes I and J, point (p,q) and new node]
SLIDE 36 CAN: node insertion
4) split J’s zone in half … new owns one half
[Figure: J and new node share J’s former zone]
SLIDE 37 CAN: node insertion
Inserting a new node affects only a single other node and its immediate neighbors
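Step 4 of the join can be sketched as a zone split. Two choices here are assumptions not stated on the slides: the zone is halved along its longest side, and the new node takes the half that contains its random point. The `Zone` class and `join` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Zone:
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def split(self) -> Tuple["Zone", "Zone"]:
        """Halve the zone along its longest side (axis choice is an assumption)."""
        sides = [h - l for l, h in zip(self.lo, self.hi)]
        axis = sides.index(max(sides))
        mid = (self.lo[axis] + self.hi[axis]) / 2
        lower_hi = self.hi[:axis] + (mid,) + self.hi[axis + 1:]
        upper_lo = self.lo[:axis] + (mid,) + self.lo[axis + 1:]
        return Zone(self.lo, lower_hi), Zone(upper_lo, self.hi)


def join(j_zone: Zone, p: Point) -> Tuple[Zone, Zone]:
    """Return (J's remaining zone, new node's zone) for random point p."""
    a, b = j_zone.split()
    # Assumption: the new node takes the half containing p; J keeps the other.
    return (b, a) if a.contains(p) else (a, b)


j_after, new_zone = join(Zone((0.0, 0.0), (1.0, 1.0)), (0.7, 0.2))
```

Only J's zone changes, which is why the join touches J and its immediate neighbors and no one else.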
SLIDE 38 CAN: node failures
- Need to repair the space
– recover database (weak point)
  - soft-state updates
  - use replication, rebuild database from replicas
– repair routing
SLIDE 39 CAN: takeover algorithm
– know your neighbor’s neighbors
– when a node fails, one of its neighbors takes over its zone
- More complex failure modes
– simultaneous failure of multiple adjacent nodes
– scoped flooding to discover neighbors
– hopefully, a rare event
SLIDE 40 CAN: node failures
Only the failed node’s immediate neighbors are required for recovery
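The takeover step can be sketched as below. The slides say only that one neighbor takes over the failed node's zone. Choosing the neighbor with the smallest zone volume, and breaking ties by name, is an assumption made for this sketch. The node names and volumes are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical state: zone volume per node and each node's neighbor set.
volume: Dict[str, float] = {"A": 0.25, "B": 0.125, "C": 0.5, "D": 0.125}
nbrs: Dict[str, Set[str]] = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
owned: Dict[str, List[str]] = {n: [n] for n in volume}  # zones each node serves


def take_over(failed: str) -> str:
    """Pick one neighbor of `failed` to serve its zone and patch neighbor sets."""
    live = sorted(nbrs[failed])
    # Assumption: smallest zone volume wins; ties break by node name.
    heir = min(live, key=lambda n: (volume[n], n))
    owned[heir] += owned.pop(failed)
    volume[heir] += volume.pop(failed)
    # Only the failed node's immediate neighbors update their state.
    for n in live:
        nbrs[n].discard(failed)
        if n != heir:
            nbrs[n].add(heir)
            nbrs[heir].add(n)
    del nbrs[failed]
    return heir


heir = take_over("A")
```

Knowing a neighbor's neighbors is what lets the heir learn the failed node's adjacency without contacting anyone outside that neighborhood.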
SLIDE 41 Design recap
– completely distributed
– self-organizing
– nodes only maintain state for their immediate neighbors
- Additional design features
– multiple, independent spaces (realities)
– background load balancing algorithm
– simple heuristics to improve performance
SLIDE 42 Outline
- Introduction
- Design
- Evaluation
- Strengths & Weaknesses
- Ongoing Work
SLIDE 43 Evaluation
- Scalability
- Low-latency
- Load balancing
- Robustness
SLIDE 44 CAN: scalability
- For a uniformly partitioned space with n nodes and d dimensions
– per node, number of neighbors is 2d
– average routing path is (d n^(1/d))/4 hops
– simulations show that the above results hold in practice
- Can scale the network without increasing per-node state
- Chord/Plaxton/Tapestry/Buzz
– log(n) nbrs with log(n) hops
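To get a feel for the trade-off, the two formulas above can be evaluated directly. This is a small sketch: the function names are hypothetical, and base-2 logarithms with constants ignored are an assumption for the comparison point, since the slide writes only log(n).

```python
import math


def can_neighbors(d: int) -> int:
    """Per-node neighbor count in a uniformly partitioned d-dimensional CAN."""
    return 2 * d


def can_avg_hops(n: int, d: int) -> float:
    """Average routing path length (d/4) * n**(1/d), as given on the slide."""
    return (d / 4) * n ** (1 / d)


def log_hops(n: int) -> float:
    """Comparison point for log(n)-hop designs (base 2 assumed, constants ignored)."""
    return math.log2(n)


n = 2**16
for d in (2, 4, 10):
    print(f"d={d}: neighbors={can_neighbors(d)}, avg hops={can_avg_hops(n, d):.1f}")
print(f"log2(n) = {log_hops(n):.1f}")
```

Raising d shortens paths at the cost of more neighbors per node, while the neighbor count stays independent of n.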
SLIDE 45 CAN: low-latency
– latency stretch = (CAN routing delay) / (IP routing delay)
– application-level routing may lead to high stretch
– increase dimensions, realities (reduce the path length)
– heuristics (reduce the per-CAN-hop latency)
  - RTT-weighted routing
  - multiple nodes per zone (peer nodes)
  - deterministically replicate entries
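The stretch metric and the RTT-weighted routing heuristic can be sketched as follows. The scoring rule (progress toward the destination divided by RTT) is an assumption made for illustration, as are the function names and the sample numbers. The slides name the heuristic without defining it.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def latency_stretch(can_delay_ms: float, ip_delay_ms: float) -> float:
    """Stretch = delay along the CAN path / delay of the direct IP path."""
    return can_delay_ms / ip_delay_ms


def dist(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def rtt_weighted_next_hop(here: Point, dest: Point,
                          nbrs: Dict[str, Tuple[Point, float]]) -> str:
    """Choose the neighbor with the best progress toward dest per ms of RTT.

    nbrs maps a neighbor name to (its zone center, measured RTT in ms).
    """
    def score(name: str) -> float:
        center, rtt = nbrs[name]
        return (dist(here, dest) - dist(center, dest)) / rtt

    return max(nbrs, key=score)


nbrs = {
    "east": ((0.6, 0.5), 80.0),   # most progress, but a slow link
    "north": ((0.5, 0.6), 10.0),  # slightly less progress, much faster link
}
choice = rtt_weighted_next_hop((0.5, 0.5), (0.9, 0.8), nbrs)
```

The path length in hops is unchanged by this rule. What it targets is the latency of each individual hop.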
SLIDE 46 CAN: low-latency
[Figure: latency stretch vs. # nodes (16K, 32K, 65K, 131K), # dimensions = 2; series: w/o heuristics, w/ heuristics; y-axis 20 to 180]
SLIDE 47 CAN: low-latency
[Figure: latency stretch vs. # nodes (16K, 32K, 65K, 131K), # dimensions = 10; series: w/o heuristics, w/ heuristics; y-axis 2 to 10]
SLIDE 48 CAN: load balancing
– Dealing with hot-spots
  - popular (key,value) pairs
  - nodes cache recently requested entries
  - overloaded node replicates popular entries at neighbors
– Uniform coordinate space partitioning
  - uniformly spread (key,value) entries
  - uniformly spread out routing load
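The hot-spot idea that nodes cache recently requested entries can be sketched with a small cache. Least-recently-used eviction and the `RecentCache` name are assumptions for this sketch, since the slides do not say how cached entries are expired.

```python
from collections import OrderedDict
from typing import Optional


class RecentCache:
    """Small least-recently-used cache a node could keep for entries it forwards."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None  # miss: the request would continue toward the owner
        self._items.move_to_end(key)
        return self._items[key]


cache = RecentCache(capacity=2)
cache.put("hot", "v1")
cache.put("a", "v2")
cache.get("hot")        # touch "hot" so it stays resident
cache.put("b", "v3")    # evicts "a", the least recently used key
```

A hit at an intermediate node answers the request early, which takes load off the node that owns a popular key.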
SLIDE 49 Uniform Partitioning
– at join time, pick a zone
– check neighboring zones
– pick the largest zone and split that one
SLIDE 50 Uniform Partitioning
[Figure: percentage vs. zone volume (V/16, V/8, V/4, V/2, V, 2V, 4V, 8V), where V = (total volume) / n; 65,000 nodes, 3 dimensions; series: w/o check, w/ check; y-axis 20 to 100]
SLIDE 51 CAN: Robustness
- Completely distributed
– no single point of failure (not applicable to pieces of database when node failure happens)
- Not exploring database recovery (in case there are multiple copies of database)
– can route around trouble
SLIDE 52 Outline
- Introduction
- Design
- Evaluation
- Strengths & Weaknesses
- Ongoing Work
SLIDE 53 Strengths
- More resilient than flooding broadcast networks
- Efficient at locating information
- Fault-tolerant routing
- Node & data high availability (w/ improvement)
- Manageable routing table size & network traffic
SLIDE 54 Weaknesses
- Impossible to perform a fuzzy search
- Susceptible to malicious activity
- Must maintain coherence of all the indexed data (network overhead, efficient distribution)
- Routing latency is still relatively high
- Poor performance w/o improvement
SLIDE 55 Suggestions
- Catalog and meta indexes to perform search function
- Extension to handle mutable content efficiently for web-hosting
- Security mechanism to defend against attacks
SLIDE 56 Outline
- Introduction
- Design
- Evaluation
- Strengths & Weaknesses
- Ongoing Work
SLIDE 57 Ongoing Work
- Topologically-sensitive CAN construction
– distributed binning
SLIDE 58 Distributed Binning
– bin nodes such that co-located nodes land in same bin
– well-known set of landmark machines
– each CAN node measures its RTT to each landmark
– orders the landmarks in order of increasing RTT
- CAN construction
– place nodes from the same bin close together on the CAN
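The binning step can be sketched as below: each node sorts the landmarks by its measured RTT, and nodes with the same ordering share a bin. The landmark names, node names and RTT values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical RTT measurements (ms) from each node to landmarks L1..L3.
rtts: Dict[str, Dict[str, float]] = {
    "n1": {"L1": 10.0, "L2": 50.0, "L3": 90.0},
    "n2": {"L1": 12.0, "L2": 45.0, "L3": 95.0},
    "n3": {"L1": 80.0, "L2": 20.0, "L3": 30.0},
}


def bin_of(measured: Dict[str, float]) -> Tuple[str, ...]:
    """A node's bin is its landmark ordering by increasing RTT."""
    return tuple(sorted(measured, key=lambda lm: measured[lm]))


bins: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
for node, measured in rtts.items():
    bins[bin_of(measured)].append(node)
```

Nodes n1 and n2 see the landmarks in the same order and so share a bin, which is the signal used to place them near each other in the coordinate space.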
SLIDE 59 Distributed Binning
– 4 landmarks (placed at 5 hops away from each other)
– naïve partitioning
[Figure: latency stretch vs. number of nodes (256, 1K, 4K); panels: # dimensions=2, # dimensions=4; series: w/ binning, w/o binning; y-axis 5 to 20]
SLIDE 60 Ongoing Work (cont’d)
- Topologically-sensitive CAN construction
– distributed binning
- CAN Security (Petros Maniatis - Stanford)
– spectrum of attacks
– appropriate counter-measures
SLIDE 61 Ongoing Work (cont’d)
– Application-level Multicast (NGC 2001)
– Grass-Roots Content Distribution
– Distributed Databases using CANs (J.Hellerstein, S.Ratnasamy, S.Shenker, I.Stoica, S.Zhuang)
SLIDE 62 Summary
– an Internet-scale hash table
– potential building block in Internet applications
– O(d) per-node state
– simple heuristics help a lot
– decentralized, can route around trouble