Dynamic Replication and Partitioning - Costin Raiciu, University College London - PowerPoint PPT Presentation



SLIDE 1

Dynamic Replication and Partitioning

Costin Raiciu

University College London

Joint work with Mark Handley, David S. Rosenblum

SLIDE 2

Motivation: Web Search

  • Search engines
    – Create an index of the web
    – Queries consult the index to find relevant documents
    – The documents are then ordered (e.g. PageRank)

  • The index is huge: a few TB
    – Must be partitioned to fit into memory
    – Must be replicated to increase query throughput and system availability
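The two requirements above (partition to fit memory, replicate for throughput and availability) can be sketched in Python; `shard_and_replicate` and the toy index below are hypothetical illustrations, not code from the talk:

```python
# A minimal sketch of the two operations the slide names: partition the
# index into shards that fit in memory, then keep r copies of every shard.

def shard_and_replicate(index, num_shards, r):
    """Split a term -> postings index into shards, then replicate each shard r times."""
    shards = [dict() for _ in range(num_shards)]
    for term, postings in index.items():
        shards[hash(term) % num_shards][term] = postings
    # Each shard would live on r different nodes; here we just copy it.
    return [[dict(shard) for _ in range(r)] for shard in shards]

index = {"roar": [1, 5], "ring": [2], "chord": [3, 5]}
replicas = shard_and_replicate(index, num_shards=2, r=3)
assert len(replicas) == 2                            # two shards
assert all(len(copies) == 3 for copies in replicas)  # three copies each
```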

SLIDE 3

Google Web Search (Barroso et al.)

[Figure: a query fans out to Cluster 1, Cluster 2, Cluster 3; the index is split into shards; results are merged and ordered]

SLIDE 4

Big Picture: Distributed Rendez-Vous

[Figure: overlay nodes behind a load balancer; index shards and a query, with average replication level R = 5 and hop count H = 3]

SLIDE 5

Distributed Rendez-Vous is Important

  • Many other applications use it
    – Online filtering
    – Distributed databases

  • Combines replication and partitioning
    – Increasing replication (R) increases availability, but storing the index at more nodes is costly
    – Increasing the forwarding hops (H) creates high bandwidth cost for transient objects
    – Tradeoff: R·H ≥ #nodes
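The tradeoff can be made concrete with a toy cluster layout (the names below are hypothetical, chosen to match the deck's N = 15, R = 5, H = 3):

```python
# Toy layout: N nodes form R replica groups of H = N/R shard holders each,
# so R * H = N. A query that visits all H nodes of one group meets every
# shard; an index entry stored once per group is met by every query.

N, R = 15, 5
H = N // R                                   # forwarding hops per query
clusters = [list(range(c * H, (c + 1) * H)) for c in range(R)]

def query(cluster_id):
    """Shard indices met by a query sent to one replica group."""
    return {node % H for node in clusters[cluster_id]}

assert R * H >= N                            # the R·H ≥ #nodes tradeoff
assert all(query(c) == set(range(H)) for c in range(R))
```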

SLIDE 6

The Problem

  • Who chooses the number of clusters? It depends on:
    – Frequencies and sizes of index and queries
    – Bandwidth constraints
    – Memory constraints
    – Number of nodes

  • R varies with time!

How can we adjust the replication level in distributed rendez-vous?

SLIDE 7

Obvious approach

  • Google architecture
    – Replication tied to network structure
    – To increase the replication level: destroy a cluster and add its nodes to the other clusters

  • Issues
    – Temporarily reduces the capacity of the network
    – Not simple to implement
    – Google's solution: buy more hardware

[Figure: Cluster 1, Cluster 2, Cluster 3, with one cluster crossed out]

SLIDE 8

A randomized implementation

  • On average, each query meets each index shard once
  • To increase the replication level, each node creates 1 new replica for active queries

[Figure: index shards and a query on N = 15 nodes, with R = 5 and H = 3]

SLIDE 9

Our solution: ROAR

  • Rendez-Vous On A Ring
    – Similar in spirit to Random
    – But with deterministic properties
    – Does not tie network structure to replication level

SLIDE 10

ROAR Overview

Replication level: 5

  • Nodes on a Chord ring
  • ID space virtually split in R intervals
  • Replicate
    – Hash and store
    – Forward to the equivalent node in the next interval
  • Route
    – Uniformly choose an interval and direction
    – Route to all nodes in that interval

[Figure: ring ID space from 0 to max, with index shards and a query]
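The replicate/route rules above can be simulated directly. The sketch below assumes N evenly spaced node IDs on the ring (a simplification of Chord's successor placement), and places replicas at the equivalent point of each interval, which is our reading of "forward to the equivalent node in the next interval":

```python
# ROAR on a ring of size MAX, virtually split into R equal intervals.
import random

N, R, MAX = 15, 5, 1 << 16
assert N % R == 0
nodes = [i * (MAX // N) for i in range(N)]   # evenly spaced node IDs
per_interval = N // R                        # nodes per interval

def successor(point):
    """Index of the node responsible for a ring point (its successor)."""
    return next((i for i, nid in enumerate(nodes) if nid >= point), 0)

def replicate(obj_point):
    """Store at the successor, then at the equivalent node of each interval."""
    return {(successor(obj_point) + k * per_interval) % N for k in range(R)}

def route(interval):
    """A query visits every node in one uniformly chosen interval."""
    return set(range(interval * per_interval, (interval + 1) * per_interval))

# Every query interval contains exactly one replica of every object.
for _ in range(100):
    holders = replicate(random.randrange(MAX))
    for interval in range(R):
        assert len(holders & route(interval)) == 1
```

With even spacing the equivalent node of the next interval is just the node N/R positions further along the ring, which is why the rendez-vous is deterministic rather than probabilistic.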

SLIDE 11

ROAR Analysis

  • Equal spacing is important
    – When R increases, it ensures that no 2 replicas end up in the same interval
    – Stable state: if R stays constant long enough, equivalent nodes have equivalent content

  • Useful for fault tolerance
    – When R changes:
      • Stability is maintained if R is doubled or halved
      • Otherwise, not stable: wait for objects to expire
SLIDE 12

Increasing Replication

Replication level: 5 → 6

[Figure: ring ID space from 0 to max]

SLIDE 13

Increasing Replication (2)

  • Observation: when the replication level is R, we can route at any level R′ ≤ R

  • ROAR can route while changing replication levels
    – Wait until all nodes in an interval reach the new replication level
    – Begin routing at the new replication level

  • When is the new replication level reached?
    – Compute the persistent object count at replication levels R and R+1
      • When they are approximately equal, it is safe to switch to the new routing
    – The count is piggybacked on queries: very small cost
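The switch test above can be sketched as a simple predicate; the function name and the 5% tolerance are our assumptions, not from the talk:

```python
# Each node tracks how many of its persistent objects are replicated at
# the old level R and at the new level R+1; queries keep routing at R
# until the two counts converge, then switch to R+1.

def safe_to_switch(count_at_r, count_at_r_plus_1, tolerance=0.05):
    """True once the new level holds (almost) as many objects as the old one."""
    if count_at_r == 0:
        return True
    return count_at_r_plus_1 >= (1 - tolerance) * count_at_r

assert not safe_to_switch(count_at_r=1000, count_at_r_plus_1=400)
assert safe_to_switch(count_at_r=1000, count_at_r_plus_1=980)
```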

SLIDE 14

Fault Tolerance

Stable state

[Figure: a query routed around a failed node (X)]

SLIDE 15

Fault Tolerance (2)

Not in stable state

[Figure: a query and a failed node (X)]

SLIDE 16

Comparison

                          ROAR        Random                              Google
RV Guaranteed?            Yes         No (35% miss probability)           Yes
RV Redundant?             No          Yes (25% redundant RV probability)  No
Bw Cost on Node Failure   O(I·R/N)    O(I·R/N)                            O(I·R/N)
Bw for R = R+1            I           I                                   ~2·I

  • Bandwidth-scarce system
    – R = O(√N)
    – I = total size of the index
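One way to see why R = O(√N) falls out is a toy bandwidth model (our assumption, not the slide's): index traffic grows linearly in R while query traffic grows in H = N/R, and minimizing the sum lands near √N:

```python
# Toy model: cost(R) = i*R + q*N/R, where i is index traffic per replica
# and q is query traffic per hop. The minimum is at R = sqrt(q*N/i),
# i.e. O(sqrt(N)) when i and q are comparable.
import math

def best_r(n, index_rate=1.0, query_rate=1.0):
    cost = lambda r: index_rate * r + query_rate * n / r
    return min(range(1, n + 1), key=cost)

assert best_r(10_000) == 100                       # sqrt(10_000)
assert abs(best_r(40_000) - math.sqrt(40_000)) <= 1
```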

SLIDE 17

Comparison (2)

  • 1% permanent failures per year
    – Commercial data: 5% failures in the 1st year
    – Transient failures are tolerated thanks to the stable state

[Figure: regions where ROAR does better vs. where Google does better]

SLIDE 18

Summary

  • Distributed rendez-vous is an important problem in distributed computing
    – Changing R is a requirement for optimal solutions

  • ROAR: a simple algorithm
    – Distributed in spirit
      • No need for external load balancing
      • Can run on deployed structured overlays
    – Achieves reconfiguration without changing the network structure
    – In the stable state, as good as Google
    – When reconfigurations are frequent, does better

SLIDE 19

References

  • Web Search for a Planet: The Google Cluster Architecture - Barroso et al.