Dynamic Replication and Partitioning - Costin Raiciu, University College London - PowerPoint PPT Presentation



SLIDE 1

Dynamic Replication and Partitioning

Costin Raiciu

University College London

Joint work with Mark Handley, David S. Rosenblum

SLIDE 2

Motivation: Web Search

  • Search engines
    – Create an index of the web
    – Queries consult the index to find relevant documents
    – The documents are then ordered (e.g. PageRank)

  • The index is huge: a few TB
    – Must be partitioned to fit into memory
    – Must be replicated to increase query throughput and system availability
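The two requirements above (partition to fit memory, replicate for throughput and availability) can be sketched in Python; `shard_and_replicate` and the toy index below are hypothetical illustrations, not code from the talk:

```python
# A minimal sketch of the two operations the slide names: partition the
# index into shards that fit in memory, then keep r copies of every shard.

def shard_and_replicate(index, num_shards, r):
    """Split a term -> postings index into shards, then replicate each shard r times."""
    shards = [dict() for _ in range(num_shards)]
    for term, postings in index.items():
        shards[hash(term) % num_shards][term] = postings
    # Each shard would live on r different nodes; here we just copy it.
    return [[dict(shard) for _ in range(r)] for shard in shards]

index = {"roar": [1, 5], "ring": [2], "chord": [3, 5]}
replicas = shard_and_replicate(index, num_shards=2, r=3)
assert len(replicas) == 2                            # two shards
assert all(len(copies) == 3 for copies in replicas)  # three copies each
```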

SLIDE 3

Google Web Search (Barroso et al.)

[Figure: a query fans out to Cluster 1, Cluster 2, Cluster 3; the index is split into shards; results are merged and ordered]

SLIDE 4

Big Picture: Distributed Rendez-Vous

[Figure: overlay nodes behind a load balancer; index shards and a query, with average replication level R = 5 and hop count H = 3]

SLIDE 5

Distributed Rendez-Vous is Important

  • Many other applications use it
    – Online filtering
    – Distributed databases

  • Combines replication and partitioning
    – Increasing replication (R) increases availability, but storing the index at more nodes is costly
    – Increasing the forwarding hops (H) creates high bandwidth cost for transient objects
    – Tradeoff: R·H ≥ #nodes
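The tradeoff can be made concrete with a toy cluster layout (the names below are hypothetical, chosen to match the deck's N = 15, R = 5, H = 3):

```python
# Toy layout: N nodes form R replica groups of H = N/R shard holders each,
# so R * H = N. A query that visits all H nodes of one group meets every
# shard; an index entry stored once per group is met by every query.

N, R = 15, 5
H = N // R                                   # forwarding hops per query
clusters = [list(range(c * H, (c + 1) * H)) for c in range(R)]

def query(cluster_id):
    """Shard indices met by a query sent to one replica group."""
    return {node % H for node in clusters[cluster_id]}

assert R * H >= N                            # the R·H ≥ #nodes tradeoff
assert all(query(c) == set(range(H)) for c in range(R))
```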

SLIDE 6

The Problem

  • Who chooses the number of clusters? It depends on:
    – Frequencies and sizes of index and queries
    – Bandwidth constraints
    – Memory constraints
    – Number of nodes

  • R varies with time!

How can we adjust the replication level in distributed rendez-vous?

SLIDE 7

Obvious approach

  • Google architecture
    – Replication tied to network structure
    – To increase the replication level: destroy a cluster and add its nodes to the other clusters

  • Issues
    – Temporarily reduces the capacity of the network
    – Not simple to implement
    – Google's solution: buy more hardware

[Figure: Cluster 1, Cluster 2, Cluster 3, with one cluster crossed out]

SLIDE 8

A randomized implementation

  • On average, each query meets each index shard once
  • To increase the replication level, each node creates 1 new replica for active queries

[Figure: index shards and a query on N = 15 nodes, with R = 5 and H = 3]

SLIDE 9

Our solution: ROAR

  • Rendez-Vous On A Ring
    – Similar in spirit to Random
    – But with deterministic properties
    – Does not tie network structure to replication level

SLIDE 10

ROAR Overview

Replication level: 5

  • Nodes on a Chord ring
  • ID space virtually split in R intervals
  • Replicate
    – Hash and store
    – Forward to the equivalent node in the next interval
  • Route
    – Uniformly choose an interval and direction
    – Route to all nodes in that interval

[Figure: ring ID space from 0 to max, with index shards and a query]
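The replicate/route rules above can be simulated directly. The sketch below assumes N evenly spaced node IDs on the ring (a simplification of Chord's successor placement), and places replicas at the equivalent point of each interval, which is our reading of "forward to the equivalent node in the next interval":

```python
# ROAR on a ring of size MAX, virtually split into R equal intervals.
import random

N, R, MAX = 15, 5, 1 << 16
assert N % R == 0
nodes = [i * (MAX // N) for i in range(N)]   # evenly spaced node IDs
per_interval = N // R                        # nodes per interval

def successor(point):
    """Index of the node responsible for a ring point (its successor)."""
    return next((i for i, nid in enumerate(nodes) if nid >= point), 0)

def replicate(obj_point):
    """Store at the successor, then at the equivalent node of each interval."""
    return {(successor(obj_point) + k * per_interval) % N for k in range(R)}

def route(interval):
    """A query visits every node in one uniformly chosen interval."""
    return set(range(interval * per_interval, (interval + 1) * per_interval))

# Every query interval contains exactly one replica of every object.
for _ in range(100):
    holders = replicate(random.randrange(MAX))
    for interval in range(R):
        assert len(holders & route(interval)) == 1
```

With even spacing the equivalent node of the next interval is just the node N/R positions further along the ring, which is why the rendez-vous is deterministic rather than probabilistic.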

SLIDE 11

ROAR Analysis

  • Equal spacing is important
    – When R increases, it ensures that no 2 replicas end up in the same interval
    – Stable state: if R stays constant long enough, equivalent nodes have equivalent content

  • Useful for fault tolerance
    – When R changes:
      • Stability is maintained if R is doubled or halved
      • Otherwise, not stable: wait for objects to expire
SLIDE 12

Increasing Replication

Replication level: 5 → 6

[Figure: ring ID space from 0 to max]

SLIDE 13

Increasing Replication (2)

  • Observation: when the replication level is R, we can route at any level R′ ≤ R

  • ROAR can route while changing replication levels
    – Wait until all nodes in an interval reach the new replication level
    – Begin routing at the new replication level

  • When is the new replication level reached?
    – Compute the persistent object count at replication levels R and R+1
      • When they are approximately equal, it is safe to switch to the new routing
    – The count is piggybacked on queries: very small cost
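The switch test above can be sketched as a simple predicate; the function name and the 5% tolerance are our assumptions, not from the talk:

```python
# Each node tracks how many of its persistent objects are replicated at
# the old level R and at the new level R+1; queries keep routing at R
# until the two counts converge, then switch to R+1.

def safe_to_switch(count_at_r, count_at_r_plus_1, tolerance=0.05):
    """True once the new level holds (almost) as many objects as the old one."""
    if count_at_r == 0:
        return True
    return count_at_r_plus_1 >= (1 - tolerance) * count_at_r

assert not safe_to_switch(count_at_r=1000, count_at_r_plus_1=400)
assert safe_to_switch(count_at_r=1000, count_at_r_plus_1=980)
```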

SLIDE 14

Fault Tolerance

Stable state

[Figure: a query routed around a failed node (X)]

SLIDE 15

Fault Tolerance (2)

Not in stable state

[Figure: a query and a failed node (X)]

SLIDE 16

Comparison

                          ROAR        Random                              Google
RV Guaranteed?            Yes         No (35% miss probability)           Yes
RV Redundant?             No          Yes (25% redundant RV probability)  No
Bw Cost on Node Failure   O(I·R/N)    O(I·R/N)                            O(I·R/N)
Bw for R = R+1            I           I                                   ~2·I

  • Bandwidth-scarce system
    – R = O(√N)
    – I = total size of the index
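One way to see why R = O(√N) falls out is a toy bandwidth model (our assumption, not the slide's): index traffic grows linearly in R while query traffic grows in H = N/R, and minimizing the sum lands near √N:

```python
# Toy model: cost(R) = i*R + q*N/R, where i is index traffic per replica
# and q is query traffic per hop. The minimum is at R = sqrt(q*N/i),
# i.e. O(sqrt(N)) when i and q are comparable.
import math

def best_r(n, index_rate=1.0, query_rate=1.0):
    cost = lambda r: index_rate * r + query_rate * n / r
    return min(range(1, n + 1), key=cost)

assert best_r(10_000) == 100                       # sqrt(10_000)
assert abs(best_r(40_000) - math.sqrt(40_000)) <= 1
```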

SLIDE 17

Comparison (2)

  • 1% permanent failures per year
    – Commercial data: 5% failures in the 1st year
    – Transient failures are tolerated thanks to the stable state

[Figure: regions where ROAR does better vs. where Google does better]

SLIDE 18

Summary

  • Distributed rendez-vous is an important problem in distributed computing
    – Changing R is a requirement for optimal solutions

  • ROAR: a simple algorithm
    – Distributed in spirit
      • No need for external load balancing
      • Can run on deployed structured overlays
    – Achieves reconfiguration without changing the network structure
    – In the stable state, as good as Google
    – When reconfigurations are frequent, does better

SLIDE 19

References

  • Web Search for a Planet: The Google Cluster Architecture - Barroso et al.