

SLIDE 1

On Data Placement in Distributed Systems

LADIS’14 João Paiva, Luís Rodrigues {joao.paiva, ler}@tecnico.ulisboa.pt

Instituto Superior Técnico / INESC-ID, Lisboa, Portugal

October 23, 2014

SLIDE 2

What is Data Placement?

◮ Deciding how to assign data items to nodes in a distributed system in such a way that they can later be retrieved.

SLIDE 3

Data Placement Affects

Data Access Locality

Placing correlated data together can reduce latency of operations

Load Balancing

By knowing the workload, data can be placed in a way to even out the load across all nodes

Availability

Data can be replicated depending on probability of node failure

SLIDE 4

Constraints to data placement practicality

◮ Lack of flexibility limits data placement improvements
◮ Scalability imposes limits on the flexibility of placement

Example

◮ Using a centralized directory is flexible, but not scalable
◮ Using consistent hashing is scalable, but not flexible
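The consistent-hashing side of this trade-off can be sketched in a few lines (a minimal illustration, not from the talk; the class and node names are invented). Lookup is a pure function of the key and the node set, so no directory is needed, but the owner of any given key is fixed by the hash and cannot be chosen freely:

```python
import hashlib
from bisect import bisect_right

def _pos(s: str) -> int:
    # Map a string to a point on the hash ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring: a key is owned by the first node
    clockwise from the key's position. Scalable (no directory), but
    inflexible (the owner is dictated by the hash)."""

    def __init__(self, nodes):
        self._ring = sorted((_pos(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        points = [p for p, _ in self._ring]
        # First ring point at or after the key's position, wrapping around.
        i = bisect_right(points, _pos(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")  # deterministic, directory-free lookup
```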

SLIDE 6

Main Goal

Provide better options between

◮ Strong flexibility, limited scalability
◮ Limited flexibility, good scalability

SLIDE 7

Two Scenarios

Internet Scale

◮ Millions of nodes
◮ Short term connections
◮ Asymmetric, inconstant network

Datacenter Scale

◮ Thousands of nodes
◮ Stable connections
◮ Controlled network infrastructure

SLIDE 8

Two Scenarios: Previous state of the art

Internet Scale

◮ Scalable solutions with little flexibility, concerned with churn

Datacenter Scale

◮ Very flexible solutions, concerned with workload changes

SLIDE 9

Summary of Findings

Improvements for both scenarios:

◮ More flexible solution for Internet-Scale
◮ More scalable solution for Datacenter-Scale

SLIDE 10

Outline

Introduction
Internet Scale
Datacenter Scale
Conclusion

SLIDE 11

Internet Scale: Rollerchain

◮ Data assigned to node groups
◮ Variable replication degree
◮ Nodes have no fixed position
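The group-based assignment can be sketched as follows (a simplified illustration, not Rollerchain's actual implementation; class and node names are invented). Keys hash to positions on a ring, but each ring position is owned by a whole group: every member of the owning group replicates the key, so the replication degree is simply the group size, and a node can join any group regardless of its own identifier:

```python
import hashlib

def _pos(s: str) -> int:
    # Map a string to a point on a 32-bit hash ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % 2**32

class GroupRing:
    """Keys are assigned to groups of nodes rather than to individual
    nodes. All members of the owning group replicate the key."""

    def __init__(self, groups):
        # groups: {ring_position: [member node names]}
        self.points = sorted(groups)
        self.groups = groups

    def owners(self, key):
        p = _pos(key)
        # First group clockwise from the key's position owns it.
        for point in self.points:
            if point >= p:
                return self.groups[point]
        return self.groups[self.points[0]]  # wrap around the ring

ring = GroupRing({0: ["n1"], 2**31: ["n2", "n3"]})
replicas = ring.owners("some-key")  # every listed node stores the key
```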

SLIDE 12

Variable Replication Degree

SLIDE 19

Nodes have no fixed position

SLIDE 22

Internet Scale: Implementation

Rollerchain

◮ Gossip-based and structured overlay
◮ Better churn resilience than state of the art
◮ Decreased replication costs

"Rollerchain: a DHT for Efficient Replication", João Paiva, João Leitão and Luís Rodrigues, Symposium on Network Computing and Applications (IEEE NCA), August 2013. (Best student paper award)
SLIDE 23

Internet Scale: Implementation

Data Placement Policies

◮ Avoid-Surplus: Reducing monitoring costs
◮ Resilient Load-Balancing: Improving load balancing
◮ Supersize-me: Reducing replication costs

Read the paper to learn which policies perform best:

"Policies for Efficient Data Replication in P2P Systems", João Paiva and Luís Rodrigues, International Conference on Parallel and Distributed Systems (IEEE ICPADS), December 2013.

SLIDE 25

Outline

Introduction Internet Scale Datacenter Scale Conclusion

SLIDE 26

Datacenter scale: AutoPlacer

System where data placement is defined by combining:

◮ Consistent hashing for most items
◮ Precise placement for selected items

Locality-improving round-based algorithm for in-memory data grids

"AutoPlacer: scalable self-tuning data placement in distributed key-value stores", J. Paiva, P. Ruivo, P. Romano and L. Rodrigues, International Conference on Autonomic Computing (USENIX ICAC), June 2013. (Best paper finalist)

"AutoPlacer: scalable self-tuning data placement in distributed key-value stores", J. Paiva, P. Ruivo, P. Romano and L. Rodrigues, ACM Transactions on Autonomous and Adaptive Systems (ACM TAAS)
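The combination the slide describes can be sketched as follows (a hypothetical illustration; `HybridLocator` and the hash-mod stand-in for consistent hashing are invented for this sketch, not AutoPlacer's actual API). A small explicit table overrides placement for the selected hotspots; everything else defaults to hashing, so the directory stays small no matter how many items the store holds:

```python
import hashlib

def ch_owner(key, nodes):
    # Stand-in for consistent hashing: a deterministic, directory-free
    # default placement derived purely from the key.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

class HybridLocator:
    """Hotspots get an explicit mapping; all other items fall back to
    the hash-based default, so only hotspots consume directory space."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.relocated = {}  # hotspot key -> explicitly chosen node

    def relocate(self, key, node):
        self.relocated[key] = node

    def lookup(self, key):
        return self.relocated.get(key) or ch_owner(key, self.nodes)

loc = HybridLocator(["n1", "n2", "n3"])
loc.relocate("hot-item", "n2")   # precise placement for a selected item
```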

SLIDE 28

Algorithm overview

Online, round-based approach:

1. Statistics: Monitor data access to collect hotspots
2. Optimization: Decide placement for hotspots
3. Lookup: Encode / broadcast data placement
4. Move data
SLIDE 30

Statistics: Data access monitoring

Key concept: Top-K stream analysis algorithm

◮ Lightweight
◮ Sub-linear space usage
◮ Inaccurate result... but with bounded error
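One stream algorithm with exactly these properties is Space-Saving, sketched below (the slide does not name the specific algorithm, so treat this as a representative example; the class name is invented). It keeps at most k counters, so space is independent of the number of distinct keys, and every estimate overcounts by at most the smallest counter's value, which bounds the error:

```python
class SpaceSaving:
    """Space-Saving top-k sketch. Tracks at most k counters; when a new
    item arrives and the table is full, the minimum counter is evicted
    and the newcomer inherits its count, so any estimate overcounts by
    at most that minimum (bounded error, sub-linear space)."""

    def __init__(self, k):
        self.k = k
        self.counts = {}

    def offer(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict the smallest counter; newcomer inherits its count.
            victim = min(self.counts, key=self.counts.get)
            count = self.counts.pop(victim)
            self.counts[item] = count + 1

    def top(self):
        return sorted(self.counts, key=self.counts.get, reverse=True)

sketch = SpaceSaving(k=2)
for access in "aaaaabbbc":   # skewed access stream: "a" is the hotspot
    sketch.offer(access)
hotspots = sketch.top()      # "a" surfaces despite only 2 counters
```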

SLIDE 33

Algorithm overview

Online, round-based approach:

1. Statistics: Monitor data access to collect hotspots
2. Optimization: Decide placement for hotspots
3. Lookup: Encode / broadcast data placement
4. Move data
SLIDE 34

Optimization

Integer Linear Programming problem formulation:

$$\min \sum_{j \in N} \sum_{i \in O} \bar{X}_{ij}\,(c_{rr}\, r_{ij} + c_{rw}\, w_{ij}) + X_{ij}\,(c_{lr}\, r_{ij} + c_{lw}\, w_{ij}) \tag{1}$$

subject to:

$$\forall i \in O : \sum_{j \in N} X_{ij} = d \quad \wedge \quad \forall j \in N : \sum_{i \in O} X_{ij} \leq S_j$$

Inaccurate input:

◮ Does not provide optimal placement
◮ Upper-bound on error
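On a toy instance the formulation can be checked by brute force (an illustrative sketch, not part of AutoPlacer; function and variable names are invented). Each item must be placed on exactly d nodes, no node may exceed its capacity, and among feasible placements the lowest total cost wins:

```python
from itertools import combinations, product

def solve_placement(items, nodes, cost, d, capacity):
    """Brute-force the toy ILP: pick a set of d owner nodes per item,
    minimising total cost subject to per-node capacity.
    cost[(i, j)] is the cost of placing item i on node j."""
    best, best_cost = None, float("inf")
    options = list(combinations(nodes, d))  # candidate owner sets
    for assignment in product(options, repeat=len(items)):
        load = {j: 0 for j in nodes}
        for owners in assignment:
            for j in owners:
                load[j] += 1
        if any(load[j] > capacity[j] for j in nodes):
            continue  # violates the capacity constraint (X_ij <= S_j)
        total = sum(cost[(i, j)]
                    for i, owners in zip(items, assignment)
                    for j in owners)
        if total < best_cost:
            best, best_cost = dict(zip(items, assignment)), total
    return best, best_cost

# Two items, two nodes, replication degree 1, capacity 1 per node:
costs = {("x", "a"): 1, ("x", "b"): 5, ("y", "a"): 5, ("y", "b"): 1}
plan, total = solve_placement(["x", "y"], ["a", "b"], costs,
                              d=1, capacity={"a": 1, "b": 1})
```

Enumeration is exponential, which is exactly why the next slide relaxes the problem to linear programming and partitions it across nodes.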

SLIDE 35

Accelerating optimization

1. ILP relaxed to a Linear Programming problem
2. Distributed optimization

LP relaxation

◮ Allow data item ownership to take values in the [0, 1] interval

Distributed Optimization

◮ Partition the problem across the N nodes
◮ Each node optimizes hotspots mapped to it by CH
◮ Strengthen capacity constraint

SLIDE 36

Algorithm overview

Online, round-based approach:

1. Statistics: Monitor data access to collect hotspots
2. Optimization: Decide placement for hotspots
3. Lookup: Encode / broadcast data placement
4. Move data
SLIDE 37

Lookup: Encoding placement

Probabilistic Associative Array (PAA)

◮ Associative array interface (keys→values)
◮ Probabilistic and space-efficient
◮ Trades off space usage for accuracy

SLIDE 38

Probabilistic Associative Array: Usage

Building

1. Build PAA from hotspot mappings
2. Broadcast PAA

Looking up objects

◮ If item is hotspot, return PAA mapping
◮ Otherwise, default to Consistent Hashing

SLIDE 40

PAA: Building blocks

◮ Bloom Filter

Space-efficient membership test (is item in PAA?)

◮ Decision tree classifier

Space-efficient mapping (where is hotspot mapped to?)
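The Bloom-filter half of the PAA can be sketched as below (a minimal illustration; the decision-tree half is omitted and the class is invented for this sketch, not taken from the paper). The filter may answer "present" for a key that was never added, but it never answers "absent" for one that was:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-slot bit array.
    May report false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _probes(self, key):
        # Derive k independent probe positions from the key.
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._probes(key):
            self.bits[p] = 1

    def __contains__(self, key):
        # Present only if every probed bit is set.
        return all(self.bits[p] for p in self._probes(key))

bf = BloomFilter()
bf.add("hot-key-1")
bf.add("hot-key-2")
is_hot = "hot-key-1" in bf  # membership test in O(k), no keys stored
```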

SLIDE 42

PAA: Properties

Bloom Filter:

◮ No False Negatives: never returns ⊥ for items in the PAA.
◮ False Positives: may match items it was not supposed to.

Decision tree classifier:

◮ Inaccurate values (bounded error).
◮ Deterministic response: deterministic (item→node) mapping.

SLIDE 45

Outline

Introduction
Internet Scale
Datacenter Scale
  AutoPlacer
  Evaluation
Conclusion

SLIDE 46

Evaluation: Throughput

10 100 1000 5 10 15 20 25 30 Transactions per second (TX/s) Time (minutes) 100% locality 90% locality 50% locality 0% locality baseline

SLIDE 47

Evaluation: Optimization

SLIDE 48

Outline

Introduction
Internet Scale
Datacenter Scale
Conclusion

SLIDE 49

Conclusions

Internet Scale

◮ More flexible overlay for data placement
◮ Policies to improve metrics using added flexibility

Datacenter Scale

◮ Scalable mechanism for data placement
◮ Algorithm to improve locality through hotspot placement

SLIDE 50

Thank you

joao.paiva@tecnico.ulisboa.pt web.tecnico.ulisboa.pt/joao.paiva