On Data Placement in Distributed Systems
LADIS'14
João Paiva, Luís Rodrigues
{joao.paiva, ler}@tecnico.ulisboa.pt
Instituto Superior Técnico / INESC-ID, Lisboa, Portugal
October 23, 2014

What is Data Placement?
◮ Deciding how to assign data items to nodes in a distributed system in such a way that they can later be retrieved.

Data Placement Affects
Data Access Locality: Placing correlated data together can reduce the latency of operations
Load Balancing: By knowing the workload, data can be placed so as to even out the load across all nodes
Availability: Data can be replicated depending on the probability of node failure

Constraints to data placement practicality
◮ Lack of flexibility limits data placement improvements
◮ Scalability imposes limits on the flexibility of placement
Example
◮ Using a centralized directory is flexible, but not scalable
◮ Using consistent hashing is scalable, but not flexible
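The trade-off in the example above can be made concrete with a minimal sketch. The class names (`Directory`, `ConsistentHashRing`), the virtual-node count and the choice of SHA-256 are illustrative assumptions, not part of the presented systems. The directory can put any key on any node but has to store one entry per key; the ring stores only its node list, but the hash function alone decides where a key lives.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Directory:
    """Flexible: any key can be placed on any node, at the cost of one entry per key."""

    def __init__(self):
        self._table = {}

    def place(self, key, node):
        self._table[key] = node

    def lookup(self, key):
        return self._table[key]


class ConsistentHashRing:
    """Scalable: state grows with the number of nodes, but placement is fixed by the hash."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns several points ("virtual nodes") on the ring.
        self._ring = sorted((_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["a", "b", "c"])
smaller = ConsistentHashRing(["a", "b"])  # same ring after node "c" leaves
keys = [f"k{i}" for i in range(200)]
```

Removing a node from the ring only moves the keys that node owned, which is what makes consistent hashing attractive under churn; the price is that a key cannot be pinned to a chosen node without extra machinery.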

Main Goal
Provide better options between
◮ Strong flexibility, limited scalability
◮ Limited flexibility, good scalability

Two Scenarios
Internet Scale
◮ Millions of nodes
◮ Short term connections
◮ Asymmetric, inconstant network
Datacenter Scale
◮ Thousands of nodes
◮ Stable connections
◮ Controlled network infrastructure

Two Scenarios: Previous state of the art
Internet Scale
◮ Scalable solutions with little flexibility, concerned with churn
Datacenter Scale
◮ Very flexible solutions, concerned with workload changes

Summary of Findings
Improvements for both scenarios:
◮ More flexible solution for Internet-Scale
◮ More scalable solution for Datacenter-Scale

Outline
◮ Introduction
◮ Internet Scale
◮ Datacenter Scale
◮ Conclusion

Internet Scale: Rollerchain
◮ Data assigned to node groups
◮ Variable replication degree
◮ Nodes have no fixed position

Variable Replication Degree

Nodes have no fixed position

Internet Scale: Implementation
Rollerchain
◮ Gossip-based and structured overlay
◮ Better churn resilience than state of the art
◮ Decreased replication costs
"Rollerchain: a DHT for Efficient Replication", João Paiva, João Leitão and Luís Rodrigues, Symposium on Network Computing and Applications (IEEE NCA), August 2013. (Best student paper award)

Internet Scale: Implementation
Data Placement Policies
◮ Avoid-Surplus: Reducing monitoring costs
◮ Resilient Load-Balancing: Improving load balancing
◮ Supersize-me: Reducing replication costs
Read the paper to know the best policies:
"Policies for Efficient Data Replication in P2P Systems", João Paiva and Luís Rodrigues, International Conference on Parallel and Distributed Systems (IEEE ICPADS), December 2013.

Datacenter scale: AutoPlacer
System where data placement is defined by combining:
◮ Consistent hashing for most items
◮ Precise placement for selected items
Locality-improving round-based algorithm for in-memory data grids
"AutoPlacer: scalable self-tuning data placement in distributed key-value stores", J. Paiva, P. Ruivo, P. Romano and L. Rodrigues, International Conference on Autonomic Computing (USENIX ICAC), June 2013. (Best paper finalist)
"AutoPlacer: scalable self-tuning data placement in distributed key-value stores", J. Paiva, P. Ruivo, P. Romano and L. Rodrigues, ACM Transactions on Autonomous and Adaptive Systems (ACM TAAS)

Algorithm overview
Online, round-based approach:
1. Statistics: Monitor data access to collect hotspots
2. Optimization: Decide placement for hotspots
3. Lookup: Encode / broadcast data placement
4. Move data

Statistics: Data access monitoring
Key concept: Top-K stream analysis algorithm
◮ Lightweight
◮ Sub-linear space usage
◮ Approximate result, but with bounded error
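To illustrate what a top-K stream analysis with these three properties looks like, here is a minimal sketch of Space-Saving, a well-known algorithm of this kind. The slide does not say which algorithm is used, so treating Space-Saving as the example is an assumption, and the function name and stream contents are made up for illustration. It keeps at most `k` counters regardless of stream length; each estimate never undercounts, and the stored error bounds how much it may overcount.

```python
def space_saving(stream, k):
    """Track approximate top-k items using at most k counters.

    Returns a dict: item -> [estimated_count, max_overestimation].
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < k:
            counters[item] = [1, 0]
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count as its possible overestimation. (A linear scan is
            # used here for brevity, not efficiency.)
            victim = min(counters, key=lambda x: counters[x][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return counters


# Hypothetical access log: two hot keys among many cold ones (100 accesses).
stream = []
for i in range(10):
    stream.extend(["a"] * 5 + ["b"] * 3 + [f"cold{i}", f"rare{i}"])

top = space_saving(stream, k=4)
```

Any item accessed more than `len(stream) / k` times is guaranteed to hold a counter at the end, so the hot keys `a` and `b` survive even though twenty other keys passed through the four slots.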

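Optimization
Integer Linear Programming problem formulation:
$$\min \sum_{j \in \mathcal{N}} \sum_{i \in \mathcal{O}} \overline{X}_{ij}\,(cr^{r} r_{ij} + cr^{w} w_{ij}) + X_{ij}\,(cl^{r} r_{ij} + cl^{w} w_{ij}) \qquad (1)$$
subject to:
$$\forall i \in \mathcal{O} : \sum_{j \in \mathcal{N}} X_{ij} = d \;\wedge\; \forall j \in \mathcal{N} : \sum_{i \in \mathcal{O}} X_{ij} \le S_j$$
Inaccurate input:
◮ Does not provide optimal placement
◮ Upper-bound on error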
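A tiny worked instance may help in reading the formulation. The sketch below assumes the following interpretation, which the slide itself does not spell out: each access by node j to item i pays a local cost (`cl_r`, `cl_w`) when j holds a copy of i and a remote cost (`cr_r`, `cr_w`) otherwise, every item gets exactly `d` copies, and node j holds at most `capacity[j]` items. The cost values and access counts are invented, and exhaustive search stands in for a real ILP solver, so this only works for toy sizes.

```python
from itertools import combinations, product


def placement_cost(placement, reads, writes, cr_r, cr_w, cl_r, cl_w):
    """Total access cost; placement[i] is the tuple of nodes holding item i."""
    total = 0
    for i, owners in enumerate(placement):
        for j in range(len(reads[i])):
            if j in owners:   # node j holds a copy: local access cost
                total += cl_r * reads[i][j] + cl_w * writes[i][j]
            else:             # otherwise: remote access cost
                total += cr_r * reads[i][j] + cr_w * writes[i][j]
    return total


def best_placement(reads, writes, capacity, d, cr_r=10, cr_w=10, cl_r=1, cl_w=1):
    """Exhaustively search all placements with d copies per item."""
    n_items, n_nodes = len(reads), len(capacity)
    best = None
    for placement in product(combinations(range(n_nodes), d), repeat=n_items):
        load = [0] * n_nodes
        for owners in placement:
            for j in owners:
                load[j] += 1
        if any(load[j] > capacity[j] for j in range(n_nodes)):
            continue  # violates the capacity constraint
        cost = placement_cost(placement, reads, writes, cr_r, cr_w, cl_r, cl_w)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# reads[i][j]: reads of item i issued by node j (hypothetical numbers).
reads = [[9, 1], [2, 8]]
writes = [[0, 0], [0, 0]]
result = best_placement(reads, writes, capacity=[1, 1], d=1)
```

Here item 0 is read mostly by node 0 and item 1 mostly by node 1, so the cheapest feasible placement puts each item on its main reader: 17 units of local cost plus 30 of remote cost, 47 in total.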

Accelerating optimization
1. ILP relaxed to a Linear Programming problem
2. Distributed optimization
LP relaxation
◮ Allow data item ownership to be in the [0, 1] interval
Distributed Optimization
◮ Partition by the N nodes
◮ Each node optimizes the hotspots mapped to it by consistent hashing (CH)
◮ Strengthen capacity constraint
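Written out, the LP relaxation step amounts to replacing the integrality requirement on the ownership variables with a box constraint. The block below is a sketch under the assumption that the ownership variable is the binary $X_{ij}$ of the ILP and that its complement is $1 - X_{ij}$; how fractional solutions are turned back into concrete placements, and how exactly the capacity constraint is strengthened, is not stated on the slide and is left out.

```latex
% ILP: ownership is all-or-nothing
X_{ij} \in \{0, 1\}, \qquad \overline{X}_{ij} = 1 - X_{ij}

% LP relaxation: fractional ownership is allowed
0 \le X_{ij} \le 1, \qquad \overline{X}_{ij} = 1 - X_{ij}
```

With this change the objective and both constraints stay linear, so the problem can be handed to an ordinary LP solver instead of an integer one.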

Lookup: Encoding placement
Probabilistic Associative Array (PAA)
◮ Associative array interface (keys → values)
◮ Probabilistic and space-efficient
◮ Trade-off space usage for accuracy

Probabilistic Associative Array: Usage
Building
1. Build PAA from hotspot mappings
2. Broadcast PAA
Looking up objects
◮ If item is hotspot, return PAA mapping
◮ Otherwise, default to Consistent Hashing
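The lookup rule above can be sketched in a few lines. Everything named here is hypothetical: `ExactPAA` is an exact dictionary standing in for the probabilistic structure, `default_node` uses plain hash-modulo as a stand-in for consistent hashing, and the node names are invented. Only the control flow (consult the PAA first, fall back to the default placement otherwise) is taken from the slide.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical cluster


def default_node(key: str) -> str:
    """Default placement (simple stand-in for consistent hashing)."""
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class ExactPAA:
    """Exact stand-in for a PAA: returns a node for hotspots, None otherwise."""

    def __init__(self, hotspot_mappings):
        self._mappings = dict(hotspot_mappings)

    def get(self, key):
        return self._mappings.get(key)


def lookup(key: str, paa) -> str:
    """Return the PAA mapping for hotspots, the default placement otherwise."""
    node = paa.get(key)
    return node if node is not None else default_node(key)


# Step 1: build the PAA from hotspot mappings (step 2, broadcast, is omitted).
paa = ExactPAA({"hot-item": "node3"})
```

Because every node receives the same PAA and applies the same default function, any node can resolve any key locally, without contacting a directory.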

PAA: Building blocks
◮ Bloom Filter: Space-efficient membership test (is item in PAA?)
◮ Decision tree classifier: Space-efficient mapping (where is hotspot mapped to?)

PAA: Properties
Bloom Filter:
◮ No False Negatives: never returns ⊥ for items in the PAA.
◮ False Positives: may match items that it was not supposed to.
Decision tree classifier:
◮ Inaccurate values (bounded error).
◮ Deterministic response: deterministic (item → node) mapping.
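The two Bloom filter properties follow directly from how the structure works, as the minimal sketch below shows. The sizes, the number of hash functions and the use of SHA-256 are illustrative assumptions, not the parameters of the PAA. Adding an item sets a few bit positions; a membership test checks that all of them are set. Positions set for an inserted item are never cleared, hence no false negatives; positions set by other items can coincide, hence occasional false positives.

```python
import hashlib


class BloomFilter:
    """Set membership in fixed space: no false negatives, some false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _positions(self, item: str):
        # Derive n_hashes positions by salting the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str):
        return all(self.bits[pos] for pos in self._positions(item))


hotspots = [f"hot{i}" for i in range(100)]
bf = BloomFilter()
for key in hotspots:
    bf.add(key)
```

Shrinking `n_bits` saves space but raises the false-positive rate, which is the space-for-accuracy trade-off the PAA exposes. Since every node builds its answer from the same broadcast structure, a false positive yields the same (item → node) answer everywhere.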

Outline
◮ Introduction
◮ Internet Scale
◮ Datacenter Scale: AutoPlacer, Evaluation, Conclusion
◮ Conclusion

Evaluation: Throughput
[Figure: transactions per second (TX/s) over time (minutes, 0 to 30), with curves for 100% locality, 90% locality, 50% locality, 0% locality and the baseline.]

Evaluation: Optimization
