Databases 2 - Optional Presentation Andrea Gussoni Politecnico di Milano July 15, 2016 Andrea Gussoni (Politecnico di Milano) Databases 2 - Optional Presentation July 15, 2016 1 / 39
Coordination Avoidance Table of Contents 1. Coordination Avoidance 2. Trekking Through Siberia
Coordination Avoidance Some information on the paper Title: Coordination Avoidance in Database Systems. Authors: Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica. Presented at VLDB 2015.
Coordination Avoidance The Problem Database systems are increasingly deployed in distributed settings, which makes coordinating the different entities involved an increasingly important task.
Coordination Avoidance The Problem Concurrency control protocols are usually necessary because we want to guarantee the consistency of application-level data through a database layer that checks for and resolves possible conflicts. One example is two-phase locking (2PL), a serialization technique often used in commercial DBMSs.
Coordination Avoidance The Problem In a distributed scenario, this requires complex protocols (such as two-phase commit, 2PC) that coordinate the various entities involved in a transaction, which introduces latency. Coordination also means that we cannot exploit all the parallel resources of a distributed environment, because the coordination phase adds a large overhead.
Coordination Avoidance The Problem We usually pay for coordination overhead in terms of: Increased latency. Decreased throughput. Unavailability (in case of failures).
Coordination Avoidance The Problem Figure: Microbenchmark performance of coordinated and coordination-free execution on eight separate multi-core servers.
Coordination Avoidance Invariant Confluence The authors present a new technique (or, better, an analysis framework) that, when applied, considerably reduces the need for coordination between database entities. This lowers the cost in bandwidth and latency and considerably increases the overall throughput of the system.
Coordination Avoidance Invariant Confluence The main idea is not to introduce some new, exotic way of improving coordination. Instead, the authors build on the observation that there is a set of workloads that do not require coordination and can therefore be executed in parallel. The application programmer explicitly states the invariants, the application-level conditions on the data that must always hold; the analysis then determines which concurrent operations need coordination to preserve them.
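As a rough illustration of the idea, the sketch below runs a bounded, brute-force check of whether a set of transactions can commit without coordination under a given invariant. It is not the authors' analysis: the names `i_confluent`, `insert`, `unique_ids` and `names_present`, the set-union merge, and the representation of a table as a set of `(id, name)` rows are all assumptions made for this example.

```python
from itertools import product


def merge(a: frozenset, b: frozenset) -> frozenset:
    # Assumed merge operator: set union of the rows seen by each replica.
    return a | b


def insert(row):
    # Build a transaction that adds one row to a state.
    return lambda state: state | {row}


def i_confluent(initial, transactions, invariant, depth=2):
    # Collect the states reachable from `initial` through local commits
    # (a commit is accepted only if the new state keeps the invariant)...
    reachable, frontier = {initial}, {initial}
    for _ in range(depth):
        frontier = {
            txn(state)
            for state in frontier
            for txn in transactions
            if invariant(txn(state))
        }
        reachable |= frontier
    # ...then require every pairwise merge to keep the invariant as well.
    return all(invariant(merge(a, b)) for a, b in product(reachable, repeat=2))


def unique_ids(state):
    # Invariant: no two rows share the same id.
    return len({row_id for row_id, _ in state}) == len(state)


def names_present(state):
    # Invariant: every row has a non-empty name.
    return all(name for _, name in state)


txns = [insert((1, "alice")), insert((1, "bob")), insert((2, "carol"))]
print(i_confluent(frozenset(), txns, unique_ids))     # False
print(i_confluent(frozenset(), txns, names_present))  # True
```

A `False` result means that some pair of locally valid states merges into an invalid one, so those transactions need coordination. A `True` result only says that no violation was found within the explored depth.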
Coordination Avoidance The Model The main concepts introduced: Invariants, Transactions, Replicas, (I-)Convergence, Merging.
Coordination Avoidance Convergence Figure: the main concept behind the idea of convergence.
Coordination Avoidance Coordination-Free Execution Figure: the basic evolution of a simple coordination-free execution and the subsequent merge operation.
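The execution described on this slide can be sketched in a few lines: two replicas start from the same state, commit transactions locally without exchanging messages, and later merge. The `Replica` class, the set-union merge and the sample rows are assumptions for illustration, not the paper's prototype.

```python
class Replica:
    def __init__(self, state, invariant):
        self.state = frozenset(state)
        self.invariant = invariant

    def commit(self, row):
        # Local commit decision: no message to any other replica is needed.
        candidate = self.state | {row}
        if not self.invariant(candidate):
            return False  # abort locally
        self.state = candidate
        return True

    def merge(self, other_state):
        # Assumed merge operator: set union of both replicas' rows.
        self.state = self.state | other_state


def names_present(state):
    # Invariant: every row has a non-empty name.
    return all(name for _, name in state)


start = {(1, "alice")}
r1, r2 = Replica(start, names_present), Replica(start, names_present)

r1.commit((2, "bob"))    # runs on replica 1 only
r2.commit((3, "carol"))  # runs on replica 2 only
diverged = r1.state != r2.state

s1, s2 = r1.state, r2.state  # states exchanged asynchronously
r1.merge(s2)
r2.merge(s1)
converged = r1.state == r2.state and names_present(r1.state)
print(diverged, converged)  # True True
```

The replicas diverge while they run independently and hold the same, still valid, state once each has merged the other's updates.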
Coordination Avoidance Invariants It is important to note that coordination can only be avoided if all local commit decisions are globally valid. The best way to guarantee application-level consistency is therefore to apply a convergence analysis and identify the true conflicts. Uncertain situations must be treated conservatively. This means that we rely on the analysis done by the programmer at the application level to guarantee correctness, which is clearly a drawback.
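A small worked example of a true conflict, under assumed names: each replica holds a set of `(operation_id, delta)` entries for one account, the invariant is a non-negative balance, and merging is set union. Both withdrawals are valid locally, yet the merged state is not, so withdrawals against this invariant cannot all be committed without coordination, while deposits can.

```python
def balance(state):
    return sum(delta for _, delta in state)


def non_negative(state):
    # Invariant: the account balance never drops below zero.
    return balance(state) >= 0


common = frozenset({("init", 100)})

# Concurrent withdrawals on two replicas, each checked only locally.
replica_a = common | {("a1", -70)}
replica_b = common | {("b1", -60)}
print(non_negative(replica_a), non_negative(replica_b))  # True True
print(non_negative(replica_a | replica_b))               # False (balance -30)

# Concurrent deposits can never push the merged balance below zero.
replica_c = common | {("c1", 25)}
replica_d = common | {("d1", 40)}
print(non_negative(replica_c | replica_d))               # True (balance 165)
```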
Coordination Avoidance Invariants Luckily, there are some standard cases in the analysis of invariants that we can reuse as boilerplate when building the set of invariants for our application. Figure: summary of the main cases.
Coordination Avoidance Benchmarking The authors then implemented this framework and tested it with a standard benchmark, TPC-C, which is said to be “the gold standard for database concurrency control both in research and industry.” They also used RAMP transactions, which “employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently.” The prototype is written in Scala, chosen for the compactness of the code.
Coordination Avoidance Benchmarking The next few slides show some plots of the results obtained by the authors. The New-Order label refers to the TPC-C New-Order transaction. Where a unique ID assignment was needed, the authors assign a temp-ID and, only just before commit, a sequential real-ID (as required by the benchmark specification), and they create a table mapping each temp-ID to its real-ID.
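The deferred ID assignment described above might look roughly like the following single-process sketch. The class `DeferredIdAssigner`, its methods and the use of random UUIDs as temp-IDs are hypothetical choices for illustration; the slide does not say how the authors' Scala prototype implements this step.

```python
import uuid


class DeferredIdAssigner:
    """Hypothetical helper: temp-IDs during execution, sequential IDs at commit."""

    def __init__(self, first_id=1):
        self._next_id = first_id
        self.temp_to_real = {}  # the mapping table from temp-ID to real-ID

    def new_temp_id(self):
        # A random identifier needs no agreement with any other server.
        return uuid.uuid4().hex

    def commit(self, temp_id):
        # Only this step touches the shared sequential counter.
        real_id = self._next_id
        self._next_id += 1
        self.temp_to_real[temp_id] = real_id
        return real_id


assigner = DeferredIdAssigner()
t1, t2 = assigner.new_temp_id(), assigner.new_temp_id()
# ... the body of each transaction runs using only its temp-ID ...
print(assigner.commit(t2), assigner.commit(t1))  # 1 2
```

Real-IDs follow commit order rather than the order in which temp-IDs were handed out. In a multi-server deployment the counter increment would itself have to be atomic; how that is achieved is outside this sketch.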
Coordination Avoidance Results Figure: TPC-C New-Order throughput across eight servers.
Coordination Avoidance Results Figure: Coordination-avoiding New-Order scalability.
Coordination Avoidance Conclusions The paper observes that ACID transactions and the associated strong isolation levels have dominated the field of database concurrency control. They are a powerful abstraction that automatically guarantees consistency at the application level. In a distributed scenario where we want high scalability, we can give up these abstractions and perform an I-confluence analysis in order to gain scalability through coordination-free transactions.
Trekking Through Siberia Table of Contents 1. Coordination Avoidance 2. Trekking Through Siberia
Trekking Through Siberia Some information on the paper Title: Trekking Through Siberia: Managing Cold Data in a Memory-Optimized Database. Authors: Ahmed Eldawy, Justin Levandoski, Per-Åke Larson. Presented at VLDB 2014.