Google Spanner: A Globally Distributed, Synchronously-Replicated Database System
James C. Corbett, et al.
Presented by Alexander Chow for CS 742, Feb 14, 2013
Motivation

✤ “Eventually-consistent” sometimes isn’t good enough
✤ General-purpose transactions (ACID)
✤ Applications want complex, evolving schemas
✤ Schematized tables
✤ SQL-like query language
✤ Store data across thousands of machines in hundreds of datacenters
✤ Replication across datacenters, even continents
✤ Lock-free distributed read transactions from any sufficiently-up-to-date replica
✤ External consistency: commit order == timestamp order == global wall-clock time
✤ Achieved via the “TrueTime” API
✤ Example: a user “unfriends” an untrustworthy person, then writes a dissenting post
✤ Single-machine case: a read can simply block writes, so the generated page (the user’s posts plus friends’ lists: Friend1 post, Friend2 post, ...) never mixes the old friends list with the new post
✤ Distributed case: the posts and friends’ lists span many machines (Friend1 post, ..., Friend100 post, Friend101 post, ...), so generating a consistent page by blocking writes would mean blocking them on every machine involved
✤ TT.now() returns an interval [earliest, latest] guaranteed to contain the true current time; the interval’s width is the total uncertainty 2ε (see the sketch below)
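To make the interval semantics concrete, here is a minimal Python sketch of the TrueTime interface the paper describes (TT.now(), TT.after(t), TT.before(t)); the constant EPSILON and the use of time.time() as a stand-in for a synchronized clock are illustrative assumptions, not Spanner’s implementation:

```python
# Minimal sketch of the TrueTime interface; EPSILON is a made-up
# constant and time.time() stands in for a synchronized clock.
import time
from dataclasses import dataclass

EPSILON = 0.004  # illustrative uncertainty of 4 ms; real ε varies

@dataclass
class TTInterval:
    earliest: float  # guaranteed <= the true current time
    latest: float    # guaranteed >= the true current time

def tt_now() -> TTInterval:
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def tt_after(t: float) -> bool:
    """True only if timestamp t has definitely passed."""
    return tt_now().earliest > t

def tt_before(t: float) -> bool:
    """True only if timestamp t has definitely not arrived yet."""
    return tt_now().latest < t
```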
✤ Two-phase locking with commit wait (see the sketch below):
✤ Acquire all locks, take t = TT.now(), and choose commit timestamp s = t.latest
✤ Hold the locks through a “commit wait” until s is definitely in the past, then release all locks
✤ The network cost of reaching Paxos consensus far dominates the commit wait, and the two overlap, so in practice there is usually no extra waiting
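A sketch of the commit-wait rule under these assumptions, reusing the tt_now()/tt_after() helpers from the TrueTime sketch above; the lock and write callables are hypothetical placeholders:

```python
import time

def commit_write(acquire_locks, apply_write_via_paxos, release_locks):
    acquire_locks()                  # 2PL: acquire all locks first
    s = tt_now().latest              # commit timestamp s = TT.now().latest
    apply_write_via_paxos(s)         # consensus runs during the wait
    while not tt_after(s):           # commit wait: spin until s is
        time.sleep(0.001)            # definitely in the past
    release_locks()                  # only then release all locks
```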
✤ Distributed transactions, two-phase commit (sketched below):
✤ Each participant acquires all its locks and computes a prepare timestamp s
✤ Each participant logs its prepare record (“start logging” ... “done logging”), then sends its s to the coordinator (“prepared, send s”)
✤ The coordinator chooses a commit timestamp no earlier than any participant’s s, performs commit wait, and when commit wait is done, all participants release all locks
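Under the same assumptions, a sketch of how the coordinator’s commit timestamp could be chosen; the rule (no earlier than any prepare timestamp, no earlier than the coordinator’s TT.now().latest) is from the slides above, while the helper names are illustrative:

```python
def choose_commit_timestamp(prepare_timestamps):
    """prepare_timestamps: the prepare timestamp s sent by each
    participant in the 'prepared, send s' message."""
    return max(max(prepare_timestamps), tt_now().latest)

# The coordinator then logs the commit via Paxos, performs commit
# wait until tt_after(s), and finally tells every participant to
# apply the transaction at s and release its locks.
```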
✤ TrueTime architecture: each datacenter (Datacenter 1, 2, 3, ...) runs timemasters backed by GPS receivers or atomic clocks; clients poll timemasters in their own and nearby datacenters
✤ At synchronization (polling of timemasters, every 30 seconds):
✤ Time comes from the nearest available timemaster
✤ Nearby datacenters’ timemasters are also polled for redundancy; rogue timemasters are detected and the time is computed from the non-liars
✤ ε resets to the ε broadcast by the timemaster plus the communication time (~1 ms)
✤ Between synchronizations:
✤ ε increases with the assumed local clock drift (200 μs/s)
✤ Commit wait uses the current, variable ε, so Spanner slows down automatically when uncertainty grows (a sketch of this ε model follows)
✤ If the local timemaster is not available, a remote timemaster in a nearby datacenter can be used
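A toy model of the ε sawtooth implied by these numbers (reset at each poll, growth from worst-case drift between polls); the master_eps parameter and the exact constants are illustrative:

```python
POLL_INTERVAL = 30.0   # seconds between timemaster polls
DRIFT_RATE = 200e-6    # assumed worst-case local drift: 200 us/s
COMM_LATENCY = 0.001   # ~1 ms communication time to the timemaster

def epsilon(master_eps: float, seconds_since_sync: float) -> float:
    """Uncertainty = timemaster's broadcast uncertainty
    + communication time + accumulated local drift."""
    return master_eps + COMM_LATENCY + DRIFT_RATE * seconds_since_sync

# Just before the next poll: epsilon(0.0, 30.0) ~= 0.007, i.e. ε has
# drifted up to about 7 ms; it snaps back down at the next sync.
```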
✤ Atomic schema change: a non-blocking variant of a regular transaction
✤ At the prepare stage, choose a timestamp t in the future
✤ Reads and writes that implicitly depend on the schema synchronize with t (see the sketch below):
✤ If their timestamp is before t, they proceed
✤ If their timestamp is after t, they block until the change is applied
✤ Without TrueTime, defining a schema change to happen at “time t” would be meaningless
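A minimal sketch of that timestamp gate, assuming a hypothetical schema_applied event that fires once the change has been installed:

```python
def maybe_block_for_schema_change(op_timestamp, t_schema, schema_applied):
    """schema_applied: a threading.Event-like object (hypothetical)."""
    if op_timestamp < t_schema:
        return                # ordered before the change: proceed
    schema_applied.wait()     # ordered after: block until it applies
```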
✤ Tablet: similar to Bigtable’s tablet; a bag of mappings of:
✤ (key:string, timestamp:int64) -> string
✤ More like a multi-version database (a sketch follows)
✤ Stored on Colossus (distributed file system)
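A toy illustration of the multi-version mapping, where a read at timestamp t returns the newest version written at or before t; this sketches the data model only, not Spanner’s storage format:

```python
import bisect
from collections import defaultdict

class Tablet:
    """(key, timestamp) -> value, with reads as of a timestamp."""

    def __init__(self):
        # key -> list of (timestamp, value), kept sorted by timestamp
        self.versions = defaultdict(list)

    def write(self, key, timestamp, value):
        bisect.insort(self.versions[key], (timestamp, value))

    def read(self, key, timestamp):
        vs = self.versions[key]
        # index of the first version newer than `timestamp`
        i = bisect.bisect_right([ts for ts, _ in vs], timestamp)
        return vs[i - 1][1] if i > 0 else None
```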
✤ Tablets are replicated (between datacenters, possibly inter-continental), with concurrency coordination by Paxos
✤ A transaction needs consistency across its replicas; this is coordinated by Paxos
✤ Paxos group: a tablet and its replicas, together with the concurrency machinery across the replicas; one replica serves as the Paxos leader
✤ If a transaction involves a single Paxos group, it can bypass the transaction-manager and participant-leader machinery
✤ Thus the system involves two layers, transaction management atop Paxos, where the transaction-management stage can be skipped
✤ If a transaction involves multiple Paxos groups, transaction-management machinery (a transaction manager and a participant leader per group) sits atop the Paxos groups to coordinate 2PC (see the sketch below)
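A sketch of that commit-path choice; the group objects and their methods are hypothetical placeholders:

```python
def commit(txn, groups):
    if len(groups) == 1:
        # Single-group transaction: bypass the transaction manager
        # and commit directly through the group's Paxos leader.
        groups[0].paxos_commit(txn)
    else:
        # Multi-group transaction: one group coordinates 2PC,
        # the others act as participants.
        coordinator, *participants = groups
        coordinator.run_two_phase_commit(txn, participants)
```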
✤ Each replica maintains a safe time t_safe: the maximum timestamp at which reads are safe
✤ t_safe = min(t_safe^Paxos, t_safe^TM)
✤ t_safe^Paxos is the timestamp of the highest-applied Paxos write
✤ t_safe^TM is much harder:
✤ = ∞ if there is no pending 2PC transaction
✤ = min_i(s_i,g^prepare) over the transactions i prepared (but not yet committed) in group g
✤ A sketch of this computation follows
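A sketch of the safe-time computation under the definitions above; the argument names are illustrative:

```python
import math

def t_safe(t_paxos_safe, prepare_timestamps_of_pending_2pc):
    """Safe time of a replica: a read at t can be served iff t <= t_safe."""
    if not prepare_timestamps_of_pending_2pc:
        t_tm_safe = math.inf            # no pending 2PC transaction
    else:
        t_tm_safe = min(prepare_timestamps_of_pending_2pc)
    return min(t_paxos_safe, t_tm_safe)

def can_serve_read(t_read, t_paxos_safe, pending_prepares):
    return t_read <= t_safe(t_paxos_safe, pending_prepares)
```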
✤ Application-level control of data locality
✤ A prefix of the key defines the bucket, e.g. these keys share the prefix 0PZX2N47HL5N:
✤ Key: 0PZX2N47HL5N4MAE3Q...
✤ Key: 0PZX2N47HL5N7U9OY2...
✤ Key: 0PZX2N47HL5NQBDP73...
✤ Entries in the same bucket are always in the same Paxos group
✤ Load can be balanced between Paxos groups by moving buckets (sketched below)
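A sketch of the prefix-based routing; the 12-character prefix length and the bucket_to_group table are assumptions for illustration:

```python
PREFIX_LEN = 12  # illustrative; the real boundary is schema-defined

def bucket_of(key):
    return key[:PREFIX_LEN]

def group_for(key, bucket_to_group):
    """bucket_to_group: hypothetical bucket -> Paxos group mapping."""
    return bucket_to_group[bucket_of(key)]

# The three example keys above all share the prefix "0PZX2N47HL5N",
# so they fall in one bucket and hence in the same Paxos group.
```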
✤ Benchmark setup: 50 Paxos groups, 2500 buckets, 4 KB reads or writes, datacenters 1 ms apart
✤ Latency remains mostly constant as the number of replicas increases, because Paxos executes in parallel at a group’s replicas
✤ Sensitivity to a slow replica decreases as the number of replicas increases (a quorum is easier to achieve)
✤ Availability experiment: all leaders explicitly placed in zone Z1; killing all servers in that zone at the 5-second mark drops throughput to almost 0
✤ Throughput recovers quickly after re-election of new leaders
✤ No background is given on current global time-synchronization techniques
✤ No proofs of absolute error bounds for their TrueTime implementation
✤ External consistency? We can only guess at the implied meaning (the referenced PhD dissertation is not available online)
✤ Pipelined Paxos? Not described. Is each replica governed by a replica-wide lock, so that one replica cannot undergo Paxos concurrently on disjoint rows?