Dynamo, Five Years Later
Andy Gross, Chief Architect, Basho Technologies
QCon London 2013
Friday, March 8, 2013

Dynamo
• Published October 2007 @ SOSP
• Describes a collection of distributed systems techniques applied to low-latency key-value storage
• Spawned (along with BigTable) many imitators, an industry (LinkedIn -> Voldemort, Facebook -> Cassandra)
• Authors nearly got fired from Amazon for publishing

Riak - A Dynamo Clone
• First lines of first prototype written in Fall 2007 on a plane on the way to my Basho interview
• “Technical Debt” is another term we use at Basho for this code
• Mostly Erlang with some C/C++
• Apache2 Licensed
• First release in 2009, 1.3 released 2/21/13

Basho
• Founded late 2007 by ex-Akamai people
• Currently ~120 employees, distributed, with offices in Cambridge, San Francisco, London, and Tokyo
• We sponsor Riak Open Source
• We sell Riak Enterprise (Riak + Multi-DC replication)
• We sell Riak CS (S3 clone backed by Riak Enterprise)

Principles
• Always-writable
• Incrementally scalable
• Symmetrical
• Decentralized
• Heterogeneous
• Focus on SLAs, tail latency

Techniques
• Consistent Hashing
• Vector Clocks
• Read Repair
• Anti-Entropy
• Hinted Handoff
• Gossip Protocol

Consistent Hashing
• Invented by Danny Lewin and others @ MIT/Akamai
• Minimizes remapping of keys when number of hash slots changes
• Originally applied to CDNs, used in Dynamo for replica placement
• Enables incremental scalability, even spread
• Minimizes hot spots
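A minimal sketch of the idea, not Riak's or Dynamo's actual implementation: nodes are placed on a hash ring at several points each, and a key's replicas are the first distinct nodes found walking clockwise from hash(key). The class name `HashRing`, the method `preference_list`, the use of SHA-1, and the 64 points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (SHA-1 digest as an integer)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; every name here is hypothetical."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Claim several ring positions for a node."""
        for i in range(self.points_per_node):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def preference_list(self, key, n=3):
        """Walk clockwise from hash(key), collecting up to n distinct nodes."""
        distinct = len(set(self._owner.values()))
        result = []
        i = bisect.bisect_right(self._points, _hash(key))
        while len(result) < min(n, distinct):
            node = self._owner[self._points[i % len(self._points)]]
            if node not in result:
                result.append(node)
            i += 1
        return result


ring = HashRing(["node1", "node2", "node3", "node4"])
print(ring.preference_list("blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA"))
```

Adding a node only inserts new positions on the ring, so a key either keeps its old owner or moves to the new node. That is the "minimizes remapping" property from the slide.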

Vector Clocks
• Introduced by Mattern et al. in 1988
• Extends Lamport’s timestamps (1978)
• Each value in Dynamo tagged with vector clock
• Allows detection of stale values, logical siblings
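How a vector clock separates stale values from siblings can be sketched as follows. Clocks are modeled as plain dicts mapping an actor name to a counter; the function names (`increment`, `descends`, `compare`, `merge`) are illustrative assumptions, not an API from Dynamo or Riak.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify clock a relative to clock b."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "newer"     # b is a stale ancestor of a
    if b_ge_a:
        return "stale"     # a is a stale ancestor of b
    return "siblings"      # concurrent updates, neither descends from the other


def merge(a, b):
    """Pointwise maximum: the clock for a value that resolves both inputs."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


v1 = increment({}, "x")    # first write, coordinated by x
v2 = increment(v1, "x")    # update that descends from v1
v3 = increment(v1, "y")    # concurrent update, coordinated by y
print(compare(v2, v1), compare(v1, v2), compare(v2, v3))
```

Here v1 is stale relative to v2, while v2 and v3 are siblings because each contains an event the other has not seen.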

Read Repair
• Update stale versions opportunistically on reads (instead of writes)
• Pushes system toward consistency, after returning value to client
• Reflects focus on a cheap, always-available write path
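A rough sketch of read repair under simplifying assumptions: each replica stores (version, value) pairs, a plain integer version stands in for a vector clock, and every replica eventually replies. The names `Replica` and `get_with_read_repair` are hypothetical.

```python
MISSING = (0, None)  # what a replica reports when it has never seen the key


def version(obj):
    return obj[0]


class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)


def get_with_read_repair(replicas, key, r=2):
    """Answer from the first r replies, then repair stale replicas.

    Returns (value_for_client, names_of_repaired_replicas).
    """
    if len(replicas) < r:
        raise RuntimeError("not enough replicas to satisfy R")
    replies = [(rep, rep.store.get(key, MISSING)) for rep in replicas]

    # The client is answered as soon as r replies are in.
    answer = max((obj for _, obj in replies[:r]), key=version)

    # After the client has its value, push the newest version seen
    # across all replies to every replica that returned something older.
    newest = max((obj for _, obj in replies), key=version)
    repaired = []
    for rep, obj in replies:
        if version(obj) < version(newest):
            rep.store[key] = newest
            repaired.append(rep.name)
    return answer[1], repaired


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.store["k"] = (1, "old")
b.store["k"] = (2, "new")
c.store["k"] = (1, "old")
print(get_with_read_repair([a, b, c], "k", r=2))
```

The repair happens on the read path, which keeps the write path cheap, as the slide notes.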

Hinted Handoff
• Any node can accept writes for other nodes if they’re down
• All messages include a destination
• Data accepted by node other than destination is handed off when node recovers
• As long as a single node is alive the cluster can accept a write
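The flow above can be sketched in a few lines, assuming an in-memory model in which a fallback node keeps writes meant for a down node in a separate "hints" map keyed by the intended destination. `Node`, `put`, and `handoff` are illustrative names only.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # writes this node owns
        self.hints = {}  # destination name -> {key: value} held on its behalf


def put(preference_list, fallbacks, key, value):
    """Write to each intended owner, or to a live fallback if the owner is down."""
    live_fallbacks = [n for n in fallbacks if n.up]
    for dest in preference_list:
        if dest.up:
            dest.data[key] = value
        elif live_fallbacks:
            # The write carries its destination, so the fallback knows
            # where the data belongs once that node recovers.
            live_fallbacks[0].hints.setdefault(dest.name, {})[key] = value
        else:
            raise RuntimeError(f"no live node can accept the write for {dest.name}")


def handoff(fallback, recovered):
    """Return hinted data to the recovered node; report how many keys moved."""
    pending = fallback.hints.pop(recovered.name, {})
    recovered.data.update(pending)
    return len(pending)


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
b.up = False                      # node fails
put([a, b, c], [d], "k", "v")     # request goes to fallback d
b.up = True                       # node comes back
print(handoff(d, b), b.data)      # data returns to the recovered node
```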

Anti-Entropy
• Replicas maintain a Merkle Tree of keys and their versions/hashes
• Trees periodically exchanged with peer vnodes
• Merkle tree enables cheap comparison
• Only values with different hashes are exchanged
• Pushes system toward consistency
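To illustrate why the comparison is cheap, here is a simplified two-level hash tree: keys are grouped into a fixed number of segments, each segment gets a leaf hash, and the root hashes all the leaves. Two replicas with equal roots need no further work; otherwise only the segments whose leaves differ are examined. The segment count, the SHA-256 choice, and the function names are assumptions of this sketch, and a production tree would normally have more levels.

```python
import hashlib

SEGMENTS = 8  # hypothetical; a real tree would be far wider and deeper


def _h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def segment_of(key: str) -> int:
    return int(_h(key), 16) % SEGMENTS


def build_tree(store):
    """Return (root_hash, leaf_hashes) for a dict of key -> value strings."""
    buckets = [[] for _ in range(SEGMENTS)]
    for key in sorted(store):
        buckets[segment_of(key)].append(f"{key}={_h(store[key])}")
    leaves = [_h("\n".join(bucket)) for bucket in buckets]
    return _h("".join(leaves)), leaves


def differing_keys(store_a, store_b):
    """Keys whose values differ, found by comparing trees before values."""
    root_a, leaves_a = build_tree(store_a)
    root_b, leaves_b = build_tree(store_b)
    if root_a == root_b:
        return set()  # one hash comparison, nothing to exchange
    dirty = {i for i in range(SEGMENTS) if leaves_a[i] != leaves_b[i]}
    candidates = {k for k in store_a.keys() | store_b.keys() if segment_of(k) in dirty}
    return {k for k in candidates if store_a.get(k) != store_b.get(k)}


replica_a = {f"k{i}": f"v{i}" for i in range(20)}
replica_b = dict(replica_a)
replica_b["k3"] = "changed"
print(differing_keys(replica_a, replica_b))
```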

Gossip Protocol
• Decentralized approach to managing global state
• Trades off atomicity of state changes for a decentralized approach
• Volume of gossip can overwhelm networks without care
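A toy simulation of the trade-off, assuming a push-pull scheme in which each node contacts one random peer per round and both keep the highest version seen for each cluster member. This is not Riak's gossip implementation; all names are hypothetical, and the seeded `random.Random` is there only to make the run repeatable.

```python
import random


def merge(view_a, view_b):
    """Combine two views of cluster state, keeping the highest version per member."""
    merged = dict(view_a)
    for member, ver in view_b.items():
        if ver > merged.get(member, -1):
            merged[member] = ver
    return merged


def gossip_round(views, rng):
    """Each node exchanges state with one randomly chosen peer (needs >= 2 nodes)."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = merge(views[name], views[peer])
        views[name] = merged
        views[peer] = dict(merged)


def rounds_to_converge(views, rng, max_rounds=1000):
    """Gossip until every node holds the same view; None if the limit is hit."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        first = next(iter(views.values()))
        if all(view == first for view in views.values()):
            return round_no
    return None


# Every node starts out knowing only about itself.
views = {f"n{i}": {f"n{i}": 1} for i in range(16)}
print(rounds_to_converge(views, random.Random(42)))
```

No state change is atomic: between rounds, different nodes hold different views. Each round also costs one exchange per node, which is where the concern about gossip volume comes from.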

Hinted Handoff
• Node fails
• Requests go to fallback
• Node comes back
• “Handoff” - data returns to recovered node
• Normal operations resume
hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)

Anatomy of a Request
• client sends get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) to Riak
• Get Handler (FSM) starts on the coordinating node
• hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
• Get Handler sends get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) to those partitions on The Ring (cluster diagram shows partitions 6 through 16)
• R=2: Get Handler collects two replies, v1 and v2
• v2 is returned to the client

Read Repair
• get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) with R=2: Get Handler (FSM) on the coordinating node receives v1 and v2
• client receives v2
• Replicas on the ring still hold a mix of v1 and v2
• Get Handler pushes v2 to the stale replicas; all replicas now hold v2

Riak Architecture
• Erlang/OTP Runtime
• Riak KV
• Client APIs: HTTP, Protocol Buffers, Erlang local client
• Request Coordination: get, put, delete, map-reduce
• Riak Core: consistent hashing, handoff, gossip, membership, node-liveness, buckets
