Eventually Consistent HTTP with Statebox and Riak
Author: Bob Ippolito (@etrepum)
Date: November 2011
Venue: QCon San Francisco 2011


SLIDE 1

11/16/11 Eventually Consistent HTTP with Statebox and Riak 1/62 file://localhost/Users/bob/src/mochi/bob/statebox_qconsf_2011-20416/slides.html

Eventually Consistent HTTP with Statebox and Riak

Author: Bob Ippolito (@etrepum)
Date: November 2011
Venue: QCon San Francisco 2011

SLIDE 2

Introduction

This talk isn't really about the web. It's about how we model data for the web.
HTTP itself is not the interesting part of our systems.
Our systems are mostly JSON over HTTP at the network boundary, nothing too clever!

SLIDE 3

Mochi's Business

We provide platforms for Flash game developers
Ads, analytics, virtual currency, social, scores, etc.
Terabytes of data to report on

SLIDE 4

Just a few years ago…

Millions of tuples was big
Scale up vertically
Single master SQL databases
(Still works great for most companies)

SLIDE 5

Why was this easy?

ACID is cheap on a single node
Efficient to establish a total ordering for events
Single node systems do not have network partitions!
Most businesses can probably still get away with this

SLIDE 6

Why does this break?

Availability is important (we're global!)
Too expensive to scale vertically
Schema evolution is hard
Sharding not always possible, and rarely fun

SLIDE 7

Case Study: Friendwad

A social graph aggregator
MochiGames, Facebook, Twitter, Myspace (oops!)
Original implementation built on Mnesia
Mnesia causes us pain

SLIDE 8

Friendwad Data Model

Twitter-like social digraph
Each user has a unique id
following: user ids that this user follows
followers: user ids that follow this user

SLIDE 9

Friendwad Diagram

[Diagram: two panels, each showing Alice and Bob; one panel is labelled Following, the other Followers]

SLIDE 10

Mnesia Implementation

Table in Mnesia for user records (id, following, followers)
Multi-row transaction for each graph change
At least two rows in each transaction, possibly more (third-party import)

SLIDE 11

Why not Mnesia?

Mnesia issues are beyond the scope of this talk :)
Anyway, we decided to migrate to Riak

SLIDE 12

Riak

Great solution for many of our data problems (thanks Basho!)
Distributed eventually consistent key-value store
But not a complete solution

SLIDE 13

Riak Migration

The simplest thing that could possibly work (incorrectly)
… appears correct with serialized glasses
Riak not transactional even for changes to a single row

SLIDE 14

Riak Migration Continued

SLIDE 15

Eventual Inconsistency

Popular user claimed they were missing entries in "followers"
Verified that they were missing by looking at our analytics built from our transaction logs
Especially non-transactional with allow_mult=false!
My face probably still has a palm-shaped dent

SLIDE 16

Version Terminology

Client a reads version o (original state)
Transform from client a on o produces version ao
Think function application a(o())

SLIDE 17

Adding a friend [1]

Original state o for alice, bob on read

id        | alice | bob |
followers | []    | []  |
following | []    | []  |
version   | o     | o   |

SLIDE 18

Adding a friend [2]

Write modified bob at version ao

id        | alice | bob     |
followers | []    | []      |
following | []    | [alice] |
version   | o     | ao      |

SLIDE 19

Adding a friend [3]

Write modified alice at version ao

id        | alice | bob     |
followers | [bob] | []      |
following | []    | [alice] |
version   | ao    | ao      |

SLIDE 20

Interleaving for Fail

To simulate failure we need multiple concurrent operations
a is alice → bob
b is bob → carol

SLIDE 21

Concurrency Pains [1]

alice → bob (a) initial state

id        | alice | bob | carol |
followers | []    | []  | []    |
following | []    | []  | []    |
version   | o     | o   | o     |

SLIDE 22

Concurrency Pains [2]

bob → carol (b) initial state (all look same!)

id        | alice | bob | carol |
followers | []    | []  | []    |
following | []    | []  | []    |
version   | o     | o   | o     |

SLIDE 23

Concurrency Pains [3]

bob → carol writes to bob

id        | alice | bob     | carol |
followers | []    | []      | []    |
following | []    | [carol] | []    |
version   | o     | bo      | o     |

SLIDE 24

Concurrency Pains [4]

alice → bob writes to alice

id        | alice | bob     | carol |
followers | []    | []      | []    |
following | [bob] | [carol] | []    |
version   | ao    | bo      | o     |

SLIDE 25

Concurrency Pains [5]

alice → bob writes to bob

id        | alice | bob     | carol |
followers | []    | [alice] | []    |
following | [bob] | []      | []    |
version   | ao    | ao      | o     |

SLIDE 26

Concurrency Pains [6]

bob → carol writes to carol

id        | alice | bob     | carol |
followers | []    | [alice] | [bob] |
following | [bob] | []      | []    |
version   | ao    | ao      | bo    |
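
The lost write to bob in the interleaving above can be replayed in a few lines. This is only a sketch: the module name, the map-based store, and the helper functions are hypothetical stand-ins for a last-write-wins key-value store, not Riak client calls. It tracks only the bob record, since that is where the two clients collide.

```erlang
-module(lost_update).
-export([demo/0]).

%% Hypothetical in-memory stand-in for a last-write-wins store:
%% a put simply overwrites whatever is stored under the key.
put_user(Id, User, Store) -> maps:put(Id, User, Store).
get_user(Id, Store) -> maps:get(Id, Store).

new_user() -> #{followers => [], following => []}.

%% Add Id to one of the two edge sets of a user record.
add(Field, Id, User) ->
    maps:update_with(Field,
                     fun (Ids) -> ordsets:add_element(Id, Ids) end,
                     User).

demo() ->
    S0 = maps:from_list([{Id, new_user()} || Id <- [alice, bob, carol]]),
    %% Both clients read bob at version o before either one writes.
    BobSeenByA = get_user(bob, S0),
    BobSeenByB = get_user(bob, S0),
    %% Client b (bob -> carol) writes bob first.
    S1 = put_user(bob, add(following, carol, BobSeenByB), S0),
    %% Client a (alice -> bob) then overwrites bob from its stale read.
    S2 = put_user(bob, add(followers, alice, BobSeenByA), S1),
    get_user(bob, S2).
```

Calling lost_update:demo() returns a bob record whose following set is empty: the edge to carol written by client b has been silently discarded.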

SLIDE 27

FAIL

Concurrency ruins everything.

[ASCII art illustration]

SLIDE 28

Sibling Rivalry

If allow_mult is on, the next read of bob will have two siblings ([ao, bo]) because they descend from the same vector clock.
The default strategy (allow_mult off) is "last write wins", also known as pain

SLIDE 29

Simple fix?

Merging ao and bo is easy!
Just union over followers and following.

SLIDE 30

Simple fix? NOPE!

But edges are not insert-only! That ruins everything.
It's better, but any inconsistency is just pain waiting to happen.
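
To make the objection concrete, here is a small sketch of the union merge and how it undoes a removal. The module, the map-based record, and the user dave are all invented for illustration; merge/2 assumes both siblings carry the same two fields.

```erlang
-module(union_merge).
-export([merge/2, demo/0]).

%% Naive sibling resolution: union each edge set of two user records.
merge(A, B) ->
    maps:map(fun (Field, Ids) ->
                     ordsets:union(Ids, maps:get(Field, B, []))
             end,
             A).

demo() ->
    O = #{followers => [], following => [alice, carol]},
    %% Sibling 1: bob unfollows carol.
    S1 = O#{following := ordsets:del_element(carol, maps:get(following, O))},
    %% Sibling 2: concurrently, bob follows dave.
    S2 = O#{following := ordsets:add_element(dave, maps:get(following, O))},
    %% The union cannot tell "removed" from "never seen", so carol returns.
    merge(S1, S2).
```

The merged following set is [alice, carol, dave]: the add survives, but the unfollow is lost, which is the kind of inconsistency the slide warns about.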

SLIDE 31

Fix all of the things

Turn on allow_mult=true
Implemented statebox in anger to solve the rest of the problem

SLIDE 32

Statebox Design Philosophy

Adding code to Riak should be avoided (maintenance)
The only option is to resolve conflicts on read
Growth should be bounded and configurable
Doesn't need to be language agnostic
Minimize magic

SLIDE 33

What's Statebox?

Opaque container
Serializes current state
With recent operations
Provides merge operation
Monad-like (not important)

SLIDE 34

Statebox Terminology

op() :: N-ary function reference plus N-1 arguments
event() :: {timestamp(), op()}

    {fun ordsets:add_element/2, [kitten]}

SLIDE 35

Statebox Internals

Designed to be used with Erlang's external term format (term_to_binary)
Serializes function references, so is bound to exported code
Prototyped in friendwad, but immediately extracted
Open sourced because I couldn't find anything else like it

SLIDE 36

Statebox Theory

Statebox algorithm can be used as-is with any eventually consistent KV store
Similar to the paper on CRDTs (Convergent / Commutative Replicated Data Types)
Stores current value plus a (configurably) bounded event queue
Event queue is bounded by length and can expire events by age

SLIDE 37

Declarative (ordsets)

    Add = fun ordsets:add_element/2,
    Empty = statebox:new(fun () -> [] end),
    A = statebox:modify({Add, [a]}, Empty),
    B = statebox:modify({Add, [b]}, Empty),
    AB = statebox:merge([A, B]),
    statebox:value(AB) =:= [a, b].

SLIDE 38

Composable

    Empty = statebox_orddict:from_values([]),
    Union = fun statebox_orddict:f_union/2,
    A = statebox:modify([Union(following, [b]),
                         Union(followers, [c])],
                        Empty),
    B = statebox:modify([Union(following, [b]),
                         Union(followers, [d])],
                        Empty),
    AB = statebox:merge([A, B]),
    statebox:value(AB) =:= [{followers, [c, d]},
                            {following, [b]}].

SLIDE 39

Statebox Example [1]

    A     :: [kitten]
    [{1, Union([kitten])}]

SLIDE 40

Statebox Example [2]

    A     :: [kitten]
    [{1, Union([kitten])}]

    B     :: [puppy]
    [{2, Union([puppy])}]

SLIDE 41

Statebox Example [3]

    A     :: [kitten]
    [{1, Union([kitten])}]

    B     :: [puppy]
    [{2, Union([puppy])}]

    [A,B] :: [kitten, puppy]
    [{1, Union([kitten])},
     {2, Union([puppy])}]

SLIDE 42

Statebox Merge

B is newer, so use its value as the basis
Merge sort event queues
Apply ops in order from the beginning

SLIDE 43

Statebox Merge [1]

Use B's value (arbitrarily newest)

    [puppy]

    Value = [puppy]

SLIDE 44

Statebox Merge [2]

Apply ops oldest to newest (T=1)

    union([kitten], [puppy])

    Value = [kitten, puppy]

SLIDE 45

Statebox Merge [3]

Apply ops oldest to newest (T=2)

    union([puppy], union([kitten], [puppy]))

    Value = [kitten, puppy]
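
The three merge steps can be sketched as a tiny module. This is not statebox's actual code: mini_statebox, its tuple layout, and the explicit timestamp argument to modify/3 are assumptions made so the example is deterministic, and it leaves out queue bounds and expiry entirely. It uses only ordsets and lists from the standard library.

```erlang
-module(mini_statebox).
-export([new/1, modify/3, merge/1, value/1, demo/0]).

%% A box is {Value, Queue, LastTimestamp}. Queue is an ordered list of
%% {Timestamp, {Fun, Args}} events. Hypothetical layout, for illustration.
new(Value) -> {Value, [], 0}.

value({Value, _Queue, _Last}) -> Value.

%% An op is an N-ary fun plus its first N-1 arguments; the value goes last.
apply_op({Fun, Args}, Value) -> apply(Fun, Args ++ [Value]).

%% Apply the op now and remember it in the event queue.
modify(T, Op, {Value, Queue, Last}) ->
    {apply_op(Op, Value), lists:umerge(Queue, [{T, Op}]), max(T, Last)}.

merge(Boxes) ->
    %% 1. Use the value of the newest box as the basis.
    [{Value, _, Last} | _] =
        lists:sort(fun ({_, _, A}, {_, _, B}) -> A >= B end, Boxes),
    %% 2. Merge-sort the event queues, dropping duplicate events.
    Queue = lists:umerge([Q || {_, Q, _} <- Boxes]),
    %% 3. Replay every op, oldest first, on top of that value.
    Merged = lists:foldl(fun ({_T, Op}, V) -> apply_op(Op, V) end,
                         Value, Queue),
    {Merged, Queue, Last}.

demo() ->
    Union = fun ordsets:union/2,
    Empty = new([]),
    A = modify(1, {Union, [[kitten]]}, Empty),
    B = modify(2, {Union, [[puppy]]}, Empty),
    value(merge([A, B])).
```

mini_statebox:demo() follows the slides: it starts from [puppy], replays the T=1 and T=2 unions, and ends at [kitten, puppy]. Replaying the T=2 op on a value that already contains puppy is harmless only because union is repeatable.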

SLIDE 46

statebox_riak wrapper

    %% bob → alice, bob → carol
    S = statebox_riak:new([{riakc_pb_socket, P},
                           {expire_ms, 5000},
                           {max_queue, 50}]),
    Union = fun statebox_orddict:f_union/2,
    statebox_riak:apply_bucket_ops(
        <<"users">>,
        [{[<<"alice">>, <<"carol">>],
          Union(followers, [bob])},
         {[<<"bob">>],
          Union(following, [alice, carol])}],
        S).

SLIDE 47

Restrictions

Operations must be repeatable (idempotent unary operation)
Repeatable if and only if F(V) = F(F(V))
Old operations in the queue are replayed in-order on merge, but seeded with newer data
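
The F(V) = F(F(V)) condition is easy to spot-check. The helper below is a hypothetical sketch that tests the property on a handful of sample values; passing it is evidence, not proof, that an operation is repeatable.

```erlang
-module(repeatable).
-export([is_repeatable/2, demo/0]).

%% True when applying F twice equals applying it once,
%% for every sample value supplied.
is_repeatable(F, Samples) ->
    lists:all(fun (V) -> F(F(V)) =:= F(V) end, Samples).

demo() ->
    AddKitten = fun (S) -> ordsets:add_element(kitten, S) end,
    Increment = fun (N) -> N + 1 end,
    Append = fun (L) -> L ++ [kitten] end,
    {is_repeatable(AddKitten, [[], [kitten], [puppy]]),
     is_repeatable(Increment, [0, 1, 41]),
     is_repeatable(Append, [[], [kitten]])}.
```

The set insertion passes, while the integer increment and the list append both fail, because each replay changes the result again.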

SLIDE 48

Repeatable Operations

Most set operations
Most dictionary operations
NOT most list operations (ordered lists may be ok!)
NOT most integer operations

SLIDE 49

Non-repeatable ops?

Many can be transformed to repeatable operations
statebox_counter is one example

SLIDE 50

statebox_counter

Represent a counter as an ordered list of events

    [{{Timestamp, Ident}, Delta}]

Ident is just a unique-ish identifier (node counter, random number, etc.)
Well tested proof of concept, but not in production use

SLIDE 51

counter optimizations

Prevent unbounded growth by coalescing old events into a single event with a fixed Ident
Events older than this are ignored
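
A rough sketch of the idea, not statebox_counter's implementation: the module name, the atom coalesced used as the fixed Ident, and the compact/2 function are assumptions for illustration. The counter is kept as an orddict keyed by {Timestamp, Ident}, so replaying the same event is a no-op, and timestamps are assumed to be non-negative integers.

```erlang
-module(event_counter).
-export([inc/3, value/1, compact/2, demo/0]).

%% Counter = orddict of {{Timestamp, Ident}, Delta}.
value(Counter) -> lists:sum([Delta || {_Key, Delta} <- Counter]).

%% Timestamp of the coalesced event, or -1 if there is none yet.
floor_ts(Counter) ->
    lists:max([-1 | [T || {{T, coalesced}, _} <- Counter]]).

%% Storing by key makes inc repeatable; events at or before the
%% coalesced event are ignored because their deltas are already in it.
inc({T, _Ident} = Key, Delta, Counter) ->
    case T > floor_ts(Counter) of
        true -> orddict:store(Key, Delta, Counter);
        false -> Counter
    end.

%% Fold every event at or before Cutoff into one event with a fixed Ident.
compact(Cutoff, Counter) ->
    {Old, Recent} =
        lists:partition(fun ({{T, _}, _}) -> T =< Cutoff end, Counter),
    case Old of
        [] -> Counter;
        _ -> orddict:store({Cutoff, coalesced}, value(Old), Recent)
    end.

demo() ->
    C1 = inc({1, n1}, 5, []),
    C2 = inc({2, n2}, 3, C1),
    C3 = inc({2, n2}, 3, C2),   % replayed event, no effect
    C4 = compact(2, C3),
    C5 = inc({1, n1}, 5, C4),   % older than the coalesced event, ignored
    C6 = inc({3, n1}, 1, C5),
    {value(C3), C4, value(C6)}.
```

In the demo the replay leaves the total at 8, compaction shrinks the list to a single event, and a late replay of the old event does not double-count.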

SLIDE 52

Other statebox usage

achievements
scorewad (via recordset)

SLIDE 53

achievements

Manages achievements earned in games
orddict of {Achievement, Timestamp}
Stores to two keys: User, User_Game

SLIDE 54

achievements orddict

Store oldest entry for achievement.

    f_store_min(Key, New) ->
        {fun ?MODULE:op_store_min/3, [Key, New]}.

    op_store_min(Key, New, D) ->
        orddict:update(
            Key,
            fun (Old) -> min(Old, New) end,
            New,
            D).

SLIDE 55

scorewad

Manages high score boards for > 15,000 games
Keeps top 50 scores per game for day, week, month, all time
Also stores scores per user for social leaderboards
Built recordset to migrate some of this to riak + statebox

SLIDE 56

recordset

An optionally fixed-size ordered set of complex terms.
User defined identity
User defined sorting
Optional and efficient fixed-sizedness

SLIDE 57

recordset example (trivial)

    Empty = recordset:new(fun erlang:'=:='/2,
                          fun erlang:'<'/2,
                          [{max_size, 2}]),
    Full = lists:foldl(fun recordset:add/2,
                       Empty,
                       lists:seq(300, 400)),
    [399, 400] =:= recordset:to_list(Full).
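
For readers without the library at hand, the behaviour in that example can be approximated with a naive list-backed module. This is a sketch under assumptions, not how recordset is implemented: mini_recordset and its tuple layout are invented, a new term simply replaces any existing term with the same identity, and the re-sort on every add is far from the "efficient" fixed-sizedness the slide promises.

```erlang
-module(mini_recordset).
-export([new/3, add/2, to_list/1, demo/0]).

%% {IdentityFun, LessThanFun, MaxSize, Items}, Items kept in ascending order.
new(IdentityFun, LessThanFun, MaxSize) ->
    {IdentityFun, LessThanFun, MaxSize, []}.

to_list({_Id, _Lt, _Max, Items}) -> Items.

add(Term, {Id, Lt, Max, Items}) ->
    %% Drop any element with the same identity as the new term.
    Others = [X || X <- Items, not Id(X, Term)],
    %% Re-sort ascending using the user-supplied less-than fun.
    Sorted = lists:sort(fun (A, B) -> not Lt(B, A) end, [Term | Others]),
    %% Keep only the Max largest elements.
    Drop = max(0, length(Sorted) - Max),
    {Id, Lt, Max, lists:nthtail(Drop, Sorted)}.

demo() ->
    Empty = new(fun erlang:'=:='/2, fun erlang:'<'/2, 2),
    to_list(lists:foldl(fun add/2, Empty, lists:seq(300, 400))).
```

mini_recordset:demo() folds 300 through 400 into a set capped at two elements and is left with the two largest, matching the slide's example.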

SLIDE 58

What's next?

statebox already does what we want it to
More helper modules or projects will be added over time

SLIDE 59

Better than Statebox?

We'd all be better off if this kind of data structure were built into the database
Higher level APIs! KV is fine but I want more from my database
Redis-like features, but concurrent and multi-node

SLIDE 60

Why Riak could do it better

Simple clients: DB can reconcile state before return
Efficiency: Can store less data (ring state, forced serialization, vclocks)

SLIDE 61

Questions?

Twitter: @etrepum
Mochi Media: www.mochimedia.com
Slides: etrepum.github.com/statebox_qconsf_2011
git.io/statebox
git.io/statebox_riak
git.io/recordset
