SLIDE 1

Clustering in Go

May 2016

Wilfried Schobeiri
MediaMath

SLIDE 2

Who am I?

- Go enthusiast
- These days, mostly codes for fun
- Focused on Infrastructure & Platform @ MediaMath (http://careers.mediamath.com)
- We're hiring!

SLIDE 3

Why Go?

- Easy to build services
- Great stdlib
- Lots of community libraries & utilities
- Great built-in tooling (like go fmt, test, vet, -race, etc)
- Compiles as fast as a scripting language
- Just "feels" productive

(This is not a language pitch talk)

SLIDE 4

Why "Clustering in Go"?

SLIDE 5

Why "Clustering in Go"?

- Clustering is not batteries-included in Golang
- Lots of newer libraries, none very mature
- More often than not, services roll it themselves

So, here's one way of building a clustered, stateful service in Go.

SLIDE 6

And now, for a (fake-ish) scenario

- Multiple datacenters
- Separated by thousands of miles each (e.g., ORD - HKG - AMS),
- With many events happening concurrently at each one.

We want to count them.

SLIDE 7

With some constraints:

- Counting should be fast; we can't afford to cross the ocean every time
- Counts should be correct (please don't lose my events)

Starting to look like an AP system, right?

SLIDE 8

Let's get started

- First, a basic counter service
- One node
- Counter = atomic int
- Nothing fancy

$ curl http://localhost:4000/
$ curl http://localhost:4000/inc?amount=1
1

SLIDE 9

A basic Counter Service

type Counter struct {
    val int32
}

// IncVal increments the counter's value by d
func (c *Counter) IncVal(d int) {
    atomic.AddInt32(&c.val, int32(d))
}

// Count fetches the counter value
func (c *Counter) Count() int {
    return int(atomic.LoadInt32(&c.val))
}

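For context, a minimal runnable sketch of how such a counter could be served over HTTP to answer the curl requests from the previous slide. The handler wiring and endpoint names here are assumptions for illustration, not code from the talk.

package main

import (
    "fmt"
    "net/http"
    "strconv"
    "sync/atomic"
)

// Counter is the atomic counter from above.
type Counter struct{ val int32 }

func (c *Counter) IncVal(d int) { atomic.AddInt32(&c.val, int32(d)) }
func (c *Counter) Count() int   { return int(atomic.LoadInt32(&c.val)) }

func main() {
    c := &Counter{}

    // GET / returns the current count.
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, c.Count())
    })

    // GET /inc?amount=N increments by N and returns the new count.
    http.HandleFunc("/inc", func(w http.ResponseWriter, r *http.Request) {
        amount, err := strconv.Atoi(r.URL.Query().Get("amount"))
        if err != nil {
            http.Error(w, "bad amount", http.StatusBadRequest)
            return
        }
        c.IncVal(amount)
        fmt.Fprintln(w, c.Count())
    })

    http.ListenAndServe(":4000", nil)
}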

SLIDE 10

Ok, let's geodistribute it.

SLIDE 11

Ok, let's geodistribute it.

- A node (or several) in each datacenter
- Route increment requests to the closest node

Let's stand one up in each.

SLIDE 12

Demo

SLIDE 13

Duh, we're not replicating state!

- We need the counters to talk to each other
- Which means we need the nodes to know about each other
- Which means we need to solve for cluster membership

Enter the memberlist (https://github.com/hashicorp/memberlist) package.

SLIDE 14

Memberlist

- A Go library that manages cluster membership
- Based on SWIM (https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf), a gossip-style membership protocol
- Has baked-in member failure detection
- Used by Consul, Docker's libnetwork, and many more

SLIDE 15

About SWIM

"Scalable Weakly-consistent Infection-style Process Group Membership Protocol" Two goals: Maintain a local membership list of non-faulty processes Detect and eventually notify others of process failures

SLIDE 16

SWIM mechanics

- Gossip-based
- On join, a new node does a full state sync with an existing member, and begins gossiping its existence to the cluster
- Gossip about memberlist state happens on a regular interval, against a number of randomly selected members
- If a node doesn't ack a probe message, it is marked "suspicious"
- If a suspicious node doesn't dispute the suspicion after a timeout, it's marked dead
- Every so often, a full state sync is done between random members (expensive!)
- Tradeoffs between bandwidth and convergence time are configurable (see the sketch below)

More details about SWIM can be found here (https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf) and here (http://prakhar.me/articles/swim/).
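As a concrete illustration of those tradeoffs, here is a minimal sketch of the dials hashicorp/memberlist exposes on its Config. The field names are memberlist's own; the values are illustrative guesses, not recommendations from the talk.

c := memberlist.DefaultWANConfig()
c.GossipInterval = 200 * time.Millisecond // gossip more often -> faster convergence, more packets
c.GossipNodes = 4                         // fan out to more peers per interval -> more bandwidth
c.PushPullInterval = 30 * time.Second     // full TCP state syncs; less frequent -> cheaper, slower convergence
c.ProbeInterval = 1 * time.Second         // how often members are probed for failure detection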

SLIDE 17

Becoming Cluster-aware via Memberlist

members = flag.String("members", "", "comma separated list of members")

...

c := memberlist.DefaultWANConfig()
m, err := memberlist.Create(c)
if err != nil {
    return err
}

// Join other members if specified, otherwise start a new cluster
if len(*members) > 0 {
    members_each := strings.Split(*members, ",")
    _, err := m.Join(members_each)
    if err != nil {
        return err
    }
}
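Once joined, the cluster can be inspected with memberlist's Members(), which returns the currently known nodes. A quick sketch (the print format is illustrative):

// Print every node this member currently knows about.
for _, member := range m.Members() {
    fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
}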

SLIDE 18

Demo

SLIDE 19

CRDTs to the rescue!

SLIDE 20

CRDTs, simplified

- CRDT = Conflict-Free Replicated Data Types
- Counters, Sets, Maps, Flags, et al
- Operations within the type must be associative, commutative, and idempotent (see the sketch below)
- Order-free
- Therefore, very easy to handle failure scenarios: just retry the merge!

CRDTs are by nature eventually consistent, because there is no single source of truth.

Some notes can be found here (http://hal.upmc.fr/inria-00555588/document) and here (https://github.com/pfraze/crdt_notes) (among many others!).
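To see why those properties make retries safe, here is a minimal self-contained sketch (the merge helper is hypothetical, not from the talk) of a max-based merge over two replica states:

package main

import "fmt"

// merge takes the element-wise max of two replica states.
func merge(a, b map[string]int) map[string]int {
    out := map[string]int{}
    for k, v := range a {
        out[k] = v
    }
    for k, v := range b {
        if out[k] < v {
            out[k] = v
        }
    }
    return out
}

func main() {
    x := map[string]int{"ord": 3, "hkg": 1}
    y := map[string]int{"ord": 2, "hkg": 4}
    fmt.Println(merge(x, y))           // map[hkg:4 ord:3]
    fmt.Println(merge(y, x))           // same result: commutative
    fmt.Println(merge(merge(x, y), y)) // same result: idempotent, so retrying a merge is harmless
}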

SLIDE 21

G-Counter

Perhaps one of the most basic CRDTs:

- A counter with only two ops: increment and merge (no decrement!)
- Each node manages its own count
- Nodes communicate their counter state with other nodes
- Merges take the max() count for each node
- A G-Counter's value is the sum of all node count values

SLIDE 22

G-counter

SLIDE 23

SLIDE 24

G-counter

[diagram: counter value 5]

SLIDE 25

SLIDE 26

G-counter

SLIDE 27

SLIDE 28

type GCounter struct {
    // ident provides a unique identity to each replica.
    ident string

    // counter maps identity of each replica to their counter values
    counter map[string]int
}

func (g *GCounter) IncVal(incr int) {
    g.counter[g.ident] += incr
}

func (g *GCounter) Count() (total int) {
    for _, val := range g.counter {
        total += val
    }
    return
}

func (g *GCounter) Merge(c *GCounter) {
    for ident, val := range c.counter {
        if v, ok := g.counter[ident]; !ok || v < val {
            g.counter[ident] = val
        }
    }
}
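A short usage sketch of the type above (constructing the replicas literally here, since the slide elides a constructor): two nodes increment independently, then converge by merging in either direction.

ord := &GCounter{ident: "ord", counter: map[string]int{}}
hkg := &GCounter{ident: "hkg", counter: map[string]int{}}

ord.IncVal(3) // ord's state: {ord: 3}
hkg.IncVal(4) // hkg's state: {hkg: 4}

ord.Merge(hkg) // ord now holds {ord: 3, hkg: 4}
hkg.Merge(ord) // hkg converges to the same state

fmt.Println(ord.Count(), hkg.Count()) // 7 7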

SLIDE 29

SLIDE 30

Demo

SLIDE 31

Let's Merge State!

SLIDE 32

Merging State via Memberlist

[DEBUG] memberlist: Initiating push/pull sync with: 127.0.0.1:61300

- Memberlist does a "push/pull" to do a complete state exchange with another random member
- We can piggyback this state exchange via the Delegate interface: LocalState() and MergeRemoteState()
- Push/pull interval is configurable
- Happens over TCP

Let's use it to eventually merge state in the background.

SLIDE 33

Merging State via Memberlist

// Share the local counter state via Memberlist to another node
func (d *delegate) LocalState(join bool) []byte {
    b, err := counter.MarshalJSON()
    if err != nil {
        panic(err)
    }
    return b
}

// Merge a received counter state
func (d *delegate) MergeRemoteState(buf []byte, join bool) {
    if len(buf) == 0 {
        return
    }
    externalCRDT := crdt.NewGCounterFromJSON(buf)
    counter.Merge(externalCRDT)
}
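For context, memberlist only calls these hooks if the config's Delegate is set, and its Delegate interface has three more methods. A sketch of satisfying them with no-ops (how the demo wires this up is an assumption, not shown in the talk):

// Remaining memberlist.Delegate methods, stubbed out for now.
func (d *delegate) NodeMeta(limit int) []byte                  { return nil }
func (d *delegate) NotifyMsg(b []byte)                         {} // used for broadcasts on a later slide
func (d *delegate) GetBroadcasts(overhead, limit int) [][]byte { return nil } // replaced by a real queue later

// Wire the delegate in before creating the member:
// c := memberlist.DefaultWANConfig()
// c.Delegate = &delegate{}
// m, err := memberlist.Create(c)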

SLIDE 34

Demo

SLIDE 35

Success! (Eventually)

SLIDE 36

Want to sync faster?

It's possible to broadcast to all member nodes, via Memberlist's QueueBroadcast() and NotifyMsg().

func BroadcastState() {
    ...
    broadcasts.QueueBroadcast(&broadcast{
        msg: b,
    })
}

// NotifyMsg is invoked upon receipt of a message
func (d *delegate) NotifyMsg(b []byte) {
    ...
    switch update.Action {
    case "merge":
        externalCRDT := crdt.NewGCounterFromJSONBytes(update.Data)
        counter.Merge(externalCRDT)
    }
    ...
}

Faster sync in exchange for more bandwidth. Still eventually consistent.
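The elided pieces above lean on memberlist's TransmitLimitedQueue and Broadcast interface (both real memberlist APIs). A sketch of that plumbing, assuming the m and delegate values from earlier slides:

// A queue that retransmits each broadcast a limited number of times,
// scaled to the current cluster size.
var broadcasts = &memberlist.TransmitLimitedQueue{
    NumNodes:       func() int { return m.NumMembers() },
    RetransmitMult: 3,
}

// broadcast satisfies memberlist.Broadcast so it can be queued.
type broadcast struct {
    msg []byte
}

func (b *broadcast) Invalidates(other memberlist.Broadcast) bool { return false }
func (b *broadcast) Message() []byte                             { return b.msg }
func (b *broadcast) Finished()                                   {}

// GetBroadcasts hands queued messages to memberlist to piggyback on gossip.
func (d *delegate) GetBroadcasts(overhead, limit int) [][]byte {
    return broadcasts.GetBroadcasts(overhead, limit)
}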

SLIDE 37

Demo

SLIDE 38

Next:

- No tests? For shame!
- Implement persistence and time windowing
- We probably want more than one node per datacenter
- Jepsen all the things
- Implement a real RPC layer instead of Memberlist's delegate for finer performance and authn/z control
- Run it as a unikernel within docker running inside a VM in the cloud
- Sprinkle some devops magic dust on it
- Achieve peak microservice

SLIDE 39

TL;DR:

It's not (that) hard to build a clustered service in Go.

SLIDE 40

Fin!

Questions? Slides and code can be found at github.com/nphase/go-clustering-example

MediaMath is hiring! Thanks!

SLIDE 41

Thank you

Wilfried Schobeiri
MediaMath
@nphase (http://twitter.com/nphase)

SLIDE 42