IMPLEMENTING RIAK IN ERLANG: BENEFITS AND CHALLENGES (Steve Vinoski)

SLIDE 1

IMPLEMENTING RIAK IN ERLANG: BENEFITS AND CHALLENGES

Steve Vinoski

Basho Technologies

Cambridge, MA USA http://basho.com @stevevinoski

vinoski@ieee.org http://steve.vinoski.net/

1 Wednesday, April 24, 13

SLIDE 2

ERLANG

SLIDE 8
Ericsson Telecom Switch Requirements

  • Large number of concurrent activities
  • Large software systems distributed across multiple computers
  • Continuous operation for years
  • Live updates and maintenance
  • Tolerance for both hardware and software faults

SLIDE 9
Today’s Data/Web/Cloud/Service Apps

  • Large number of concurrent activities
  • Large software systems distributed across multiple computers
  • Continuous operation for years
  • Live updates and maintenance
  • Tolerance for both hardware and software faults

SLIDE 10

CONCURRENCY

SLIDE 11
They Come For The Concurrency...

  • Erlang processes are very lightweight, much lighter than OS threads
  • Hundreds of thousands or even millions of processes per Erlang VM instance
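A minimal sketch of how cheap Erlang processes are: the function below spawns N processes that each block in `receive`, checks that they are all alive, and then stops them. The module name `proc_demo` is hypothetical, and how many processes you can actually create depends on the VM's configurable process limit (the `+P` emulator flag).

```erlang
-module(proc_demo).
-export([spawn_many/1]).

%% Spawn N processes that each wait for a 'stop' message, count how many
%% are alive, then tell them all to stop. Returns the live count.
spawn_many(N) ->
    Pids = [spawn(fun() -> receive stop -> ok end end) || _ <- lists:seq(1, N)],
    Alive = length([P || P <- Pids, is_process_alive(P)]),
    lists:foreach(fun(P) -> P ! stop end, Pids),
    Alive.
```

In a shell, `proc_demo:spawn_many(100000).` should return `100000` on a VM with the default process limit.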

SLIDE 15
...But They Stay For The Reliability

  • Isolation: Erlang processes communicate only via message passing
  • Distribution: the Erlang process model works across nodes
  • Linking/supervision/monitoring: allow an Erlang process to take action when another fails

SLIDE 21

Erlang Process Architecture

[Figure: layered diagram, top to bottom: Erlang processes waiting in run queues; the Erlang VM with SMP schedulers 1..N, one per core; OS + kernel threads; CPU cores 1..N]

SLIDE 25

A Small Language

  • Erlang has just a few elements: numbers, atoms, tuples, lists, records, binaries, functions, modules
  • Variables are immutable, no globals
  • Flow control via pattern matching, case, if, try-catch, recursion, messages

SLIDE 26

Concurrency Primitives

  • No mutexes, condition variables, or other error-prone concurrency constructs
  • All Erlang code runs within some process, always
    • processes are not “extra” like threads in other languages

SLIDE 27
Concurrency Primitives

  • spawn: create a new Erlang process
  • ! (exclamation point) or send: send a message to another Erlang process, even on another node
  • Messages can be any Erlang term
  • Messages from A to B arrive in the order sent

Pid1 ! ok, Pid2 ! [{first, "John"}, {last, "Doe"}].
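To see spawn, send, and ordering together, here is a small sketch (the module name `msg_demo` is hypothetical) that spawns an echo process, sends it three messages, and collects the replies. Because messages between one pair of processes arrive in the order they were sent, the replies come back in order.

```erlang
-module(msg_demo).
-export([round_trip/0]).

%% Spawn an echo process, send it three messages, collect three replies.
round_trip() ->
    Self = self(),
    Echo = spawn(fun() -> echo_loop(Self) end),
    lists:foreach(fun(N) -> Echo ! {msg, N} end, [1, 2, 3]),
    %% Each receive takes the oldest matching reply, so order is preserved.
    Replies = [receive {echo, Reply} -> Reply end || _ <- [1, 2, 3]],
    Echo ! stop,
    Replies.

%% Echo each {msg, N} back to the parent as {echo, N} until told to stop.
echo_loop(Parent) ->
    receive
        {msg, N} ->
            Parent ! {echo, N},
            echo_loop(Parent);
        stop ->
            ok
    end.
```

Calling `msg_demo:round_trip()` returns `[1, 2, 3]`.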

SLIDE 28
Concurrency Primitives

  • Each process has a message queue
  • receive: receive a message from another Erlang process
  • Selective receive allows receiving specific messages from anywhere within the message queue

receive
    {ok, Reply} -> do_something(Reply);
    {error, Error} -> uh_oh(Error)
end.
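A self-contained sketch of selective receive, independent of the slide's placeholder functions `do_something/1` and `uh_oh/1`: the process puts two messages in its own mailbox and then pulls the later `{error, _}` message out first, leaving `{ok, _}` in the queue until a second `receive` asks for it. The module name is hypothetical.

```erlang
-module(selective_demo).
-export([pick_error_first/0]).

pick_error_first() ->
    self() ! {ok, first},       % arrives first
    self() ! {error, second},   % arrives second
    %% Skip over {ok, _} and take the {error, _} message first.
    First = receive {error, E} -> {error, E} end,
    %% The {ok, _} message is still waiting in the queue.
    Second = receive {ok, R} -> {ok, R} end,
    [First, Second].
```

The result is `[{error, second}, {ok, first}]`: the order of removal follows the patterns, not the arrival order.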

SLIDE 29

Erlang Immutability

  • Erlang assignment is pattern matching, not mutation
  • Unbound variables get the value of the right-hand side and then can't be changed
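A short sketch of matching versus assignment (module name hypothetical): `=` binds an unbound variable, checks an already-bound one, and raises a `badmatch` error when a bound variable meets a different value.

```erlang
-module(match_demo).
-export([demo/0, try_rebind/2]).

demo() ->
    X = 42,                      % X is unbound, so it is bound to 42
    42 = X,                      % X is bound: this is a check, and it succeeds
    {ok, Name} = {ok, "John"},   % matching a tuple binds Name
    {X, Name, try_rebind(X, 43)}.

%% Matching a bound variable against a different value raises badmatch.
try_rebind(Bound, New) ->
    try Bound = New of
        _ -> matched
    catch
        error:{badmatch, _} -> cannot_rebind
    end.
```

`match_demo:demo()` returns `{42, "John", cannot_rebind}`, while `match_demo:try_rebind(7, 7)` returns `matched` because the values agree.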

SLIDE 34

Easy To Learn

  • Language size means developers become proficient quickly
  • Code is typically brief, easy to read, easy to understand
  • Erlang's Open Telecom Platform (OTP) frameworks solve recurring problems across multiple domains

SLIDE 35

RIAK

SLIDE 43

Riak

  • A distributed, highly available, eventually consistent, highly scalable, open source key-value database written primarily in Erlang.

SLIDE 44

Riak

  • Modeled after Amazon Dynamo
    • see Andy Gross's "Dynamo, Five Years Later" for details: https://speakerdeck.com/argv0/dynamo-five-years-later
  • Also provides MapReduce, secondary indexes, and full-text search
  • Built for operational ease

SLIDE 49

Riak Architecture

[Figure: Riak architecture diagram with Riak Clients (Erlang, Java, Ruby, C/C++, Python, .NET, PHP, Go, Node.js, more), HTTP via Webmachine, Riak PB, Riak API, Riak KV, Riak Pipe, Yokozuna, Riak Core, and the Bitcask, eLevelDB, Memory, and Multi storage backends, all on Erlang; the Erlang parts are highlighted]

image courtesy of Eric Redmond, "A Little Riak Book" https://github.com/coderoshi/little_riak_book/

SLIDE 50

Riak Cluster

[Figure: a four-node Riak cluster, node 0 through node 3]

SLIDE 51

Distributing Data

  • Riak uses consistent hashing to spread data across the cluster
  • Minimizes remapping of keys when the number of nodes changes
  • Spreads data evenly and minimizes hotspots

SLIDE 57

Consistent Hashing

  • Riak uses SHA-1 as a hash function
  • Treats its 160-bit value space as a ring
  • Divides the ring into partitions called "virtual nodes" or vnodes (default 64)
  • Each vnode claims a portion of the ring space
  • Each physical node in the cluster hosts multiple vnodes
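The ring arithmetic can be sketched as follows, assuming a ring of 2^160 positions split into equal partitions and a replica set made of the owning partition plus the next N-1 partitions clockwise. This is a simplified illustration with a hypothetical module name, not Riak Core's `chash` module, whose conventions (for example, exactly which partition owns a hash on a boundary) may differ.

```erlang
-module(ring_sketch).
-export([partition_for/2, preflist/3, ring_top/0]).

%% Size of the SHA-1 value space: hashes range over 0 .. 2^160 - 1.
ring_top() -> 1 bsl 160.

%% Width of each partition when the ring is split into NumPartitions
%% equal slices (assumed to be a power of two, like Riak's default 64).
increment(NumPartitions) -> ring_top() div NumPartitions.

%% Index (0-based) of the partition whose range contains Hash.
partition_for(Hash, NumPartitions) when Hash >= 0, Hash < 1 bsl 160 ->
    Hash div increment(NumPartitions).

%% The N partitions that would hold replicas: the owning partition and
%% the next N-1 partitions clockwise, wrapping around the ring.
preflist(Hash, NumPartitions, N) ->
    First = partition_for(Hash, NumPartitions),
    [(First + I) rem NumPartitions || I <- lists:seq(0, N - 1)].
```

With 64 partitions each one spans 2^154 positions, so a hash of 2^159 (halfway around the ring) lands in partition 32, and the last position on the ring gives the wrapped replica set `[63, 0, 1]` for N = 3.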

SLIDE 58

Hash Ring

[Figure: the hash ring, marked at 2^160 (where it wraps back to 0), 2^160/4, 2^160/2, and 3*2^160/4, with its partitions claimed by node 0 through node 3]

SLIDE 60

Hash Ring

[Figure: the same ring, partitions claimed by node 0 through node 3, with a bucket/key pair hashed to a point on the ring]

SLIDE 64

N/R/W Values

  • N = number of replicas to store (default 3, can be set per bucket)
  • R = read quorum = number of replica responses needed for a successful read (can be specified per request)
  • W = write quorum = number of replica responses needed for a successful write (can be specified per request)
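The usual reasoning behind choosing R and W, from the Dynamo design Riak follows, is that any set of R replicas must share at least R + W - N replicas with any set of W replicas, so when R + W > N a read quorum always includes a replica that acknowledged the latest successful write. A sketch of that arithmetic (module name hypothetical):

```erlang
-module(quorum_sketch).
-export([min_overlap/3, strong_overlap/3]).

%% Smallest number of replicas that any R-replica read set and any
%% W-replica write set must share, out of N replicas in total.
min_overlap(N, R, W) when R =< N, W =< N ->
    max(0, R + W - N).

%% True when every read quorum is guaranteed to include at least one
%% replica that acknowledged the latest successful write.
strong_overlap(N, R, W) ->
    min_overlap(N, R, W) > 0.
```

With N = 3 and R = W = 2 the overlap is 1; with R = W = 1 it is 0, trading that guarantee for lower latency. Riak's sloppy quorums may count fallback vnodes toward R and W, so during failures this overlap argument is weaker than the arithmetic alone suggests.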

SLIDE 65

N/R/W Values

[Figure: the four-node cluster, node 0 through node 3]

For details see http://docs.basho.com/riak/1.3.1/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/

SLIDE 71

Implementing Consistent Hashing

  • Erlang's crypto module, through its integration with OpenSSL, provides the SHA-1 function
  • Hash values are 160 bits
  • But that's OK: Erlang's integers have arbitrary precision
  • And Erlang binaries store these large values efficiently
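In Erlang this takes only a few lines. The sketch below hashes a `{Bucket, Key}` pair with `crypto:hash(sha, ...)` and uses the bit syntax to read the 20-byte digest as a single 160-bit integer. Hashing `term_to_binary({Bucket, Key})` is an assumption made for illustration, and the module name is hypothetical.

```erlang
-module(hash_sketch).
-export([key_hash/2, key_int/2]).

%% Hash a {Bucket, Key} pair with SHA-1. term_to_binary/1 turns any Erlang
%% term into a binary; crypto:hash/2 returns the 20-byte (160-bit) digest.
key_hash(Bucket, Key) ->
    crypto:hash(sha, term_to_binary({Bucket, Key})).

%% Read the 160-bit digest as one big non-negative integer, so it can be
%% placed directly on the 0 .. 2^160 - 1 ring.
key_int(Bucket, Key) ->
    <<Int:160/integer>> = key_hash(Bucket, Key),
    Int.
```

The digest binary stays a compact 20 bytes, and the integer form can be fed straight into ring arithmetic such as `Int div (1 bsl 154)` for a 64-partition ring.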

SLIDE 78

Riak's Ring

SLIDE 83

Ring State

  • All nodes in a Riak cluster are peers; there are no masters or slaves
  • Nodes exchange their understanding of ring state via a gossip protocol

SLIDE 84

Distributed Erlang

  • Erlang has distribution built in; it's required for supporting multiple nodes for reliability
  • By default Erlang nodes form a full mesh: every node knows about every other node
  • Riak uses this for intra-cluster communication

SLIDE 85

Distributed Erlang

  • Riak lets you simulate a multi-node installation on a single machine, nice for development
    • "make devrel" or "make stagedevrel" in a riak repository clone (git://github.com/basho/riak.git)
  • Let's assume we have nodes dev1, dev2, and dev3 running in a cluster, nothing on the 4th node yet
  • Instead of starting riak, let's start the 4th node as just a plain distributed Erlang node
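The experiment on this slide could look roughly like this. Start the fourth node as a plain distributed Erlang node, for example with `erl -name dev4@127.0.0.1 -setcookie riak` (the node name and cookie are assumptions; use whatever your devrel's `vm.args` specifies), then load the hypothetical module below and ping one existing node.

```erlang
-module(mesh_sketch).
-export([join/1]).

%% Ping each node in Nodes. net_adm:ping/1 returns pong if the connection
%% succeeds and pang if it fails. Returns the ping results together with
%% the sorted list of nodes this node is now connected to.
join(Nodes) ->
    Results = [{Node, net_adm:ping(Node)} || Node <- Nodes],
    {Results, lists:sort(nodes())}.
```

After `mesh_sketch:join(['dev1@127.0.0.1'])` reports `pong`, you would expect `nodes()` to list dev1, dev2, and dev3, since by default Erlang connects transitively to the rest of the mesh.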

SLIDE 95

Distributed Erlang Mesh

[Figure: node 0 through node 3 connected as a mesh]

  • Nodes talk to each other occasionally to check liveness
  • Mesh approach makes it easy to set up a cluster
  • But communication overhead means it doesn't scale to large clusters (more than 150 nodes), at least not yet

SLIDE 96

Gossip

  • Riak nodes are peers; there's no master
  • But the ring has state, such as which vnodes each node has claimed
  • Nodes periodically send their understanding of the ring state to other randomly chosen nodes
  • Riak's gossip module also provides an API for sending ring state to specific nodes
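A simplified sketch of one gossip step, assuming a hypothetical ring representation that maps each partition to an `{Owner, Version}` claim. Riak's real ring state is richer and uses vector clocks; here each merge keeps the higher-versioned claim and breaks ties by owner, so merging is commutative and repeated gossip converges on the same ring.

```erlang
-module(gossip_sketch).
-export([merge/2, pick_peer/2]).

%% Hypothetical ring state: #{PartitionIndex => {OwnerNode, Version}}.
%% For each partition keep the claim with the higher {Version, Owner}
%% pair, so merge(A, B) and merge(B, A) always agree.
merge(RingA, RingB) ->
    maps:fold(
      fun(Idx, {OwnerB, VerB} = ClaimB, Acc) ->
              case maps:find(Idx, Acc) of
                  {ok, {OwnerA, VerA}} when {VerA, OwnerA} >= {VerB, OwnerB} ->
                      Acc;
                  _ ->
                      Acc#{Idx => ClaimB}
              end
      end,
      RingA, RingB).

%% Pick a random peer other than Self to gossip to, or none if alone.
pick_peer(Self, Nodes) ->
    case [N || N <- Nodes, N =/= Self] of
        [] -> none;
        Peers -> lists:nth(rand:uniform(length(Peers)), Peers)
    end.
```

A node would periodically call `pick_peer/2`, send its ring to that peer, and the receiver would replace its own ring with `merge/2` of the two.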

SLIDE 101

Control Vs. Data

  • Distributed Erlang: good for control plane, not so good for data plane
  • Sending large data can cause busy distribution ports and head-of-line blocking
  • Use TCP, UDP, etc. directly for data plane traffic
  • Don't mix control plane and data plane traffic
    • unfortunately Riak currently still does this in a few places

SLIDE 104

Riak Core

[Figure: Riak layer diagram (Riak Clients, Riak API, Riak KV, Riak Core, and the Bitcask, eLevelDB, Memory, and Multi backends) with Riak Core highlighted]

  • consistent hashing
  • vector clocks
  • sloppy quorums
  • gossip protocols
  • virtual nodes (vnodes)
  • hinted handoff

SLIDE 110

Hinted Handoff

  • Fallback vnode holds data for unavailable primary vnode
  • Fallback vnode keeps checking for availability of primary vnode
  • Once primary vnode becomes available, fallback hands off data to it
  • Fallback vnodes are started as needed, thanks to Erlang lightweight processes
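A minimal sketch of the fallback side of hinted handoff. The fallback is just a cheap process holding a map of writes; instead of repeatedly checking for the primary as the slide describes, this hypothetical version waits for a `{primary_up, Pid}` message and then ships everything it holds to that pid.

```erlang
-module(handoff_sketch).
-export([start_fallback/0]).

%% Start a fallback "vnode": a process that stores writes meant for an
%% unavailable primary. Spawning one is cheap, so it can be done on demand.
start_fallback() ->
    spawn(fun() -> fallback(#{}) end).

fallback(Data) ->
    receive
        {put, Key, Value} ->
            %% Hold the write until the primary returns.
            fallback(Data#{Key => Value});
        {primary_up, PrimaryPid} ->
            %% Hand off everything in one message, then exit.
            PrimaryPid ! {handoff, self(), Data},
            ok
    end.
```

In a real system the handoff would be streamed and acknowledged before the fallback discards its copy; this sketch only shows the shape of the protocol.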

SLIDE 111

Read Repair

  • If a read detects a vnode with stale data, it is repaired via asynchronous update
  • Helps implement eventual consistency
  • Starting at version 1.3, Riak supports active anti-entropy (AAE) to actively repair stale values
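Read repair can be sketched as: collect replies from the replicas, pick the newest value, and list the vnodes that returned something older so they can be updated asynchronously. Integer versions stand in for Riak's vector clocks here, and the reply format and module name are hypothetical.

```erlang
-module(read_repair_sketch).
-export([resolve/1]).

%% Replies are [{Vnode, Version, Value}] from the replicas that answered.
%% Returns {NewestValue, StaleVnodes}; the caller would then send the
%% newest value to each stale vnode without waiting for a reply.
resolve(Replies) ->
    {_, NewestVer, NewestVal} =
        lists:foldl(fun({_, V, _} = R, {_, BestV, _}) when V > BestV -> R;
                       (_, Best) -> Best
                    end, hd(Replies), tl(Replies)),
    Stale = [Vnode || {Vnode, V, _} <- Replies, V < NewestVer],
    {NewestVal, Stale}.
```

For replies `[{vn1, 2, "new"}, {vn2, 1, "old"}, {vn3, 2, "new"}]` the sketch returns `{"new", [vn2]}`. With real vector clocks two versions can also be concurrent, which this integer stand-in cannot express.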

SLIDE 112

Core Protocols

  • Gossip, handoff, read repair, etc. all require intra-cluster protocols
  • Erlang distribution and other features help significantly with protocol implementations
  • Erlang monitors allow processes and nodes to watch each other while interacting
  • A monitoring process/node is notified if a monitored process/node dies, great for aborting failed interactions
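A minimal example of a monitor in action: `spawn_monitor/1` starts a process and monitors it atomically, and when that process dies the runtime delivers a `'DOWN'` message carrying the exit reason. For whole nodes, `erlang:monitor_node/2` delivers `{nodedown, Node}` in the same spirit. The module name is hypothetical.

```erlang
-module(monitor_sketch).
-export([watch/0]).

%% Start a process that fails immediately, monitor it, and turn the
%% resulting 'DOWN' message into an aborted interaction.
watch() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(interaction_failed) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {aborted, Reason}
    after 1000 ->
        still_running
    end.
```

`monitor_sketch:watch()` returns `{aborted, interaction_failed}`. Unlike a link, the monitor does not kill the watcher; it only informs it.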

SLIDE 113

Binary Handling

  • Erlang's binaries make working with network packets easy
  • For example, deconstructing a TCP message (from Cesarini & Thompson, “Erlang Programming”)

source: http://en.wikipedia.org/wiki/Transmission_Control_Protocol
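A sketch of that kind of decoding with the bit syntax, following the TCP header layout (ports, sequence and acknowledgment numbers, data offset, flags, window, checksum, urgent pointer). It is written from the header format rather than copied from the book, the returned map keys are hypothetical, and it treats the 4 bits after the data offset as reserved and the following byte as 8 flag bits.

```erlang
-module(tcp_sketch).
-export([decode/1]).

%% Split a TCP segment into header fields, options, and payload.
%% DataOffset is the header length in 32-bit words (minimum 5, i.e. 20 bytes).
decode(<<SrcPort:16, DstPort:16, SeqNum:32, AckNum:32,
         DataOffset:4, _Reserved:4, Flags:8, WindowSize:16,
         Checksum:16, UrgentPtr:16, Rest/binary>>) when DataOffset >= 5 ->
    %% Any words beyond the fixed 5-word header are TCP options.
    OptSize = (DataOffset - 5) * 32,
    <<Options:OptSize/bits, Payload/binary>> = Rest,
    #{src_port => SrcPort, dst_port => DstPort, seq => SeqNum,
      ack => AckNum, flags => Flags, window => WindowSize,
      checksum => Checksum, urgent => UrgentPtr,
      options => Options, payload => Payload}.
```

One function-head pattern both validates and destructures the header; no manual shifting or masking is needed.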

SLIDE 118
Protocols With OTP

  • OTP provides libraries of standard modules
  • And also behaviors: implementations of common patterns for concurrent, distributed, fault-tolerant Erlang apps

SLIDE 119

OTP Behavior Modules

  • A behavior is similar to an abstract base class in OO terms, providing:
    • a message handling tail-call optimized loop
    • integration with underlying OTP system for code upgrade, tracing, process management, etc.

SLIDE 125

OTP Behaviors

  • application: plugs into Erlang application controller
  • supervisor: manages and monitors worker processes
  • gen_server: server process framework
  • gen_fsm: finite state machine framework
  • gen_event: event handling framework

SLIDE 126

Gen_server

  • Generic server behavior for handling messages
  • Supports server-like components, distributed or not
  • “Business logic” lives in app-specific callback module
  • Maintains state in a tail-call optimized receive loop
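A minimal gen_server sketch (hypothetical `counter_server` module): the client functions send messages through `gen_server:call/2` and `gen_server:cast/2`, while the callbacks hold the business logic and receive the current state from gen_server's loop.

```erlang
-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, increment/1, value/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Client API: runs in the caller's process and sends messages to the server.
start_link() -> gen_server:start_link(?MODULE, 0, []).
increment(Pid) -> gen_server:cast(Pid, increment).   % asynchronous
value(Pid) -> gen_server:call(Pid, value).           % synchronous

%% Callbacks: gen_server owns the receive loop, passes the current state
%% in, and keeps whatever new state each callback returns.
init(Start) -> {ok, Start}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(increment, Count) -> {noreply, Count + 1}.
```

After two `increment/1` casts, `value/1` returns 2: both casts and the call come from the same process, so they arrive in order.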

SLIDE 127

Gen_fsm

  • Behavior supporting finite state machines (FSMs)
  • Tail-call loop for maintaining state, like gen_server
  • States and events handled by app-specific callback module
  • Allows events to be sent into an FSM either sync or async
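Later OTP releases deprecated `gen_fsm` in favor of `gen_statem` (added in OTP 19), so the sketch below uses `gen_statem` to show the same ideas: one callback function per state, state data carried through the behavior's loop, and events sent either asynchronously (`cast`) or synchronously (`call`). The `switch_fsm` module is hypothetical.

```erlang
-module(switch_fsm).
-behaviour(gen_statem).
-export([start_link/0, flip/1, state/1]).
-export([init/1, callback_mode/0, off/3, on/3]).

start_link() -> gen_statem:start_link(?MODULE, [], []).
flip(Pid) -> gen_statem:cast(Pid, flip).         % async event
state(Pid) -> gen_statem:call(Pid, get_state).   % sync event

%% state_functions: each state is handled by a function of the same name.
callback_mode() -> state_functions.

%% Start in state 'off'; the state data counts how many flips happened.
init([]) -> {ok, off, 0}.

off(cast, flip, Flips) -> {next_state, on, Flips + 1};
off({call, From}, get_state, Flips) ->
    {keep_state_and_data, [{reply, From, {off, Flips}}]}.

on(cast, flip, Flips) -> {next_state, off, Flips + 1};
on({call, From}, get_state, Flips) ->
    {keep_state_and_data, [{reply, From, {on, Flips}}]}.
```

A fresh machine reports `{off, 0}`; after one `flip/1` it reports `{on, 1}`.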

SLIDE 128

Riak And Gen_*

  • Riak makes heavy use of these behaviors, e.g.:
    • FSMs for get and put operations
    • Vnode FSM
    • Gossip module is a gen_server

SLIDE 129

Behavior Benefits

  • Standardized frameworks providing common patterns, common vocabulary
  • Used by pretty much all non-trivial Erlang systems
  • Erlang developers understand them, know how to read them

SLIDE 130

Behavior Benefits

  • Separate a lot of messaging, debugging, tracing support, and system concerns from business logic

[Figure: the OTP gen_* module (system side) handles incoming and outgoing messages and invokes callbacks in the app callback module (application side), which return replies]

SLIDE 131

Workers & Supervisors

  • Workers implement application logic
  • Supervisors:
    • start child workers and sub-supervisors
    • link to the children and trap child process exits
    • take action when a child dies, typically restarting one or more children
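A small supervisor sketch using the `one_for_one` strategy: if the worker dies it is restarted on its own, and if restarts exceed the configured intensity (5 within 10 seconds here, an arbitrary choice) the supervisor gives up and exits too. The module and its do-nothing worker are hypothetical.

```erlang
-module(sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() -> supervisor:start_link(?MODULE, []).

%% one_for_one: when a child dies, restart only that child.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; the worker is linked to the supervisor.
start_worker() -> {ok, spawn_link(fun ?MODULE:worker_loop/0)}.

%% A trivial worker that just consumes messages forever.
worker_loop() -> receive _ -> worker_loop() end.
```

Killing the worker with `exit(Pid, kill)` leaves the supervisor running, and `supervisor:which_children/1` shows a new pid for the `worker` child shortly afterwards.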

SLIDE 132

Let It Crash

  • In his doctoral thesis, Joe Armstrong, creator of Erlang, wrote:
    • Let some other process do the error recovery.
    • If you can’t do what you want to do, die.
    • Let it crash.
    • Do not program defensively.

see http://www.erlang.org/download/armstrong_thesis_2003.pdf

SLIDE 137

Application, Supervisors, Workers

[Figure: diagram labeled Application, Supervisors, Workers, and Simple Core]

SLIDE 138

Erlang/OTP System Facilities

  • Get status of an OTP process
  • Get process info for any process
  • Trace function calls, messages
  • Releases
  • Live upgrades

SLIDE 139

INTEGRATION

SLIDE 142

Riak Architecture

[Figure: Riak architecture diagram (Riak Clients, HTTP via Webmachine, Riak PB, Riak API, Riak KV, Riak Pipe, Yokozuna, Riak Core, and the Bitcask, eLevelDB, Memory, and Multi storage backends), annotated "Erlang on top, C/C++ on the bottom"]

image courtesy of Eric Redmond, "A Little Riak Book" https://github.com/coderoshi/little_riak_book/

SLIDE 143

Linking With C/C++

  • Erlang provides the ability to dynamically link C/C++ libraries into the VM
  • One way is through the driver interface
    • for example the VM supplies network and file system facilities via drivers
  • Another way is through Native Implemented Functions (NIFs)

SLIDE 144

Native Implemented Functions (NIFs)

  • Lets C/C++ functions operate as Erlang functions
  • Erlang module serves as entry point
  • When the module loads, it dynamically loads its NIF shared library, overlaying its Erlang functions with C/C++ replacements
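A sketch of the Erlang side of a NIF module. `erlang:load_nif/2` loads the shared library and replaces the stub functions with their C implementations; until that happens, each stub calls `erlang:nif_error/1` so a missing library fails loudly. The module `fast_math` and the library `fast_math_nif` are hypothetical, and NIF modules commonly run `init/0` from an `-on_load(init/0).` attribute rather than calling it by hand.

```erlang
-module(fast_math).
-export([init/0, add/2]).

%% Load the (hypothetical) shared library fast_math_nif.so from the
%% current directory. Returns ok, or {error, {Reason, Text}} on failure.
init() ->
    erlang:load_nif("./fast_math_nif", 0).

%% Erlang stub, overlaid by the C implementation once the library loads.
%% Until then, calling it raises an error instead of silently misbehaving.
add(_A, _B) ->
    erlang:nif_error(nif_not_loaded).
```

Without the library present, `fast_math:init()` returns an `{error, {load_failed, _}}` tuple and `fast_math:add(1, 2)` raises `nif_not_loaded`.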

SLIDE 145

Example: Eleveldb

  • NIF wrapper around Google's LevelDB C++ database
  • Erlang interface plugs in underneath Riak KV

SLIDE 149

NIF Features

  • Easy to convert arguments and return values between C/C++ and Erlang
  • Reference-counted binaries to avoid data copying where needed
  • Portable interface to OS multithreading capabilities (threads, mutexes, cond vars, etc.)

SLIDE 150

NIF Caveats

  • Crashes in your linked-in C/C++ kill the whole VM
  • Lesson: use NIFs and drivers only when needed, and don't write crappy code

SLIDE 155

NIF Caveats

  • NIF calls execute within a VM scheduler thread
  • If the NIF blocks, the scheduler thread blocks
  • THIS IS VERY BAD
  • NIFs should block for no more than 1 millisecond

SLIDE 156

NIF Caveats

  • Last fall Basho found "scheduler anomalies" where
    • the VM would put most of its schedulers to sleep, by design, under low load
    • but would fail to wake them up as load increased
  • Caused by NIF calls that were taking multiple seconds in some cases
  • Lesson: put long-running activities in their own threads

SLIDE 157

TESTING

SLIDE 158

Eunit

  • Erlang's unit testing facility
  • Support for asserting test results, grouping tests, setup and teardown, etc.

  • Used heavily in Riak
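A small EUnit sketch (hypothetical module): functions whose names end in `_test` are run as tests, `?assertEqual` checks a result, and a generator whose name ends in `_test_` can return a `{setup, Setup, Cleanup, Instantiator}` fixture that groups tests around shared setup and teardown.

```erlang
-module(eunit_sketch).
-include_lib("eunit/include/eunit.hrl").

%% A plain test: any zero-arity function ending in _test is run.
put_get_test() ->
    Store = maps:put(key, value, #{}),
    ?assertEqual(value, maps:get(key, Store)).

%% A test generator returning a setup fixture: Setup builds shared state,
%% Cleanup tears it down, and the instantiator returns the grouped tests.
store_fixture_test_() ->
    {setup,
     fun() -> #{apple => 1, pear => 2} end,
     fun(_Store) -> ok end,
     fun(Store) ->
             [?_assertEqual(1, maps:get(apple, Store)),
              ?_assertError({badkey, plum}, maps:get(plum, Store))]
     end}.
```

Running `eunit:test(eunit_sketch).` executes all three tests and returns `ok` when they pass.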

SLIDE 159

QuickCheck

  • Property-based testing product from Quviq, invented by John Hughes (a co-inventor of Haskell)
  • Create a model of the software under test
  • QuickCheck runs randomly generated tests against it
  • When it finds a failure, QuickCheck automatically shrinks the test case to a minimum for easier debugging
  • Used heavily in Riak, especially to test various protocols and interactions

SLIDE 160

MISCELLANEOUS

SLIDE 161

Miscellaneous

  • Memory
  • Erlang shell
  • Hot code loading
  • VM knowledge
  • Finding Erlang developers

SLIDE 162

Memory

  • Process message queues have no limits, and can cause out-of-memory conditions if a process can't keep up
  • By design, the VM dies if it runs out of memory
  • Apps like Riak run Erlang memory monitors that help log and notify about looming out-of-memory conditions
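One check such a monitor can make is to look for processes whose mailboxes keep growing. The hypothetical helper below lists local processes with more than `Threshold` queued messages, longest first, using `erlang:process_info/2`; `erlang:memory/0` gives VM-wide memory totals for the same purpose. It is a diagnostic sketch, not the memory monitor Riak ships.

```erlang
-module(queue_watch).
-export([long_queues/1]).

%% Return [{Pid, QueueLen}] for processes with more than Threshold queued
%% messages, longest first. process_info/2 returns undefined for processes
%% that exited after processes/0 was called; the generator pattern
%% {message_queue_len, Len} simply skips those.
long_queues(Threshold) ->
    Found = [{Pid, Len} || Pid <- erlang:processes(),
                           {message_queue_len, Len} <-
                               [erlang:process_info(Pid, message_queue_len)],
                           Len > Threshold],
    lists:reverse(lists:keysort(2, Found)).
```

Called periodically, for example from a timer, it can log the worst offenders before a runaway queue exhausts memory.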

SLIDE 163

Interactive Erlang Shell

  • Hard to imagine working without it
  • Huge help during development and debug

SLIDE 164

Hot Code Loading

  • It really works
  • Use it all the time during development
  • We've also used it to load repaired code into live production systems for customers (with their permission, of course)

SLIDE 165

VM Knowledge

  • Running high-scale, high-load systems like Riak requires knowledge of Erlang VM internals
  • No different than working with the JVM or other language runtimes

SLIDE 166

Finding Erlang Devs

  • Erlang is easy to learn
  • Not really a problem to hire Erlang programmers
  • Basho hires great developers; those who need to learn Erlang just do it
  • BTW we're hiring, see http://bashojobs.theresumator.com

SLIDE 167

SUMMARY

SLIDE 168

Summary: Why Erlang For Riak?

  • Distributed systems features
    • sort of a "distributed systems DSL"
  • Concurrency features
  • Reliability features
  • Runtime introspection capabilities
  • Individual developer and team productivity

SLIDE 169

For More Erlang Info

SLIDE 170

For More Riak Info

  • "A Little Riak Book" by Basho's Eric Redmond: https://github.com/coderoshi/little_riak_book/
  • Mathias Meyer's "Riak Handbook": http://riakhandbook.com
  • Eric Redmond's "Seven Databases in Seven Weeks": http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks

SLIDE 171

For More Riak Info

  • Basho documentation: http://docs.basho.com
  • Basho blog: http://basho.com/blog/
  • Basho's GitHub repositories: https://github.com/basho and https://github.com/basho-labs

SLIDE 172

THANKS

http://basho.com @stevevinoski
