Beautiful Concurrency with Erlang - Kevin Scaldeferri - OSCON 23 July 2008


slide-1
SLIDE 1

Beautiful Concurrency with Erlang

Kevin Scaldeferri OSCON 23 July 2008

6 years at Yahoo, building large high-concurrency distributed systems. Not an expert, don’t use it professionally. Dabbled, liked it, want to share what I think is cool.

slide-2
SLIDE 2

What is Erlang?

  • Strict pure functional language
  • Strong dynamic typing
    – weak structural user-defined types

  • Interpreted
  • Syntax similar to Prolog & ML
  • Concurrency primitives
  • Created at Ericsson for telecom applications in 1987

Not going to talk about syntax, basic language features, etc. Go to Francesco Cesarini’s talk yesterday.

slide-3
SLIDE 3

Erlang Concurrency Primitives

  • spawn - create a process
  • ! - send a message to a process
  • receive - listen for a message
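
A minimal sketch of the three primitives working together. The module name primitives_demo and the function echo_once/1 are made up for illustration; the one-second timeout is an arbitrary choice.

```erlang
-module(primitives_demo).
-export([echo_once/1]).

%% Spawn a child, send it a message with !, and wait for the reply
%% with receive. All names here are illustrative only.
echo_once(Msg) ->
    Parent = self(),
    %% spawn/1 returns the pid of the new process immediately.
    Child = spawn(fun() ->
                      receive
                          {From, Payload} -> From ! {self(), Payload}
                      end
                  end),
    %% ! is asynchronous: it queues the message and returns at once.
    Child ! {Parent, Msg},
    %% Selective receive: only a reply tagged with Child's pid matches.
    receive
        {Child, Reply} -> Reply
    after 1000 ->
        timeout
    end.
```

Calling primitives_demo:echo_once(hello) should hand back the atom it was given, having made a round trip through the child process.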
slide-4
SLIDE 4

Parallelizing Algorithms

  • Quicksort
  • Shamelessly stolen from http://21ccw.blogspot.com/2008/05/parallel-quicksort-in-erlang-part-ii.html

slide-5
SLIDE 5

qsort([]) -> [];
qsort([Pivot|Rest]) ->
    qsort([X || X <- Rest, X < Pivot])
    ++ [Pivot] ++
    qsort([Y || Y <- Rest, Y >= Pivot]).

Erlang: one of those “quicksort in 3 lines” languages, but... too small to read.

slide-6
SLIDE 6

qsort([]) -> [];
qsort([Pivot|Rest]) ->
    qsort([X || X <- Rest, X < Pivot])
    ++ [Pivot] ++
    qsort([Y || Y <- Rest, Y >= Pivot]).

slide-7
SLIDE 7

qsort([]) -> [];
qsort([Pivot|Rest]) ->
    Left = [X || X <- Rest, X < Pivot],
    Right = [Y || Y <- Rest, Y >= Pivot],
    qsort(Left) ++ [Pivot] ++ qsort(Right).

Extract temp variables

slide-8
SLIDE 8

qsort([]) -> [];
qsort([Pivot|Rest]) ->
    Left = [X || X <- Rest, X < Pivot],
    Right = [Y || Y <- Rest, Y >= Pivot],
    [SortedLeft, SortedRight] = map(fun qsort/1, [Left, Right]),
    SortedLeft ++ [Pivot] ++ SortedRight.

Add a map(), which looks odd but now we’re ready to do some magic

slide-9
SLIDE 9

qsort([]) -> [];
qsort([Pivot|Rest]) ->
    Left = [X || X <- Rest, X < Pivot],
    Right = [Y || Y <- Rest, Y >= Pivot],
    [SortedLeft, SortedRight] = pmap(fun qsort/1, [Left, Right]),
    SortedLeft ++ [Pivot] ++ SortedRight.

Now we’re running on as many cores as you’ve got. Who thinks this is a good idea?

slide-10
SLIDE 10

Don’t try this at home

Actually 10x slower on my machine. Spawning a process is fast, but still much slower than a comparison or a list cons. A better example: web spidering.
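
One way to check the slowdown yourself is to time both versions with timer:tc/1. This is only a sketch: the module name qsort_bench, the compare/1 helper and the compact pmap are assumptions made for the example, and the numbers will vary by machine and list size.

```erlang
-module(qsort_bench).
-export([qsort/1, pqsort/1, compare/1]).

%% Sequential quicksort, as on the earlier slides.
qsort([]) -> [];
qsort([Pivot|Rest]) ->
    qsort([X || X <- Rest, X < Pivot])
    ++ [Pivot] ++
    qsort([Y || Y <- Rest, Y >= Pivot]).

%% Parallel variant: every partition is sorted in its own process.
pqsort([]) -> [];
pqsort([Pivot|Rest]) ->
    Left = [X || X <- Rest, X < Pivot],
    Right = [Y || Y <- Rest, Y >= Pivot],
    [SortedLeft, SortedRight] = pmap(fun pqsort/1, [Left, Right]),
    SortedLeft ++ [Pivot] ++ SortedRight.

%% Compact pmap for this example: one process per list element.
pmap(F, L) ->
    S = self(),
    Pids = [spawn(fun() -> S ! {self(), F(I)} end) || I <- L],
    gather(Pids).

%% Collect results in spawn order.
gather([H|T]) -> receive {H, Ret} -> [Ret|gather(T)] end;
gather([]) -> [].

%% Sort the same random list both ways and return the two timings
%% in microseconds as {Sequential, Parallel}.
compare(N) ->
    L = [rand:uniform(1000000) || _ <- lists:seq(1, N)],
    {SeqMicros, Sorted} = timer:tc(fun() -> qsort(L) end),
    %% Sorted is already bound, so this match also checks that both
    %% versions agree on the result.
    {ParMicros, Sorted} = timer:tc(fun() -> pqsort(L) end),
    {SeqMicros, ParMicros}.
```

Keep N modest (a few thousand): the parallel version spawns a process per partition, so very large lists can run into the VM's process limit.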

slide-11
SLIDE 11

spider(URIs) ->
    ...
    Links = pmap(fun get_links/1, URIs),
    ...

Web spider needs to fetch content, parse XML/HTML, extract links. Significant speedup here, both from parallelizing network requests and CPU.
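
To make the shape of the spider concrete, here is a self-contained sketch. It does not fill in the elided parts of the slide: get_links/1 is a hypothetical stub that looks pages up in a tiny in-memory "web" instead of fetching and parsing anything, and the crawl loop around it is an assumption.

```erlang
-module(spider_sketch).
-export([spider/1]).

%% Hypothetical stand-in for the real fetch-and-parse step: a fixed
%% three-page web with a cycle (c links back to a).
get_links(URI) ->
    Web = #{"a" => ["b", "c"], "b" => ["c"], "c" => ["a"]},
    maps:get(URI, Web, []).

%% Crawl breadth-first from the given URIs; return every page seen.
spider(URIs) -> spider(URIs, []).

spider([], Seen) ->
    lists:sort(Seen);
spider(URIs, Seen) ->
    %% Each page in the current frontier is handled in its own process.
    Links = pmap(fun get_links/1, URIs),
    NewSeen = lists:usort(Seen ++ URIs),
    %% Only follow links we have not visited yet, so cycles terminate.
    Next = lists:usort(lists:append(Links)) -- NewSeen,
    spider(Next, NewSeen).

%% Compact pmap for this example: one process per list element.
pmap(F, L) ->
    S = self(),
    Pids = [spawn(fun() -> S ! {self(), F(I)} end) || I <- L],
    gather(Pids).

gather([H|T]) -> receive {H, Ret} -> [Ret|gather(T)] end;
gather([]) -> [].
```

With a real get_links/1 each spawned process would spend most of its time waiting on the network, which is where the parallel version pays off.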

slide-12
SLIDE 12

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

slide-13
SLIDE 13

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

pmap uses map

slide-14
SLIDE 14

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

but instead of running the function directly, spawns a new process to run it

slide-15
SLIDE 15

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

slide-16
SLIDE 16

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

apply the function to the list item in the child process

slide-17
SLIDE 17

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

then send it back to the parent

slide-18
SLIDE 18

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

parent gathers results

slide-19
SLIDE 19

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

receive a message from each Pid we spawned

slide-20
SLIDE 20

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

cons up the return values

slide-21
SLIDE 21

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].

slide-22
SLIDE 22

pmap(F, L) ->
    S = self(),
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].
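
To try pmap out, it can be wrapped in a module. The module name pmap_demo is an assumption, and the -import line is only there so that the bare map/2 call from the slides compiles unchanged.

```erlang
-module(pmap_demo).
-export([pmap/2]).
%% Lets the slide code call lists:map/2 as plain map/2.
-import(lists, [map/2]).

pmap(F, L) ->
    S = self(),
    %% One process per list element; keep the pids in list order.
    Pids = map(fun(I) ->
                   spawn(fun() -> pmap_f(S, F, I) end)
               end, L),
    pmap_gather(Pids).

%% Runs in the child: apply F and send the result, tagged with our pid.
%% The catch turns a crash in F into an {'EXIT', Reason} value.
pmap_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.

%% Receive selectively, one pid at a time, so the results come back in
%% the same order as the input list no matter which child finishes first.
pmap_gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|pmap_gather(T)]
    end;
pmap_gather([]) ->
    [].
```

Two things worth noticing when playing with it: results keep the input order, and a function that crashes on one element yields an {'EXIT', Reason} tuple in that position instead of taking down the whole call.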

slide-23
SLIDE 23

Distributed Systems

Who uses Twitter? Who’s frustrated by Twitter? Who’s written their own Twitter clone?

slide-24
SLIDE 24

Twitter

  • “Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system.” -Alex Payne

Erlang approach: treat it as a messaging application. Model users by processes sending messages to each other.

slide-25
SLIDE 25

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

slide-26
SLIDE 26

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

create a user record

slide-27
SLIDE 27

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

spawn a new process to manage the user

slide-28
SLIDE 28

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

register a name for the process, so we can send using the username rather than pid
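
The slides never show loop/1 or the user record, so here is one guess at what they might look like. The record fields follow the names used later in the deck (name, followers, following); the message shapes handled by loop/1, the lists_of/1 helper and the module name twitterl_plain are assumptions for this sketch.

```erlang
-module(twitterl_plain).
-export([create_user/1, follow/2, lists_of/1]).

-record(user, {name, followers = [], following = []}).

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

follow(UserName, OtherName) ->
    send(UserName, {follow, OtherName}).

%% Hypothetical helper: ask a user process for its two lists.
lists_of(Name) ->
    Ref = make_ref(),
    case send(Name, {get_lists, self(), Ref}) of
        {error, no_such_user} = Err -> Err;
        _Sent ->
            receive
                {Ref, Following, Followers} -> {Following, Followers}
            after 1000 -> {error, timeout}
            end
    end.

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

%% Assumed main loop: the user's state lives in the loop argument and
%% is updated by recursing with a new record.
loop(User) ->
    receive
        {follow, Other} ->
            send(Other, {add_follower, User#user.name}),
            loop(User#user{following = [Other | User#user.following]});
        {add_follower, Name} ->
            loop(User#user{followers = [Name | User#user.followers]});
        {get_lists, From, Ref} ->
            From ! {Ref, User#user.following, User#user.followers},
            loop(User)
    end.
```

Because register/2 only accepts atoms, user names have to be atoms in this version, and follow/2 returns as soon as the message is queued rather than when it has been processed.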

slide-29
SLIDE 29

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

slide-30
SLIDE 30

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

to add a follower

slide-31
SLIDE 31

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

send a message to the user

slide-32
SLIDE 32

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

saying “follow that guy”

slide-33
SLIDE 33

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

slide-34
SLIDE 34

follow(UserPid, OtherName) ->
    send(UserPid, {follow, OtherName}).

...

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

send is just a thin wrapper around ! with error handling

slide-35
SLIDE 35

Going Global

  • Distribute across multiple machines?
  • Just use global names

So far, just running on one machine (can handle tens of thousands, maybe hundreds of thousands, of users). Eventually need to grow past that to multiple machines. Fortunately, this is easy.

slide-36
SLIDE 36

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    try register(Name, Pid) of
        true -> {ok, Pid}
    catch
        error:badarg ->
            exit(Pid, in_use),
            {error, in_use}
    end.

just change register

slide-37
SLIDE 37

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    %% global:register_name/2 returns yes or no instead of raising badarg
    case global:register_name(Name, Pid) of
        yes -> {ok, Pid};
        no ->
            exit(Pid, in_use),
            {error, in_use}
    end.

to global register

slide-38
SLIDE 38

create_user(Name) ->
    User = #user{name=Name},
    Pid = spawn(fun() -> loop(User) end),
    %% global:register_name/2 returns yes or no instead of raising badarg
    case global:register_name(Name, Pid) of
        yes -> {ok, Pid};
        no ->
            exit(Pid, in_use),
            {error, in_use}
    end.

slide-39
SLIDE 39

send(Name, Msg) ->
    try
        Name ! Msg
    catch
        error:badarg -> {error, no_such_user}
    end.

similarly, change !

slide-40
SLIDE 40

send(Name, Msg) ->
    %% global:send/2 exits with {badarg, {Name, Msg}} for an unknown name
    try
        global:send(Name, Msg)
    catch
        exit:{badarg, _} -> {error, no_such_user}
    end.

to global:send

slide-41
SLIDE 41

send(Name, Msg) ->
    %% global:send/2 exits with {badarg, {Name, Msg}} for an unknown name
    try
        global:send(Name, Msg)
    catch
        exit:{badarg, _} -> {error, no_such_user}
    end.
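
The global calls can be tried on a single node before adding more machines. The sketch below reflects my reading of the global module documentation: register_name/2 answers yes or no, and send/2 exits with a badarg reason when the name is unknown. The module name global_demo and the names demo_user and no_such_user are made up.

```erlang
-module(global_demo).
-export([run/0]).

%% Walk through the global calls used above and report what each did.
run() ->
    P1 = spawn(fun wait/0),
    P2 = spawn(fun wait/0),
    %% First registration wins; the second is refused with no.
    R1 = global:register_name(demo_user, P1),
    R2 = global:register_name(demo_user, P2),
    Found = global:whereis_name(demo_user) =:= P1,
    %% On success, global:send/2 returns the pid it delivered to.
    Sent = global:send(demo_user, stop) =:= P1,
    %% Sending to a name nobody registered exits the caller.
    Missing = try global:send(no_such_user, hi) of
                  _ -> sent
              catch
                  exit:{badarg, _} -> no_such_user
              end,
    %% Clean up so the demo can be run again.
    P2 ! stop,
    global:unregister_name(demo_user),
    {R1, R2, Found, Sent, Missing}.

%% Idle until told to stop.
wait() ->
    receive stop -> ok end.
```

Running the same code on connected nodes should behave the same way, with the name visible from every node in the cluster.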

slide-42
SLIDE 42

Reliable Distributed Systems

What if a process crashes?

slide-43
SLIDE 43

Open Telecom Platform

OTP

OTP provides frameworks for common application patterns, and handles reliability by watching and restarting processes
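
The "watching and restarting" part is the job of OTP supervisors. As a rough illustration, here is a minimal supervisor that restarts one worker whenever it dies. The module name restart_demo, the worker and the restart limits are assumptions for the example; a real application would supervise gen_server processes rather than a bare spawn_link.

```erlang
-module(restart_demo).
-behaviour(supervisor).
-export([start/0, worker_pid/1, start_worker/0, init/1]).

%% Start an unnamed supervisor running this module's init/1.
start() ->
    supervisor:start_link(?MODULE, []).

%% Supervisor callback: one permanent child, restarted on its own
%% (one_for_one), allowing at most 5 restarts in 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; the child must be linked to it.
start_worker() ->
    {ok, spawn_link(fun worker_loop/0)}.

%% A worker that just discards whatever it receives.
worker_loop() ->
    receive _ -> worker_loop() end.

%% Look up the current pid of the single worker child.
worker_pid(Sup) ->
    [{worker, Pid, _Type, _Mods}] = supervisor:which_children(Sup),
    Pid.
```

Killing the worker with exit(Pid, kill) and asking for worker_pid/1 again a moment later should show a fresh pid, which is the behaviour the user processes get for free once they run under OTP.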

slide-44
SLIDE 44
-module(twitterl).
-behaviour(gen_server).

We’ll use the gen_server behaviour (similar to a Java interface)

slide-45
SLIDE 45

create_user(Name) ->
    gen_server:start_link(
        {global, Name},
        ?MODULE,
        [#user{name=Name}],
        []
    ).

start_link handles registering names, spawning the process and running the main loop

slide-46
SLIDE 46

create_user(Name) ->
    gen_server:start_link(
        {global, Name},
        ?MODULE,
        [#user{name=Name}],
        []
    ).

slide-47
SLIDE 47

create_user(Name) ->
    gen_server:start_link(
        {global, Name},
        ?MODULE,
        [#user{name=Name}],
        []
    ).

using global names again

slide-48
SLIDE 48

create_user(Name) ->
    gen_server:start_link(
        {global, Name},
        ?MODULE,
        [#user{name=Name}],
        []
    ).

required callbacks provided by the current module

slide-49
SLIDE 49

create_user(Name) ->
    gen_server:start_link(
        {global, Name},
        ?MODULE,
        [#user{name=Name}],
        []
    ).

initial state

slide-50
SLIDE 50

follow(UserName, OtherName) ->
    gen_server:call(
        {global, UserName},
        {follow, OtherName}
    ).

post(UserName, Msg) ->
    gen_server:call(
        {global, UserName},
        {post, Msg}
    ).

Other API functions follow a common pattern
slide-51
SLIDE 51

follow(UserName, OtherName) ->
    gen_server:call(
        {global, UserName},
        {follow, OtherName}
    ).

post(UserName, Msg) ->
    gen_server:call(
        {global, UserName},
        {post, Msg}
    ).

slide-52
SLIDE 52

follow(UserName, OtherName) ->
    gen_server:call(
        {global, UserName},
        {follow, OtherName}
    ).

post(UserName, Msg) ->
    gen_server:call(
        {global, UserName},
        {post, Msg}
    ).

make a call to the server

slide-53
SLIDE 53

follow(UserName, OtherName) ->
    gen_server:call(
        {global, UserName},
        {follow, OtherName}
    ).

post(UserName, Msg) ->
    gen_server:call(
        {global, UserName},
        {post, Msg}
    ).

using the global name

slide-54
SLIDE 54

follow(UserName, OtherName) ->
    gen_server:call(
        {global, UserName},
        {follow, OtherName}
    ).

post(UserName, Msg) ->
    gen_server:call(
        {global, UserName},
        {post, Msg}
    ).

“follow” message

slide-55
SLIDE 55

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

set up callbacks for expected messages

slide-56
SLIDE 56

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

to follow another user

slide-57
SLIDE 57

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

add them to the list of people we’re following

slide-58
SLIDE 58

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

call the other process

slide-59
SLIDE 59

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

slide-60
SLIDE 60

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

and tell them to add you as a follower

slide-61
SLIDE 61

handle_call({follow, Other}, _From, State) ->
    NewF = [Other|State#user.following],
    gen_server:call(
        {global, Other},
        {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};

tell gen_server all is good, and the new state

slide-62
SLIDE 62

handle_call({add_follower, F}, _From, State) ->
    NewF = [F | State#user.followers],
    {reply, ok, State#user{followers=NewF}};

the other process adds you to their follower list

slide-63
SLIDE 63

handle_call({post, Msg}, _From, State) ->
    map(fun(Name) ->
            %% followers are stored by global name, so address them that way
            gen_server:cast({global, Name},
                            {posted, State#user.name, Msg})
        end, State#user.followers),
    {reply, ok, State};

to post a message

slide-64
SLIDE 64

handle_call({post, Msg}, _From, State) ->
    map(fun(Name) ->
            %% followers are stored by global name, so address them that way
            gen_server:cast({global, Name},
                            {posted, State#user.name, Msg})
        end, State#user.followers),
    {reply, ok, State};

for each follower

slide-65
SLIDE 65

handle_call({post, Msg}, _From, State) ->
    map(fun(Name) ->
            %% followers are stored by global name, so address them that way
            gen_server:cast({global, Name},
                            {posted, State#user.name, Msg})
        end, State#user.followers),
    {reply, ok, State};

send the message we posted

slide-66
SLIDE 66

handle_call({post, Msg}, _From, State) ->
    map(fun(Name) ->
            %% followers are stored by global name, so address them that way
            gen_server:cast({global, Name},
                            {posted, State#user.name, Msg})
        end, State#user.followers),
    {reply, ok, State};

use cast() because we don’t care about any reply

slide-67
SLIDE 67

handle_cast({posted, Other, Msg}, State) ->
    %% really store in DB, SMS, etc
    io:fwrite("~s ~s~n", [Other, Msg]),
    {noreply, State};

Just for illustrative purposes, print the messages we receive. In reality, store them in a DB, send them by SMS, etc.
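
Putting the pieces from the last slides together gives a module along these lines. This is a sketch, not the original source file: the record definition, init/1, the followers/1 helper and the remaining callbacks are filled in as assumptions, and followers are cast to by their global name.

```erlang
-module(twitterl).
-behaviour(gen_server).

-export([create_user/1, follow/2, post/2, followers/1]).
-export([init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

%% Assumed record layout, based on the fields the slides use.
-record(user, {name, followers = [], following = []}).

create_user(Name) ->
    gen_server:start_link({global, Name}, ?MODULE, [#user{name=Name}], []).

follow(UserName, OtherName) ->
    gen_server:call({global, UserName}, {follow, OtherName}).

post(UserName, Msg) ->
    gen_server:call({global, UserName}, {post, Msg}).

%% Hypothetical helper, not on the slides: list a user's followers.
followers(UserName) ->
    gen_server:call({global, UserName}, followers).

%% The initial state is the user record passed to start_link.
init([User]) ->
    {ok, User}.

handle_call({follow, Other}, _From, State) ->
    NewF = [Other | State#user.following],
    gen_server:call({global, Other}, {add_follower, State#user.name}),
    {reply, ok, State#user{following=NewF}};
handle_call({add_follower, F}, _From, State) ->
    NewF = [F | State#user.followers],
    {reply, ok, State#user{followers=NewF}};
handle_call({post, Msg}, _From, State) ->
    %% Fire-and-forget delivery to every follower.
    lists:foreach(fun(Name) ->
                      gen_server:cast({global, Name},
                                      {posted, State#user.name, Msg})
                  end, State#user.followers),
    {reply, ok, State};
handle_call(followers, _From, State) ->
    {reply, State#user.followers, State}.

handle_cast({posted, Other, Msg}, State) ->
    %% really store in DB, SMS, etc
    io:fwrite("~s ~s~n", [Other, Msg]),
    {noreply, State}.

%% Remaining gen_server callbacks, left as no-ops.
handle_info(_Info, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

One caveat of this design: follow makes a synchronous call from inside handle_call, so a user following themselves, or two users following each other at the same instant, would block until the call times out.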

slide-68
SLIDE 68

In Summary

Erlang makes adding concurrency to your programs nearly trivial

slide-69
SLIDE 69

Thank You

slide-70
SLIDE 70

Resources

  • http://erlang.org/
  • http://trapexit.org/
  • #erlang on freenode
  • Erlang-questions@erlang.org
  • CEAN