Functional Distributed Programming with Irmin

SLIDE 1

Functional Distributed Programming with Irmin

QCon NYC 2015, New York Anil Madhavapeddy (speaker) with Benjamin Farinier, Thomas Gazagnaire, Thomas Leonard University of Cambridge Computer Laboratory June 12, 2015

SLIDE 2

Background
  • Git in the datacenter
  • Irmin, a large-scale, immutable, branch-consistent storage
Weakly consistent data structures
  • Mergeable queues
  • Mergeable ropes
Benchmarking Irmin
Use Cases

SLIDE 3

Background Git in the datacenter

Common features every distributed system needs

  • Persistence for fault tolerance and scaling
  • Scheduling of communication between nodes
  • Tracing across nodes for debugging and profiling

Most distributed systems run over an operating system, and so are stuck with the OS kernel exerting control. We use unikernels, which are application VMs that have complete control over their resources.

SLIDE 4

Background Git in the datacenter

What if we just used Git?

  • Persistence
      • git clone of a shared repository across nodes
      • git commit of local operations in the node
  • Scheduling
      • git pull to receive events from other nodes
      • git push to publish events to other nodes
  • Tracing and Debugging
      • git log to see global operations
      • git checkout to roll back time to a snapshot
      • git bisect to locate problem messages

SLIDE 5

Background Git in the datacenter

Problems with using Git?

  • Garbage Collection
      • Git records all operations permanently, so our database will grow permanently!
      • git rebase is needed to compact history.
  • Shell Control
      • Calling the git command-line is slow and lacks fine control.
      • Makes it hard to extend the Git protocol for additional features.
  • Programming Model
      • Git is designed for distributed source code manipulation.
      • Built-in merge functions designed around text files.
      • Let’s use it for distributed data structures instead!

SLIDE 6

Background Irmin, a large-scale, immutable, branch-consistent storage

Irmin, large-scale, immutable, branch-consistent storage

  • Irmin is a library to persist and synchronize distributed data structures both on-disk and in-memory
  • It enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other
  • The general idea is that you want every active node to get a local (partial) copy of a global database and always be very explicit about how and when data is shared and migrated

SLIDE 7

Background Irmin, a large-scale, immutable, branch-consistent storage
SLIDE 8

Background Irmin, a large-scale, immutable, branch-consistent storage

type t = ... (** User-defined contents. *)

type result = [ `Ok of t | `Conflict of string ]

val merge : old:t -> t -> t -> result (** 3-way merge functions. *)

[Diagram: a common ancestor old forks into x and y; their merge result is marked "?".]
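As a hedged illustration of the merge signature above, here are two 3-way merge functions over integers. The names `merge_counter` and `merge_register` are hypothetical and not part of Irmin; the sketch only shows how the common ancestor `old` decides between a clean merge and a conflict.

```ocaml
(* Illustration only: contents are plain integers. *)
type t = int

type result = [ `Ok of t | `Conflict of string ]

(* Counter semantics: replay both branches' increments on the ancestor.
   This merge can never conflict. *)
let merge_counter ~(old : t) (x : t) (y : t) : result =
  `Ok (old + (x - old) + (y - old))

(* Register semantics: keep the side that changed; conflict only when
   both sides changed to different values. *)
let merge_register ~(old : t) (x : t) (y : t) : result =
  if x = y then `Ok x
  else if x = old then `Ok y
  else if y = old then `Ok x
  else `Conflict (Printf.sprintf "both sides changed: %d vs %d" x y)
```

The counter shows why user-defined merges matter: a text-oriented merge would report a conflict whenever both branches touch the same value, while the counter merge always succeeds.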

SLIDE 9

Background Irmin, a large-scale, immutable, branch-consistent storage

Demo: Distributed Logging

Multiple nodes all logging to a central store:

1 Design the logging data structure.
  • A log is a list of (string + timestamp)
  • When merging, the timestamps must be in increasing order
  • Equal timestamps can be in any order
  • With this logic, merge conflicts are impossible
2 Every node clones the log repository.
3 A log is recorded locally, then pushed centrally.
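The log design above can be sketched as follows. This is a minimal sketch, not the demo's code: the `entry` record and the helpers `added` and `interleave` are assumed names, and each branch is assumed to be append-only relative to the common ancestor.

```ocaml
type entry = { time : float; msg : string }

(* Oldest entry first; timestamps are non-decreasing. *)
type log = entry list

(* Entries a branch appended since the common ancestor.
   Assumes the branch starts with the ancestor (append-only). *)
let rec added ~old branch =
  match old, branch with
  | [], rest -> rest
  | _ :: o, _ :: b -> added ~old:o b
  | _ :: _, [] -> []

(* Merge two time-sorted lists; on equal timestamps the left side
   goes first, which is allowed because ties can be in any order. *)
let rec interleave xs ys =
  match xs, ys with
  | [], l | l, [] -> l
  | x :: xt, y :: yt ->
    if x.time <= y.time then x :: interleave xt ys
    else y :: interleave xs yt

(* 3-way merge: ancestor entries, then both branches' new entries
   interleaved by timestamp. There is no conflict case. *)
let merge ~old x y : log = old @ interleave (added ~old x) (added ~old y)
```

Because every pair of append-only logs has a well-defined interleaving, the function is total, which is what "merge conflicts are impossible" means in practice.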

SLIDE 10

Weakly consistent data structures

Weakly consistent data structures

SLIDE 11

Weakly consistent data structures Mergeable queues

Mergeable queues

SLIDE 12

Weakly consistent data structures Mergeable queues

(* IrminQueue.S *)
module type S = sig
  type t
  type elt
  val create : unit -> t
  val length : t -> int
  val is_empty : t -> bool
  val push : t -> elt -> t
  val pop : t -> (elt * t)
  val peek : t -> (elt * t)
  val merge : IrminMerge.t
end
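One common way to realise a persistent queue with this shape is the two-list functional queue, sketched below in plain OCaml. It is an in-memory illustration only, not Irmin's stored representation; the module name `ListQueue`, the fixed `elt = string`, and the use of `invalid_arg` on an empty queue are assumptions, and `merge` is left out.

```ocaml
module ListQueue = struct
  type elt = string

  (* [push] holds the newest element first; [pop] holds the oldest first. *)
  type t = { push : elt list; pop : elt list }

  let create () = { push = []; pop = [] }
  let length q = List.length q.push + List.length q.pop
  let is_empty q = q.push = [] && q.pop = []

  (* Pushing is a single cons onto the push list. *)
  let push q x = { q with push = x :: q.push }

  (* When the pop list is empty, reverse the push list into it. *)
  let normalize q =
    match q.pop with
    | [] -> { push = []; pop = List.rev q.push }
    | _ -> q

  let pop q =
    match normalize q with
    | { pop = x :: rest; _ } as q' -> (x, { q' with pop = rest })
    | _ -> invalid_arg "ListQueue.pop: empty queue"

  let peek q =
    match normalize q with
    | { pop = x :: _; _ } as q' -> (x, q')
    | _ -> invalid_arg "ListQueue.peek: empty queue"
end
```

Each element is reversed at most once, so a pop is cheap on average even though an individual pop may walk the whole push list.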

SLIDE 13

Weakly consistent data structures Mergeable queues

[Figure: internal layout of the queue. Labels: indexes I0 and I1; nodes n01 to n07 and n11 to n14; pointers top and bottom; legend: Index, Node, Elt, pop list, push list.]
SLIDE 14

Weakly consistent data structures Mergeable queues

[Figure: merging two queues that share an ancestor.
  I old:  A B C D
  I 1:    A B C D E
  I 2:    B C D F G
  Later frames (shown twice): I 1: B C D E and I 2: F G]
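The queue merge in the figure above can be approximated on plain lists. This is a sketch under stated assumptions rather than Irmin's algorithm: the helpers `drop`, `strip_prefix` and `delta` are hypothetical, elements are assumed distinct, a branch is assumed never to pop elements it pushed itself, a pop made in both branches is counted once, and the final queue is read as the two last frames concatenated.

```ocaml
(* Drop the first n elements of a list (all of them if it is shorter). *)
let rec drop n l =
  match l with
  | _ :: t when n > 0 -> drop (n - 1) t
  | _ -> l

(* If p is a prefix of l, return the rest of l. *)
let rec strip_prefix p l =
  match p, l with
  | [], rest -> Some rest
  | a :: p', b :: l' when a = b -> strip_prefix p' l'
  | _ -> None

(* Recover (number of pops, pushed elements) for a branch: the smallest k
   such that the ancestor minus its first k elements prefixes the branch. *)
let delta ~old branch =
  let rec go k =
    match strip_prefix (drop k old) branch with
    | Some pushed -> (k, pushed)
    | None -> go (k + 1)
  in
  go 0

(* 3-way merge: apply the larger pop count to the ancestor, then append
   branch 1's pushes followed by branch 2's pushes. *)
let merge ~old q1 q2 =
  let pops1, pushed1 = delta ~old q1 in
  let pops2, pushed2 = delta ~old q2 in
  drop (max pops1 pops2) old @ pushed1 @ pushed2
```

With the figure's data, branch 1 pushed E and branch 2 popped A and pushed F and G, so the sketch yields B C D E F G. It scans the ancestor, so it costs time linear in the queue length.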

SLIDE 15

Weakly consistent data structures Mergeable queues

Current state

  Operation   Read            Write
  Push        2 (column unclear in the source)  O(1)
  Pop         2 on average    1 on average    O(1)
  Merge       n               1               O(n)

With a little more work

  Operation   Read            Write
  Push        2 (column unclear in the source)  O(1)
  Pop         2 on average    1 on average    O(1)
  Merge       log n           1               O(log n)

SLIDE 17

Weakly consistent data structures Mergeable ropes

Mergeable ropes

SLIDE 18

Weakly consistent data structures Mergeable ropes

(* IrminRope.S *)
module type S = sig
  type t
  type value (* e.g. char *)
  type cont  (* e.g. string *)
  val create : unit -> t
  val make : cont -> t
  ...
  val set : t -> int -> value -> t
  val get : t -> int -> value
  val insert : t -> int -> cont -> t
  val delete : t -> int -> int -> t
  val append : t -> t -> t
  val split : t -> int -> (t * t)
  val merge : IrminMerge.t
end
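To make the rope operations concrete, here is a minimal in-memory rope over strings. It is a sketch, not Irmin's implementation: the `rope` type and its weight field are assumptions, there is no rebalancing (so the logarithmic bounds hold only while the tree stays balanced), and `set`, `delete` and `merge` are omitted.

```ocaml
type rope =
  | Leaf of string
  | Node of rope * int * rope (* left, length of left subtree, right *)

let rec length = function
  | Leaf s -> String.length s
  | Node (_, w, r) -> w + length r

(* Concatenation allocates one node and copies no characters. *)
let append l r = Node (l, length l, r)

(* Indexing follows the weights down to a single leaf. *)
let rec get t i =
  match t with
  | Leaf s -> s.[i]
  | Node (l, w, r) -> if i < w then get l i else get r (i - w)

(* Split at position i: characters [0, i) go left, the rest go right. *)
let rec split t i =
  match t with
  | Leaf s ->
    (Leaf (String.sub s 0 i), Leaf (String.sub s i (String.length s - i)))
  | Node (l, w, r) ->
    if i < w then (let a, b = split l i in (a, append b r))
    else if i = w then (l, r)
    else (let a, b = split r (i - w) in (append l a, b))

(* Insertion is a split followed by two appends. *)
let insert t i s =
  let a, b = split t i in
  append (append a (Leaf s)) b

let rec to_string = function
  | Leaf s -> s
  | Node (l, _, r) -> to_string l ^ to_string r
```

Inserting into a rope touches only the nodes on one root-to-leaf path, whereas inserting into a flat string copies the whole buffer.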

SLIDE 19

Weakly consistent data structures Mergeable ropes

  Operation     Rope          String
  Set/Get       O(log n)      O(1)
  Split         O(log n)      O(1)
  Concatenate   O(log n)      O(n)
  Insert        O(log n)      O(n)
  Delete        O(log n)      O(n)
  Merge         log (f (n))   f (n)

SLIDE 20

Weakly consistent data structures Mergeable ropes

[Figure, built up over slides 20 to 28: three-way merge of a rope.
  Ancestor leaves: lo rem ip sum do a met (node labels 10 5 2 2 2 1)
  Branch 1 leaves: lo rem ip sum do lor a met (node labels 10 5 5 2 2 2 1)
  Branch 2 leaves: lo rem ip sum do sit a met (node labels 10 5 2 2 2 4 3)
  Merged leaves:   lo rem ip sum do lor sit a met (node labels 10 5 5 2 2 2 4 3)]
SLIDE 29

Benchmarking Irmin

Benchmarking Irmin

SLIDE 30

Benchmarking Irmin

[Plot: time spent for whole operations (µs) against number of push/pop successively applied, for the Core, Obj, Memory, GitMem and GitDsk backends. Tick labels run from 100000 to 500000 and from 500 to 4500.]

SLIDE 31

Benchmarking Irmin

module ObjBackend ... = struct
  type t = unit
  type key = K.t
  type value = V.t
  let create () = return ()
  let clear () = return ()
  (* The key is the value itself, cast with Obj.magic: nothing is stored. *)
  let add t value = return (Obj.magic (Obj.repr value))
  (* Reading casts the key back into a value. *)
  let read t key = return (Obj.obj (Obj.magic key))
  let mem t key = return true
  ...
end
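For contrast with the zero-cost backend above, a store with the same create/clear/add/read/mem shape that really keeps its values could look like the sketch below. It is not Irmin's Memory backend: the module name `MemBackend`, string keys and values, the use of `Digest` for keys, and the omission of the `return` wrapper are all assumptions made for illustration.

```ocaml
module MemBackend = struct
  type key = string   (* hex digest of the value *)
  type value = string
  type t = (key, value) Hashtbl.t

  let create () : t = Hashtbl.create 16
  let clear (t : t) = Hashtbl.reset t

  (* Content addressing: the key is derived from the value, so adding
     the same value twice yields the same key. *)
  let add (t : t) (v : value) : key =
    let k = Digest.to_hex (Digest.string v) in
    Hashtbl.replace t k v;
    k

  let read (t : t) (k : key) : value option = Hashtbl.find_opt t k
  let mem (t : t) (k : key) = Hashtbl.mem t k
end
```

Every `add` here pays for a hash and a table update, which is the kind of overhead the Obj backend removes from the measurements.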

SLIDE 32

Benchmarking Irmin

[Plot: time spent for whole operations (µs) against number of push/pop successively applied, for the Core, Obj, Memory, GitMem and GitDsk backends. Tick labels run from 100 to 500 and from 500 to 4500.]

SLIDE 33

Use Cases

Use Cases

SLIDE 34

Use Cases

Demo: Dog, a loyal synchronization tool

Command line interface to logging clients at https://github.com/samoht/dog

1 dog listen sets up the server listener
  • Server maintains list of clients in a subtree
  • It regularly merges all clients in parallel to master branch
2 dog init starts up a client logger
3 dog push syncs the client with the server

SLIDE 35

Use Cases

Demo: CueKeeper, an Irmin TODO manager

Do Git programming in the browser
https://github.com/talex5/cuekeeper
http://test.roscidus.com/CueKeeper/

1 Irmin is written in OCaml, and compiles to efficient JavaScript
  • Git objects are mapped into IndexedDB
  • Uses LocalStorage to sync between tabs
2 DOM elements are computed from the Git store (a React-like model)
3 Client has full history, snapshotting and custom merge logic.

SLIDE 36

Use Cases

Demo: XenStore TNG

The Xen hypervisor toolstack
https://www.youtube.com/watch?v=DSzvFwIVm5s

1 Xen is a widely deployed hypervisor (Amazon EC2, Rackspace Cloud, ...)
  • Every VM boot needs a lot of communication
  • Tracing when something goes wrong is hard
  • Programming model is quite reactive
2 Dave Scott from Citrix ported the core toolstack to use Irmin, and made it faster!

SLIDE 37

Use Cases

Why OCaml?

  • Lets us prototype complex functional data structures very quickly
  • Efficient compilation to native code (x86, ARM, PowerPC, Sparc, ...), unikernels (MirageOS), JavaScript and Java
  • Execution model is strict and predictable, important for systems programming
  • Native code compilation is statically linked, or can be used as a normal shared library

SLIDE 38

Use Cases

Irmin Status (“Not Entirely Insane”)

  • Still pre-1.0, but with several useful data structures such as distributed queues and efficient ropes.
  • HTTP REST for remote clients, library via OCaml, or command-line interface.
  • Bidirectional operation, so git commits map to Irmin commits from any direction.
  • Open source at https://irmin.io, installable via the OPAM package manager at https://opam.ocaml.org
  • Feedback welcome at mirageos-devel@lists.xenproject.org or https://github.com/mirage/irmin/issues
