Let it Recover: Multiparty Protocol-Induced Recovery

SLIDE 1

Let it Recover: Multiparty Protocol-Induced Recovery

SLIDE 2

“Fail fast and recover quickly”

Erlang proverb

“Fail fast and recover quickly and safely”

OPCT proverb (after this talk)

SLIDE 3

Part One: Background

SLIDE 4

The Erlang programming language

  factorial(0) -> 1;
  factorial(X) when X > 0 -> X * factorial(X-1).

SLIDE 5

Erlang’s coding philosophy

SLIDE 6

Let it crash: Erlang’s fault tolerance model

Do not program defensively; let the process crash.
In case of error, the process is automatically terminated.
Processes are linked. When a process crashes, linked processes are notified and can be restarted.
Organise your processes in supervision trees.

Recently adopted by: [logos not recoverable]

Supervision Strategies

one-for-one
all-for-one
rest-for-one
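As an illustration of the strategies named on this slide, here is a minimal sketch of an OTP supervisor that uses the one-for-one strategy. The module name `demo_sup`, the child ids `worker_a` and `worker_b`, and the `crash` message are hypothetical and do not come from the talk; only the `supervisor` behaviour and its flags are standard OTP.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

%% Start the supervisor and register it locally as demo_sup.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: only the crashed child is restarted.
    %% all_for_one would restart every child; rest_for_one restarts the
    %% crashed child and the children started after it.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 5},
    Children = [child(worker_a), child(worker_b)],
    {ok, {SupFlags, Children}}.

%% Child specification for a hypothetical worker.
child(Id) ->
    #{id => Id,
      start => {?MODULE, start_worker, []},
      restart => permanent,
      type => worker}.

%% Called by the supervisor, so spawn_link links the worker to it.
start_worker() ->
    {ok, spawn_link(fun loop/0)}.

%% The worker does no defensive checks: on 'crash' it simply exits.
loop() ->
    receive
        crash -> exit(crashed);
        _Other -> loop()
    end.
```

Sending `crash` to `worker_a` should cause the supervisor to restart only that worker, leaving `worker_b` running. Switching `strategy` to `all_for_one` or `rest_for_one` changes which siblings are restarted, and the choice is fixed when the supervisor starts.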

SLIDE 7

Supervision strategies: Drawbacks

Supervision strategies are statically defined and error-prone. They can be:

unsound: a recovery may cause deadlocks, orphan messages, or reception errors
inefficient

SLIDE 8

Supervision strategies can be unsound and inefficient.

How can we generate sound and efficient supervision strategies? By using session types!

SLIDE 9

Session Types Overview

Global protocol (session type)

Local protocol (session type)
Slice of the global protocol relevant to one role
Mechanically derived from a global protocol

Process language
Execution model of I/O actions by roles

A system of well-behaved processes is free from deadlocks, orphan messages and reception errors.

The framework has been applied to Java, Python, MPI/C, Go…
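To make the idea of a local protocol as a slice of the global protocol concrete, here is a rough sketch of projection. It assumes a deliberately simplified encoding that is not from the talk: a global protocol is an ordered list of `{Sender, Receiver, Label}` interactions, with no choice or recursion. The module name, role names and labels are hypothetical.

```erlang
-module(projection_demo).
-export([project/2, example/0]).

%% Project a global protocol onto one role: keep only the interactions
%% the role takes part in, rewritten as the role's own send or receive.
project(Role, Global) ->
    lists:filtermap(
      fun({From, To, Label}) when From =:= Role ->
              {true, {send, To, Label}};
         ({From, To, Label}) when To =:= Role ->
              {true, {recv, From, Label}};
         (_Interaction) ->
              false
      end,
      Global).

%% A hypothetical four-step global protocol over roles a, b, c and e.
example() ->
    [{b, c, request},
     {c, e, forward},
     {b, a, notify},
     {c, a, report}].
```

For instance, `projection_demo:project(a, projection_demo:example())` yields the two receive actions of role `a`, and projecting onto `b` yields its two send actions. Real session types also cover branching and recursion, which this sketch leaves out.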

SLIDE 10

Part Two: Let It Recover

SLIDE 11

Recovery workflow

[Figure: recovery workflow. Labels: recovery algorithm, Protocol Dependency Graph, Recovery Table, implementation, Erlang Runtime; example recovery points (A:3), (B:1), (C:2).]

A recovered system is free from deadlocks, orphan messages and reception errors.
Outperforms one of the built-in recovery strategies in Erlang.

SLIDE 12

This talk: Safe Recovery for Session Protocols

Approach

A recovery algorithm analyses a global protocol to calculate the dependencies of a failed process.
Local supervisors monitor the state of each process in the protocol.
Protocol supervisors use the algorithm at runtime to decide which processes to recover.
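The monitoring role of a local supervisor can be sketched with Erlang's built-in process monitors. This is only an illustration under assumptions: the module name `local_monitor_demo`, the function `watch/2` and the result tuples are hypothetical, and the talk's actual supervisors are not shown in the slides.

```erlang
-module(local_monitor_demo).
-export([watch/2]).

%% Run Fun in a new monitored process on behalf of Role and report
%% how it ended. A monitor, unlike a link, does not take the watcher
%% down when the watched process crashes.
watch(Role, Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            {finished, Role};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% A protocol supervisor could use this report to decide
            %% which processes to recover.
            {failed, Role, Reason}
    end.
```

Here `watch/2` blocks until the watched process ends. A fuller design would probably run each local supervisor as its own process and forward the failure report, together with the role's current protocol state, to the protocol supervisor.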

SLIDE 13

Causalities

SLIDE 14

Causalities

SLIDE 15

Part Three: Recovery Algorithm

SLIDE 16

Recovery Algorithm

SLIDE 17

Recovery Algorithm

SLIDE 18

[Figure: the recovery algorithm run on an example protocol dependency graph with nodes 1:B, 2:C, 3:B, 4:C, 5:A, 6:D and 7:E, from the Initialise step to the Final condition, with nodes marked done or not done.]

SLIDE 19

Recovery points

Recovery point: take the top node from the set of recovery nodes.

Global Recovery Table

Failure: …; 3, A; 3, B; 4, C; 4, A; …
Recovery points: …; A:3, B:3, C:4; A:3, B:3, C:5; C:2, E:2; C:1, B:1, …; …

[Figure: example protocol graph with nodes 1:B, 2:C, 3:B and 4:C.]

SLIDE 20

Main Results: Transparency and Safety (informally)

Theorem: Transparency

The recovered protocol is a reduction of the initial protocol. The configuration of the system after a failure is reachable from the initial configuration.

Theorem: Safety

Any configuration reachable from an initial configuration of a well-formed global protocol is free from deadlocks, orphan messages and reception errors.

SLIDE 21

Part Four: Recovery Implementation

SLIDE 22

Enabling Protocol Recovery in Erlang

protocol specification
gen_server: stores recovery tables
protocol supervisor: recovers processes
local supervisors: monitor the process behaviour
gen_server: used to implement processes
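One way a `gen_server` could store a recovery table is sketched below. Everything beyond the `gen_server` behaviour itself is an assumption made for illustration: the module name `recovery_table_server`, the `recovery_points/2` function, and the table shape, a map from a `{Node, Role}` failure to a list of `{Role, Node}` recovery points.

```erlang
-module(recovery_table_server).
-behaviour(gen_server).
-export([start_link/1, recovery_points/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start a server whose state is the given recovery table (a map).
start_link(Table) ->
    gen_server:start_link(?MODULE, Table, []).

%% Ask the server for the recovery points of a failure.
%% Returns {ok, Points} if the failure is in the table, else error.
recovery_points(Server, Failure) ->
    gen_server:call(Server, {recovery_points, Failure}).

init(Table) when is_map(Table) ->
    {ok, Table}.

handle_call({recovery_points, Failure}, _From, Table) ->
    {reply, maps:find(Failure, Table), Table}.

%% No asynchronous requests are defined; ignore any cast.
handle_cast(_Msg, Table) ->
    {noreply, Table}.
```

A protocol supervisor could then call `recovery_points(Server, {3, a})` after a local supervisor reports that role `a` failed at node 3, and restart the roles listed in the reply. Keeping the table in its own process means the lookup survives the crash of any protocol participant.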

SLIDE 23

Enabling Protocol Recovery in Erlang: Example

SLIDE 24

Evaluation: Web Crawler Example

[Figure: plot of seconds against number of crashes.]

A process is chosen at random at the start.
Improvement when several failures occur.
By mistake, we initially implemented all-for-one, which introduced a deadlock.

source: http://foat.me/articles/crawling-with-akka/

SLIDE 25

Evaluation: Concurrency Patterns

[Figure: seconds for the Map Reduce, Ring and Calculator patterns.]

52% improvement when:
intense local computation
disconnected interactions

Up to 7% overhead when all roles are restarted

SLIDE 26

Future work & Resources

Framework summary

Ensure processes are safe and conform to a protocol (even in cases of failures)
Create supervision trees and link processes dynamically based on a protocol structure

Future work

Support for stateful processes
Integration with checkpoints
Replications and recovery actions

Additional Resources

Scribble webpage: scribble.doc.ic.ac.uk
Project source: https://gitlab.doc.ic.ac.uk/rn710/codeINspire

MRG webpage: http://mrg.doc.ic.ac.uk/

SLIDE 27

Q & A
