Scaling Distributed Systems, Natalia Chechina and RELEASE Team, June 11, 2015

SLIDE 1

Distributed Erlang Scalable Distributed (SD) Erlang Operational Semantics Plans Sources

Scaling Distributed Systems

Natalia Chechina and RELEASE Team, June 11, 2015

  • N. Chechina, RELEASE team

Scaling Distributed Systems

SLIDE 2

Who am I?

  • 2011: Received a PhD in Computer Science from Heriot-Watt University, UK
  • 2011-2015: WP3 lead in the EU RELEASE project at Glasgow University, UK
  • March 2015: Research Fellow at Glasgow University, UK
  • Main research interest: scaling distributed computations on commodity hardware

SLIDE 3

Sources

  • Research findings
  • Experience from the RELEASE project
    • Funded by the EU FP7 framework programme
    • 5 academic and 3 industrial partners
    • Aim: to scale the radical actor (concurrency-oriented) paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (10^5 cores)
    • Erlang programming language
  • Experience of other researchers and developers

SLIDE 4

Scaling a System

Scaling ALL aspects of computation:
  • Application
  • Language
  • Virtual machine
  • In-memory data structures
  • Persistent data structures
  • Tools (debugging, monitoring, etc.)

SLIDE 6

Scaling at the language level:
  • Actor model
  • Functional programming

SLIDE 7

Language – Actor Model

  • Built-in concurrency
  • Actors have their own state and do not share it
  • Communication between actors happens only via message passing
  • Actors can spawn new actors
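As a minimal sketch of these four points in Erlang (the module name counter_actor and its message formats are hypothetical, not taken from the RELEASE code), the actor below keeps a counter as private state, is created with spawn, and can be reached only through messages:

```erlang
-module(counter_actor).
-export([start/0, increment/1, value/1]).

%% Spawn a new actor; its only state is the integer passed to loop/1.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous message: the caller does not wait for a reply.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Request/reply built from two asynchronous messages; the unique
%% reference makes sure we only accept the reply to this request.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 1000 ->
        {error, timeout}
    end.

%% The state is never shared or mutated: each iteration gets a new Count.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count)
    end.
```

Because Erlang delivers messages between one pair of processes in the order they were sent, two increment messages followed by a value request return 2.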

SLIDE 8

Language – Functional programming

  • Fundamental operation: application of functions to arguments
  • Higher-order functions: well-structured software
  • Modules: independent, reusable
  • Lazy evaluation (in some functional languages; Erlang itself evaluates strictly)
  • Variables are given values only once
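The sketch below (the module fp_demo is hypothetical) shows higher-order functions and single assignment in Erlang. Since Erlang evaluates strictly, the lazy list here is emulated with explicit thunks (zero-argument funs); this is one common technique, not a built-in language feature:

```erlang
-module(fp_demo).
-export([sum_of_squares/1, naturals/0, take/2]).

%% Higher-order functions: lists:map/2 and lists:foldl/3 take funs as arguments.
sum_of_squares(Xs) ->
    Squares = lists:map(fun(X) -> X * X end, Xs),
    %% Squares is bound once; binding it again to a different value
    %% would fail with badmatch rather than update it.
    lists:foldl(fun(X, Acc) -> X + Acc end, 0, Squares).

%% Emulated laziness: a lazy list is [] or [Head | fun() -> Tail end].
naturals() ->
    naturals_from(0).

naturals_from(N) ->
    [N | fun() -> naturals_from(N + 1) end].

%% Force only as many elements as requested.
take(0, _) -> [];
take(_, []) -> [];
take(N, [H | Thunk]) -> [H | take(N - 1, Thunk())].
```

For example, sum_of_squares([1, 2, 3, 4]) evaluates to 30, and take(5, naturals()) to [0, 1, 2, 3, 4], even though naturals() describes an infinite list.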

SLIDE 9

Fault Tolerance
  • 10^5 cores: approx. one core failure per hour
  • Non-defensive approach: supervision and “let it crash”
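A minimal sketch of “let it crash” with OTP's supervisor behaviour, assuming a hypothetical worker (module crash_demo) that divides numbers and performs no defensive check on its input. When the worker crashes, the supervisor starts a fresh one instead of the worker trying to recover:

```erlang
-module(crash_demo).
-behaviour(supervisor).
-export([start_link/0, start_worker/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Worker: a plain process linked to the supervisor that calls this function.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    {ok, Pid}.

worker_loop() ->
    receive
        {divide, X, Y, From} ->
            %% No defensive check: Y = 0 raises badarith and the process dies.
            From ! {result, X div Y},
            worker_loop()
    end.

%% one_for_one: only the crashed child is restarted.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.
```

The restart limit (intensity => 5, period => 10, chosen arbitrarily here) prevents endless restart loops: if the worker needs more than 5 restarts within 10 seconds, the supervisor terminates and escalates the failure to its own supervisor.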

SLIDE 10

  • Philosophy
  • Principles
  • Ideas
  • Core values

SLIDE 11

RELEASE Aim

To scale the radical actor (concurrency-oriented) paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (10^5 cores).

SLIDE 12

RELEASE Aim

To scale the radical actor (concurrency-oriented) paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (10^5 cores).

Erlang

SLIDE 13

RELEASE Aim

To scale the radical actor (concurrency-oriented) paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines (10^5 cores).

Erlang

  • VM aspects, e.g. synchronisation on internal data structures
  • Language aspects, e.g. maintaining a fully connected network of nodes, explicit process placement
  • Tool support

SLIDE 15

Typical Target Architecture: 10^5 cores

  • Commodity hardware
  • Non-uniform communication (Level 0: same host, Level 1: same cluster, etc.)

SLIDE 16

Erlang Overview

Erlang
  • is a functional, general-purpose, concurrent programming language developed in 1986 at Ericsson
  • is dynamically typed
  • was designed for distributed, fault-tolerant, massively concurrent, soft real-time systems
  • follows the “let it crash” and “share nothing” philosophy

The language primitives are processes. Erlang concurrency is handled by the language and not by the operating system [Arm10].
SLIDE 17

Distributed Erlang

SLIDE 18

Distributed Erlang

Transitive connections

SLIDE 19

Distributed Erlang

  • Transitive connections
  • Explicit placement, i.e.

spawn(Node, Module, Function, Args) → pid()
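A minimal sketch of explicit placement (the module placement_demo is hypothetical): the caller names the node, and the spawned process reports which node it is running on. For a remote Node, the module must also be loadable on that node, because spawn/4 evaluates apply(Module, Function, Args) there.

```erlang
-module(placement_demo).
-export([run_on/1, echo/1]).

%% Must be exported: spawn/4 calls it as apply(?MODULE, echo, [ReplyTo]).
echo(ReplyTo) ->
    ReplyTo ! {self(), node()}.

%% Explicit placement: the programmer decides where the process runs.
run_on(Node) ->
    Pid = spawn(Node, ?MODULE, echo, [self()]),
    receive
        {Pid, RunningOn} -> {ok, RunningOn}
    after 5000 ->
        {error, timeout}
    end.
```

Passing node() keeps the example runnable on a single, non-distributed VM; in a cluster, any name returned by nodes() could be passed instead.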

SLIDE 20

Distributed Erlang Scalability Limitations

  • Global operations
    • Global operations, i.e. registering names using the global module

SLIDE 21

Distributed Erlang Scalability Limitations

  • Global operations
    • Global operations, i.e. registering names using the global module
    • Other global operations, e.g. using rpc:call to call multiple nodes
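Both kinds of global operation can be sketched as follows (the module global_ops_demo and the registered name my_service are hypothetical). global:register_name/2 makes the name known on every connected node and coordinates with those nodes to keep the name table consistent; rpc:multicall/4 sends the same call to every listed node and waits for all replies. In both cases the work grows with the number of nodes involved, which is why they appear here as scalability limitations.

```erlang
-module(global_ops_demo).
-export([register_service/1, whereis_service/0, node_names/1]).

%% Cluster-wide name registration: returns yes, or no if the name is taken.
register_service(Pid) ->
    global:register_name(my_service, Pid).

%% Lookup of the globally registered name (a pid or undefined).
whereis_service() ->
    global:whereis_name(my_service).

%% Same call on every listed node; blocks until all have replied or failed.
%% Returns {Results, BadNodes}.
node_names(Nodes) ->
    rpc:multicall(Nodes, erlang, node, []).
```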

SLIDE 22

Distributed Erlang Scalability Limitations

  • Global operations
    • Global operations, i.e. registering names using the global module
    • Other global operations, e.g. using rpc:call to call multiple nodes
  • All-to-all transitive connections

SLIDE 23

Distributed Erlang Scalability Limitations

  • Global operations
    • Global operations, i.e. registering names using the global module
    • Other global operations, e.g. using rpc:call to call multiple nodes
  • All-to-all transitive connections

But... aren’t global operations and transitivity optional in distributed Erlang? Why use them if they are a bottleneck?

SLIDE 24

Distributed Erlang Scalability Limitations

  • Global operations
    • Global operations, i.e. registering names using the global module
    • Other global operations, e.g. using rpc:call to call multiple nodes
  • All-to-all transitive connections

But... aren’t global operations and transitivity optional in distributed Erlang? Why use them if they are a bottleneck?

  • Reliability and fault tolerance: when a process or a node fails, the remaining nodes learn about it; the same holds for recovery
  • It is already there: no extra effort to connect nodes and distribute information
  • Easy to scale: a new node learns about the running nodes, and vice versa

SLIDE 25

Scalable Distributed (SD) Erlang

SD Erlang is a small, conservative extension of Distributed Erlang.

  • Network scalability
    • All-to-all connections do not scale to 1000s of nodes
    • Aim: reduce connectivity
  • Semi-explicit placement
    • It becomes infeasible for a programmer to be aware of all nodes
    • Aim: automatic process placement in groups of nodes
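To put “reduce connectivity” into numbers, the sketch below counts connections. A full mesh of N nodes needs N(N-1)/2 connections. The grouped layout is a hypothetical illustration, not a layout prescribed by SD Erlang: equal-sized s_groups that are fully connected inside, plus one gateway node per group, with the gateways fully connected to each other.

```erlang
-module(conn_count).
-export([full_mesh/1, grouped/2]).

%% Distributed Erlang default: every pair of nodes is connected.
full_mesh(N) ->
    N * (N - 1) div 2.

%% Hypothetical layout (assumption): N nodes in G equal groups, each a full
%% mesh, plus a full mesh between one gateway node from each group.
grouped(N, G) when N rem G =:= 0 ->
    PerGroup = N div G,
    G * full_mesh(PerGroup) + full_mesh(G).
```

For 1,000 nodes, full_mesh(1000) gives 499,500 connections, while grouped(1000, 10) gives 10 × 4,950 + 45 = 49,545. An ordinary node then holds 99 connections instead of 999, and a gateway holds 108.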

SLIDE 26

Free Node Connections vs. s_group Node Connections

SLIDE 31

Connections between Different Types of Nodes

SLIDE 37

Why s_groups?

  • Preserve the Erlang philosophy and transitivity, and scale
  • Considered approaches
    • Grouping nodes according to their hash values
    • A hierarchical approach
    • Overlapping s_groups
  • Other approaches
    • Distributed Erlang global groups
    • Spapi Router (SpilGames)
    • Custom routing on non-transitively connected normal or hidden nodes

SLIDE 38

Hierarchical Grouping

SLIDE 39

Free Nodes and s_groups

SLIDE 40

Embedded Grouping

SLIDE 41

SD Erlang Improves Scalability

SLIDE 42

Speedup of Distributed Erlang Orbit and SD Erlang Orbit

SLIDE 43

Speedup of Distributed Erlang ACO and SD Erlang ACO

SLIDE 44

Semi-Explicit Placement

  • Communication latencies between nodes may vary according to their relative positions
  • In terms of communication time, nodes may be “nearby” or “far away”
  • We may wish to keep some tasks close together because they communicate with each other a lot

SLIDE 45

Example

System structure

SLIDE 46

Example: system structure

Racks

SLIDE 47

Example: system structure

Clusters

SLIDE 48

Example: system structure

Cloud

SLIDE 49

Dendrogram

[Figure: dendrogram over the hosts bwlf01 to bwlf34, amaterasu, persephone, oberon, cantor and osiris; vertical axis: Height]

SLIDE 50

Measuring communication distance

We can define a distance function d on the set V of Erlang VMs in a distributed system by

$$
d(x, y) =
\begin{cases}
0 & \text{if } x = y,\\
2^{-\ell(x, y)} & \text{if } x \neq y,
\end{cases}
$$

where $\ell(x, y)$ is the length of the longest path shared by the paths from the root to x and to y.
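The definition can be checked mechanically. The sketch below assumes each VM is identified by the list of labels on its path from the root, with the root itself left out, so that ℓ(x, y) is the length of the common prefix of the two lists. The module name vm_distance and the labels used in the usage note are hypothetical; the labels are chosen only to reproduce the ℓ values on the following “Distances” slides.

```erlang
-module(vm_distance).
-export([shared_path_length/2, distance/2]).

%% l(x, y): number of leading labels the two root paths have in common.
shared_path_length([Label | RestX], [Label | RestY]) ->
    1 + shared_path_length(RestX, RestY);
shared_path_length(_, _) ->
    0.

%% d(x, y) = 0 if x = y, and 2^-l(x, y) otherwise.
distance(Path, Path) ->
    0.0;
distance(PathX, PathY) ->
    math:pow(2, -shared_path_length(PathX, PathY)).
```

With b = [c1, r1, b], c = [c1, r1, c], g = [c1, r2, g] and k = [c2, r3, k], this gives d(b, c) = 0.25, d(b, g) = 0.5 and d(b, k) = 1.0. Distances are therefore at most 1, and VMs that share more of the hierarchy are closer.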

SLIDE 51

Distances

$\ell(b, c) = 2$, so $d(b, c) = 2^{-2} = 1/4$

SLIDE 52

Distances

$\ell(b, g) = 1$, so $d(b, g) = 2^{-1} = 1/2$

SLIDE 53

Distances

$\ell(b, k) = 0$, so $d(b, k) = 2^{0} = 1$

SLIDE 54

choose_nodes/1

Every node may have a list of attributes. The choose_nodes/1 function returns a list of nodes that satisfy the given restrictions:

    s_group:choose_nodes([Parameter]) -> [Node]
      where Parameter = {s_group, SGroupName} | {attribute, AttributeName}
                      | {nearer, 0.4} | {between, 0.5, 0.7}
            SGroupName = group_name()
            AttributeName = term()
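To illustrate how such restrictions might combine, here is a hypothetical, purely local model of the filtering (module choose_nodes_sketch). It is not the SD Erlang implementation. It assumes that each candidate is described as {Node, SGroups, Attributes, Distance}, with Distance being d from the distance slides, that every parameter must hold, that {nearer, D} keeps nodes strictly closer than D, and that {between, Lo, Hi} is inclusive.

```erlang
-module(choose_nodes_sketch).
-export([choose/2]).

%% Hypothetical model: keep the nodes whose description satisfies
%% every parameter (assumed conjunction); [] as Params keeps all nodes.
choose(Params, Candidates) ->
    [element(1, C) || C <- Candidates,
                      lists:all(fun(P) -> satisfies(P, C) end, Params)].

satisfies({s_group, Name}, {_Node, SGroups, _Attrs, _Dist}) ->
    lists:member(Name, SGroups);
satisfies({attribute, Attr}, {_Node, _SGroups, Attrs, _Dist}) ->
    lists:member(Attr, Attrs);
%% Assumed meaning: strictly nearer than the threshold.
satisfies({nearer, Max}, {_Node, _SGroups, _Attrs, Dist}) ->
    Dist < Max;
%% Assumed meaning: inclusive distance range.
satisfies({between, Lo, Hi}, {_Node, _SGroups, _Attrs, Dist}) ->
    Dist >= Lo andalso Dist =< Hi.
```

Unknown parameters fall through every clause and raise function_clause, in keeping with the “let it crash” style rather than being silently ignored.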

SLIDE 55

Operational Semantics

$(\mathit{state}, \mathit{command}, n_i) \longrightarrow (\mathit{state}', \mathit{value})$

Executing $\mathit{command}$ on node $n_i$ in $\mathit{state}$ returns $\mathit{value}$ and transitions to $\mathit{state}'$.
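To make the shape of the transition relation concrete, here is a small executable model. It is hypothetical: the command names only loosely follow s_group operations, and the state is reduced to a map from s_group names to member nodes. step/3 takes (state, command, n_i) and returns {state', value}.

```erlang
-module(sgroup_semantics).
-export([step/3]).

%% Simplified, hypothetical state: #{SGroupName => ordset of member nodes}.
step(State, {new_s_group, Name, Nodes}, _Ni) ->
    case maps:is_key(Name, State) of
        true ->
            {State, {error, already_exists}};
        false ->
            Members = ordsets:from_list(Nodes),
            {State#{Name => Members}, {ok, Name, Members}}
    end;
step(State, {delete_s_group, Name}, _Ni) ->
    case maps:is_key(Name, State) of
        true -> {maps:remove(Name, State), ok};
        false -> {State, {error, no_such_group}}
    end;
%% A query: the value depends on the executing node n_i; state' = state.
step(State, own_s_groups, Ni) ->
    Own = [Name || {Name, Members} <- maps:to_list(State),
                   ordsets:is_element(Ni, Members)],
    {State, lists:sort(Own)}.
```

Writing the model as a pure function is what makes it usable as a reference: a property-based tester can run the same command sequence against the model and against the real implementation and compare the returned values.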

SLIDE 56

Validation of Semantics and Implementation

  • Validate the consistency between the formal semantics and the SD Erlang implementation
  • Use the Erlang QuickCheck tool developed by Quviq
  • Behaviour is specified by properties expressed in a logical form
  • eqc_statem is QuickCheck's state machine testing module

Figure: Testing SD Erlang Using QuickCheck eqc_statem

SLIDE 57

Ongoing and Future Work

  • s_groups
    • Introduce more patterns, for example, routing for a tree structure
    • Analysis of fault tolerance strategies and features in SD Erlang applications
  • Semi-explicit placement
    • Discovering system structure at runtime
    • Robustness: dynamically adjusting the view of the system when new nodes join or existing ones fail

SLIDE 58

Sources

  • SD Erlang: http://www.dcs.gla.ac.uk/research/sd-erlang/
  • RELEASE Project: http://www.release-project.eu/
  • Deployment tool
    • Wombat: https://www.erlang-solutions.com/products/wombat
  • Profiling tools
    • Percept2: https://github.com/release-project/percept2
    • devo: https://www.youtube.com/watch?v=Ox30TBDcFPw
  • Benchmarking
    • BenchErl: http://release.softlab.ntua.gr/bencherl/index.html
    • DEbench, Orbit, ACO: https://github.com/release-project/benchmarks

SLIDE 59

Thank you!

SLIDE 60

[Arm10] J. Armstrong. Erlang. Commun. ACM, 53:68–75, 2010.