Programming Distributed Systems 12 Programming Models for Distributed Systems



slide-1
SLIDE 1

Programming Distributed Systems

12 Programming Models for Distributed Systems Annette Bieniusa

AG Softech FB Informatik TU Kaiserslautern

Summer Term 2019

Annette Bieniusa Programming Distributed Systems Summer Term 2019 1/ 27

slide-2
SLIDE 2

What is a Programming Model? [4]

A programming model is some form of abstract machine

• Provides operations to the level above
• Requires implementations for these operations on the level(s) below

• Simplification through abstraction
• Standard interface that remains stable even if the underlying architecture changes
• Provides different levels of abstraction
• Often the starting point for language development
⇒ Separation of concerns between software developers and framework implementors (runtime system, compiler, etc.)


slide-3
SLIDE 3

Properties of good programming models

• Meaningful abstractions
• System-architecture independent
• Efficiently implementable
• Easy to understand


slide-4
SLIDE 4

What kind of abstractions should a programming model for distributed systems provide?


slide-5
SLIDE 5

Remote Procedure Call


slide-6
SLIDE 6

Remote Procedure Call (RPC) [2]

• Rather broad classifying term whose meaning has changed over time
    • From client-server design to interconnected services
• Two entities (caller/callee) with different address spaces communicate over some channel using a request-response mechanism
• Examples: CORBA (Common Object Request Broker Architecture), Java RMI (Remote Method Invocation), SOAP (Simple Object Access Protocol), gRPC (Protocol Buffers), Twitter Finagle, ...


slide-7
SLIDE 7


slide-8
SLIDE 8

Flaws of RPC

• Location transparency (i.e. a request to a remote service looks like a local function call) masks potential distribution-related failures
• RPCs might time out, which usually requires special handling such as retrying
• Unlike local function calls, RPCs have to deal with the problem of idempotence
• Execution time is unpredictable
• Passing objects is complex (e.g. referenced objects might need to be serialized)
• Translating data types between languages might rely on semantic approximation
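The timeout, retry, and idempotence points can be made concrete with a small Java sketch. It is not a real RPC framework: `RetryingCaller` and `callWithRetry` are hypothetical names, and the remote call is just a `Callable` that stands in for a network request. The important caveat is in the comment: a timed-out attempt may still have executed on the callee, so blind retrying is only safe for idempotent operations.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
final class RetryingCaller {
    // Runs remoteCall up to maxAttempts times, giving each attempt timeoutMillis.
    // Only safe if the remote operation is idempotent: an attempt that timed out
    // on the caller's side may still have been executed by the callee.
    static <T> T callWithRetry(Callable<T> remoteCall, int maxAttempts, long timeoutMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(remoteCall);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // give up on this attempt and retry
                    last = e;
                } catch (ExecutionException e) {
                    last = e;            // the call itself failed; retry
                }
            }
            throw last;                  // all attempts failed or timed out
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A local function call needs none of this machinery, which is exactly what location transparency hides.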


slide-9
SLIDE 9

Aspects of modern RPC

• Language-agnostic
• Serialization (aka marshalling or pickling)
    • JSON, XML, Protocol Buffers, ...
• Load-balancing
• SOA (Service-oriented architecture) ⇒ Microservice architectures!
• Asynchronous
⇒ RPC as a term is becoming more and more diffuse


slide-10
SLIDE 10

Futures and Promises

• “Asynchronous RPC”
• A future is a value that will eventually become available
• Two states:
    • completed: value is available
    • incomplete: computation of the value is not yet complete
• Strategies: eager vs. lazy evaluation
• Typical applications: web development and user interfaces
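The two states and the eager/lazy distinction can be sketched with Java's `CompletableFuture`. `CompletableFuture.supplyAsync` starts its computation immediately (eager). The `LazyFuture` wrapper is a hypothetical class written for this illustration, not a JDK type: it defers starting the computation until the value is first demanded.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical wrapper, for illustration: starts the computation on first demand.
final class LazyFuture<T> {
    private final Supplier<T> computation;
    private CompletableFuture<T> started; // null until the value is first demanded

    LazyFuture(Supplier<T> computation) {
        this.computation = computation;
    }

    synchronized boolean isStarted() {
        return started != null;
    }

    // Starts the computation on first use and returns the underlying future.
    synchronized CompletableFuture<T> force() {
        if (started == null) {
            started = CompletableFuture.supplyAsync(computation);
        }
        return started;
    }
}

final class FutureStatesDemo {
    public static void main(String[] args) {
        // Incomplete: no value is available yet.
        CompletableFuture<String> future = new CompletableFuture<>();
        System.out.println("done before complete: " + future.isDone());

        // Completed: the value is now available to every consumer.
        future.complete("result");
        System.out.println("done after complete: " + future.isDone());

        // Eager: the computation is scheduled as soon as the future is created.
        CompletableFuture<Integer> eager = CompletableFuture.supplyAsync(() -> 6 * 7);

        // Lazy: nothing runs until force() is called.
        LazyFuture<Integer> lazy = new LazyFuture<>(() -> 6 * 7);
        System.out.println("lazy started: " + lazy.isStarted());
        System.out.println("values: " + eager.join() + ", " + lazy.force().join());
    }
}
```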


slide-11
SLIDE 11

Example

    import java.util.concurrent.*;

    interface ArchiveSearcher { String search(String target); }

    class App {
        ExecutorService executor = ...
        ArchiveSearcher searcher = ...

        void showSearch(final String target) throws InterruptedException {
            // Submit the search; it runs asynchronously on the executor
            Future<String> future = executor.submit(new Callable<String>() {
                public String call() {
                    return searcher.search(target);
                }
            });
            displayOtherThings(); // do other things while searching
            try {
                displayText(future.get()); // use future; blocks until the result is available
            } catch (ExecutionException ex) {
                cleanup();
                return;
            }
        }
    }

From Oracle’s Java Documentation


slide-12
SLIDE 12

Actors and Message Passing


slide-13
SLIDE 13

Characteristics of Actor Model [3]

• Actors are isolated units of computation + state that can send messages asynchronously to each other
• Messages are queued in a mailbox and processed sequentially when they match against some pattern/rule
• No assumptions on message delivery guarantees
• (Potential) state + behavior changes upon message processing [1]
• Very close to Alan Kay’s definition of Object-Oriented Programming
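A minimal, single-process sketch of these characteristics in Java, assuming one thread per actor and plain strings as messages. `CounterActor` is a hypothetical class, not the API of any actor library. Its state is touched only by the actor's own thread, senders return immediately, and unknown messages are ignored. The in-memory mailbox never loses messages, which is a simplification: the model itself makes no delivery guarantees.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical actor, for illustration only.
final class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final CompletableFuture<Integer> finalCount = new CompletableFuture<>();
    private int count = 0; // isolated state: only the actor's own thread reads or writes it

    // Asynchronous send: enqueues the message and returns immediately.
    void send(String message) {
        mailbox.add(message);
    }

    // Completed with the final counter value once the actor has processed "stop".
    CompletableFuture<Integer> finalCount() {
        return finalCount;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String message = mailbox.take(); // process one message at a time
                switch (message) {
                    case "increment":
                        count++;
                        break;
                    case "stop":
                        finalCount.complete(count);
                        return;
                    default:
                        break; // message matches no rule: ignored in this sketch
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Usage: start the actor with `new Thread(actor).start()`, call `send` from any thread, and read the result through `finalCount()` after sending `"stop"`.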


slide-14
SLIDE 14

Actors in the Wild

Erlang

• Process-based
• Pure message passing
• monitor and link for notification of process failure/shutdown
• OTP (Open Telecom Platform) for generic reusable patterns

Akka

• Actor model for the JVM
• Purges non-matching messages
• Enforces parental supervision
• Replaced the deprecated actor library of the Scala standard library

Orleans

• Actors for Cloud computing
• Scalability by replication
• Fine-grain reconciliation of state with transactions


slide-15
SLIDE 15

Message brokers

• Message-oriented middleware that stores messages temporarily and forwards them to registered recipients
• Patterns: publish-subscribe, point-to-point
• Acts as a buffer for unavailable and overloaded recipients
• Decoupling of sender and receiver(s)
• Efficient 1-to-n multicast
• Advanced Message Queuing Protocol (AMQP) standardizes queuing, routing, reliability and security
• Delivery guarantees (at-most-once, at-least-once, exactly-once)


slide-16
SLIDE 16

Example: RabbitMQ

• Supports (amongst others) the publish-subscribe pattern
• Typical usage: topics as routing keys
    • Q1 is interested in all the orange animals
    • Q2 wants to hear everything about rabbits, and everything about lazy animals
• Messages that don’t match any binding get lost
• Messages are maintained in the queue in publication order
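How topic routing decides which queue receives a message can be sketched without a broker. The matcher below follows the wildcard rules of a RabbitMQ topic exchange: routing keys are dot-separated words, `*` stands for exactly one word, and `#` for zero or more words. The binding keys in the usage note (`*.orange.*` for Q1, `*.*.rabbit` and `lazy.#` for Q2) and the `<speed>.<colour>.<species>` key layout are assumed from the RabbitMQ topics tutorial, since the slide's figure is not reproduced here. `TopicMatcher` is a hypothetical helper, not part of the RabbitMQ client library.

```java
// Hypothetical helper, for illustration only.
final class TopicMatcher {
    // Returns true if routingKey matches bindingKey under topic-exchange wildcard rules.
    static boolean matches(String bindingKey, String routingKey) {
        return matches(bindingKey.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean matches(String[] pattern, int p, String[] words, int w) {
        if (p == pattern.length) {
            return w == words.length; // pattern exhausted: all words must be consumed
        }
        if (pattern[p].equals("#")) {
            // '#' may absorb zero or more words; try every possible split
            for (int next = w; next <= words.length; next++) {
                if (matches(pattern, p + 1, words, next)) {
                    return true;
                }
            }
            return false;
        }
        if (w == words.length) {
            return false; // pattern expects another word but none is left
        }
        if (pattern[p].equals("*") || pattern[p].equals(words[w])) {
            return matches(pattern, p + 1, words, w + 1);
        }
        return false;
    }
}
```

With the assumed bindings, `quick.orange.rabbit` matches both Q1 and Q2, `lazy.brown.fox` matches only Q2, and `quick.brown.fox` matches no binding, so the broker would discard it.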


slide-17
SLIDE 17

Stream processing

• (Infinite) sequence of data that is incrementally made available
• Examples: sensor data, audio/video delivery, filesystem APIs, etc.
• Producers vs. consumers
• Notions of window and time: consumers receive only messages published after they subscribe
• Here: event stream where each data item is typically associated with a timestamp


slide-18
SLIDE 18

Classification of stream processing systems

1. What happens if the producer sends messages faster than the consumer can handle?
    • Drop messages
    • Buffer messages
    • Apply backpressure (i.e. prevent the producer from sending more)
2. What happens if nodes become unreachable?
    • Lose messages
    • Use replication and persistence to preserve non-acknowledged messages
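The three answers to the first question map onto a bounded in-memory queue, shown here as a minimal Java sketch. `OverloadPolicies` and its method names are hypothetical. The queue itself is the buffer; `offer` drops the message when the buffer is full, while `put` blocks the producer until the consumer frees a slot, which is a simple form of backpressure.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, for illustration only.
final class OverloadPolicies {
    // Buffer: a bounded queue between producer and consumer.
    static BlockingQueue<String> newBuffer(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }

    // Drop: returns false immediately if the buffer is full; the message is lost.
    static boolean sendOrDrop(BlockingQueue<String> buffer, String message) {
        return buffer.offer(message);
    }

    // Backpressure: blocks the producer until the consumer has made room.
    static void sendWithBackpressure(BlockingQueue<String> buffer, String message)
            throws InterruptedException {
        buffer.put(message);
    }
}
```

Which policy fits depends on the data: dropping may be acceptable for sensor readings, while an unbounded buffer only postpones the problem until memory runs out.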


slide-19
SLIDE 19

Log-based message brokers

• Example: Kafka [https://kafka.apache.org]
• Message buffers are typically transient: once a message is delivered, it is deleted
• Idea: Combine durable storage with low-latency notification!
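The difference from a transient buffer can be sketched as an append-only log in which delivery does not delete anything: each consumer only advances its own offset. This is a single-process Java illustration of the idea, not Kafka's API. `AppendOnlyLog`, `append`, and `poll` are hypothetical names, and durability and notification are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory log, for illustration only.
final class AppendOnlyLog {
    private final List<String> entries = new ArrayList<>();        // never shrinks
    private final Map<String, Integer> offsets = new HashMap<>();  // next unread index per consumer

    // Appends a message and returns its offset in the log.
    synchronized int append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Returns all messages this consumer has not read yet and advances its offset.
    // Reading does not remove anything from the log.
    synchronized List<String> poll(String consumerId) {
        int from = offsets.getOrDefault(consumerId, 0);
        List<String> batch = new ArrayList<>(entries.subList(from, entries.size()));
        offsets.put(consumerId, entries.size());
        return batch;
    }
}
```

Because entries stay in the log, a consumer that is added later (for debugging or testing, say) starts at offset 0 and can replay the full history.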


slide-20
SLIDE 20

Scalability and fault-tolerance for replicated logs

• For scalability, partition the log across different machines
• For fault-tolerance, replicate it on different machines
• Need to ensure the same ordering on all replicas (⇒ total-order broadcast)
• Can easily add consumers for debugging, testing, etc.
• Ideas: event sourcing, immutability and audits
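Partitioning can be sketched by assigning each message to a partition based on its key, so that all messages with the same key stay in one partition and keep their relative order. The Java sketch below holds the partitions as in-memory lists. `PartitionedLog` is a hypothetical class, and using `String.hashCode` as the partitioning function is an assumption made for brevity; real systems use their own hash functions. Replication and total-order broadcast are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory partitioned log, for illustration only.
final class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key, same partition. floorMod keeps the index non-negative.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends to the key's partition and returns the partition index.
    int append(String key, String value) {
        int index = partitionFor(key);
        partitions.get(index).add(key + "=" + value);
        return index;
    }

    List<String> partition(int index) {
        return partitions.get(index);
    }
}
```

Ordering is only guaranteed within a partition; messages with different keys may end up on different machines with no defined order between them.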


slide-21
SLIDE 21

Batch-processing

• Static data sets that have a known/finite size
• Need to artificially batch data by day, month, minute, ...
• Typically large latencies


slide-22
SLIDE 22

The Future: Distributed Programming Languages


slide-23
SLIDE 23

From Model to Language

Challenges: Partial failure, concurrency and consistency, latency, . . .

1. Distributed Shared Memory
    • Runtime maps virtual addresses to physical ones
    • “Single-system” illusion
2. Actors
    • Explicit communication
    • Location of processes is transparent
3. Dataflow
    • Data transformations expressed as DAG
    • Processes are transparent
    • Examples: MapReduce (Google), Dryad (Microsoft), Spark


slide-24
SLIDE 24

Example: WordCount in MapReduce
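The slide's figure is not reproduced here, so the following is a sequential Java sketch of the three phases of WordCount: map emits a (word, 1) pair per word, shuffle groups the pairs by word, and reduce sums each group. The class and method names are hypothetical, and everything runs in one process. A real MapReduce framework would run the map and reduce functions in parallel on many machines and perform the shuffle over the network.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process WordCount, for illustration only.
final class WordCount {
    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three phases one after another over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

The programmer writes only `map` and `reduce`; partitioning, scheduling, and failure handling are the framework's job, which is the sense in which processes are transparent in the dataflow model.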


slide-25
SLIDE 25

Further reading

Material collection by Northeastern University, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing


slide-26
SLIDE 26

Further reading I

[1] Gul Agha. “Concurrent Object-Oriented Programming”. In: Commun. ACM 33.9 (1990), pp. 125–141. doi: 10.1145/83880.84528. url: http://doi.acm.org/10.1145/83880.84528.

[2] Andrew Birrell and Bruce Jay Nelson. “Implementing Remote Procedure Calls”. In: ACM Trans. Comput. Syst. 2.1 (1984), pp. 39–59. doi: 10.1145/2080.357392. url: https://doi.org/10.1145/2080.357392.

[3] Carl Hewitt, Peter Boehler Bishop and Richard Steiger. “A Universal Modular ACTOR Formalism for Artificial Intelligence”. In: Proceedings of the 3rd International Joint Conference on Artificial Intelligence. Stanford, CA, USA, August 20-23, 1973. Ed. by Nils J. Nilsson. William Kaufmann, 1973, pp. 235–245. url: http://ijcai.org/Proceedings/73/Papers/027B.pdf.


slide-27
SLIDE 27

Further reading II

[4] David B. Skillicorn and Domenico Talia. “Models and Languages for Parallel Computation”. In: ACM Comput. Surv. 30.2 (1998), pp. 123–169. doi: 10.1145/280277.280278. url: http://doi.acm.org/10.1145/280277.280278.
