Programming Reactive Systems in Scala: Principles and Abstractions


Philipp Haller, KTH Royal Institute of Technology, Stockholm, Sweden. Entwicklertag Frankfurt, Germany, 21 February 2018.


slide-1
SLIDE 1

Philipp Haller

KTH Royal Institute of Technology Stockholm, Sweden

Entwicklertag Frankfurt, Germany, 21 February, 2018

Programming Reactive Systems in Scala: Principles and Abstractions

slide-2
SLIDE 2 Philipp Haller

What are reactive systems?

  • Multiple definitions proposed previously, e.g. by Gérard Berry [1] and by the Reactive Manifesto [2]
  • Common among definitions: reactive systems
      • react to events or messages from their environment
      • react (typically) "at a speed which is determined by the environment, not the program itself" [1]
  • Thus, reactive systems are:
      • responsive
      • scalable
slide-3
SLIDE 3 Philipp Haller

What makes it so difficult to build reactive systems?

  1. Workloads require massive scalability
      • Steam, a digital distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
      • CERN amassed about 200 PB of data from over 800 trillion collisions looking for the Higgs boson [4]
      • Twitter has about 330 million monthly active users [5]
  2. Reacting at the speed of the environment (guaranteed timely responses)

slide-4
SLIDE 4 Philipp Haller

Steam delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]

slide-5
SLIDE 5 Philipp Haller

What makes it so difficult to build reactive systems?

(Same figures as on slide 3; this slide annotates them with the dates February 2018 and Q4, 2017.)

slide-6
SLIDE 6 Philipp Haller

Example: Twitter during Obama's inauguration

“We saw 5x normal tweets-per-second and about 4x tweets-per-minute as this chart illustrates.” [6]
slide-7
SLIDE 7 Philipp Haller

Implications

  • Massive scalability ➟ large-scale distribution
  • Timely responses + distribution ➟ resiliency

"To make a fault-tolerant system you need at least two computers." - Joe Armstrong [7]

slide-8
SLIDE 8 Philipp Haller

How to program reactive systems?

Want to build systems responding to events emitted by their environment in a way that enables scalability, distribution, and resiliency

  • We're looking for programming abstractions!
  • How did we approach this in the Scala project?
slide-9
SLIDE 9 Philipp Haller

Example

  • Chat service
  • Many long-lived connections
  • Usually idle, with short bursts of traffic
slide-10
SLIDE 10 Philipp Haller

Chat service: first try

  • Thread per user session
  • Huge overheads stemming from heavyweight threads
  • Does not scale to large numbers of users
slide-11
SLIDE 11 Philipp Haller

Chat service: second try

  • Asynchronous I/O and thread pool
  • Session state maintained in regular objects (e.g., POJOs)
  • Much more scalable
  • Problems:
      • Code difficult to maintain ➟ "callback hell" [8]
      • Blocking calls fatal
slide-12
SLIDE 12 Philipp Haller

The trouble with blocking ops

Example: a function for creating a Future that is completed with value after delay milliseconds

    def after[T](delay: Long, value: T): Future[T]

slide-13
SLIDE 13 Philipp Haller

"after", version 1

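    // Assumption: the default global execution context runs the Future body.
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    def after1[T](delay: Long, value: T) = Future {
      Thread.sleep(delay) // blocks a pool thread for the whole delay
      value
    }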

slide-14
SLIDE 14 Philipp Haller

"after", version 1

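    assert(Runtime.getRuntime().availableProcessors() == 8)

    for (_ <- 1 to 8) yield after1(1000, true)
    val later = after1(1000, true)

How does it behave?

Quiz: when is “later” completed?

Answer: after either ~1 s or ~2 s (most often)

Assuming the pool has one thread per processor, the eight sleeping futures occupy every thread, so “later” can start only once one of them wakes up.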
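The quiz can be reproduced on any machine with a smaller pool. The following is a minimal sketch, assuming a fixed pool of 2 threads in place of the 8-processor default pool; `StarvationDemo`, `run`, and the 300 ms delay are illustrative names and values, not part of the talk.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object StarvationDemo {
  // Returns the milliseconds that pass until `later` is completed.
  def run(): Long = {
    // Assumption: 2 pool threads stand in for the 8 threads on the slide.
    val executor = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(executor)

    def after1[T](delay: Long, value: T): Future[T] = Future {
      Thread.sleep(delay) // blocks a pool thread for the whole delay
      value
    }

    val start = System.nanoTime()
    // Two sleeping futures occupy both pool threads ...
    val busy = for (_ <- 1 to 2) yield after1(300, true)
    // ... so this one waits in the queue before it can even start sleeping.
    val later = after1(300, true)

    Await.result(later, 5.seconds)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    executor.shutdown()
    elapsedMs
  }
}
```

With these values `later` needs roughly two delays rather than one, which mirrors the ~2 s outcome of the quiz.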

slide-15
SLIDE 15 Philipp Haller

Promises

    object Promise {
      def apply[T](): Promise[T]
    }

    trait Promise[T] {
      def success(value: T): Promise[T]
      def failure(cause: Throwable): Promise[T]
      def future: Future[T]
    }
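A short sketch of how the two sides of this interface fit together, using the standard `scala.concurrent.Promise`; the object and value names are illustrative only.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object PromiseDemo {
  // The promise is the write side; its future is the read side.
  val promise: Promise[Int] = Promise[Int]()
  val future: Future[Int] = promise.future

  // Nothing has been written yet, so the future is still pending.
  val completedBefore: Boolean = future.isCompleted

  // A promise can be completed exactly once.
  promise.success(42)
  val result: Int = Await.result(future, 1.second)

  // A second attempt is rejected: `success` throws IllegalStateException.
  val secondAttempt: Try[Promise[Int]] = Try(promise.success(7))
}
```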

slide-16
SLIDE 16 Philipp Haller

"after", version 2

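    // Assumes `timer` is a java.util.Timer in scope, and that
    // java.util.TimerTask and scala.concurrent.Promise are imported.
    def after2[T](delay: Long, value: T) = {
      val promise = Promise[T]()
      timer.schedule(new TimerTask {
        def run(): Unit = promise.success(value)
      }, delay)
      promise.future
    }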

Much better behaved!
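To try this version in isolation, a self-contained sketch might look as follows. It assumes a single daemon `java.util.Timer` shared by all calls; `After2Demo`, `run`, and the choice of 100 futures are illustrative.

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object After2Demo {
  // Assumption: one daemon timer thread serves every delayed future.
  private val timer = new Timer(true)

  def after2[T](delay: Long, value: T): Future[T] = {
    val promise = Promise[T]()
    timer.schedule(new TimerTask {
      def run(): Unit = promise.success(value)
    }, delay)
    promise.future
  }

  // Starts 100 delayed futures; none of them holds a pool thread while waiting.
  // Returns the sum of the results and the elapsed milliseconds.
  def run(): (Int, Long) = {
    val start = System.nanoTime()
    val all = Future.sequence((1 to 100).map(i => after2(200, i)))
    val sum = Await.result(all, 5.seconds).sum
    (sum, (System.nanoTime() - start) / 1000000)
  }
}
```

Because waiting is delegated to the timer, the number of pending delays is not bounded by the size of the thread pool.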

slide-17
SLIDE 17 Philipp Haller

Chat service example

  • Neither of the shown approaches is satisfactory
      • Thread-based approach induces huge overheads, does not scale
      • Event-driven approach suffers from callback hell and blocking operations are troublesome

We need better programming abstractions which reconcile scalability and productivity

slide-18
SLIDE 18 Philipp Haller

Better programming abstractions

  • At the end of 2005, our main influence was the Erlang programming language
      • One of very few success stories in the area of concurrent programming
      • Had been used successfully to build the influential Ericsson AXD301 switch providing an availability of nine nines (less than 32 ms downtime per year)
      • … and there was a really great movie about Erlang [9] ;-)
  • Additional influences, including Argus [10], the join-calculus [11], and other seminal languages and systems
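The 32 ms figure follows directly from the availability. A small worked calculation, assuming a year of 365.25 days (the names are illustrative):

```scala
object NineNines {
  val availability: Double = 0.999999999                 // nine nines
  val secondsPerYear: Double = 365.25 * 24 * 60 * 60     // 31,557,600 s
  // Unavailable fraction of a year, expressed in milliseconds.
  val downtimeMs: Double = (1 - availability) * secondsPerYear * 1000
}
```

The result is a little under 32 ms per year.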

slide-19
SLIDE 19 Philipp Haller

Erlang and the actor model

  • Erlang: a dynamic, functional, distributed, concurrency-oriented programming language
  • Provides an implementation of the actor model of concurrency [12]
  • Actors = concurrent "processes" communicating via message passing
      • No shared state
      • Senders decoupled from receivers ➟ asynchronous messaging
  • Upon receiving a message, an actor may
      • change its behavior/state
      • send messages to actors (including itself)
      • create new actors

Sender does not fail if receiver fails!

slide-20
SLIDE 20 Philipp Haller

Actors in Scala (using Akka)

Definition of an actor class:

    import akka.actor.{Actor, ActorLogging} // Akka classic actor API

    // Messages understood by the counter
    case class AddAll(values: Array[Int])
    case class PrintSum()

    class Counter extends Actor with ActorLogging {
      var sum = 0
      def receive = {
        case AddAll(values) =>
          sum += values.reduce((x, y) => x + y)
        case PrintSum() =>
          log.info(s"the sum is: $sum")
      }
    }

slide-21
SLIDE 21 Philipp Haller

Client of an actor

Creating and using an actor:

    import akka.actor.{ActorRef, ActorSystem, Props} // Akka classic actor API

    object Main {
      def main(args: Array[String]): Unit = {
        val system = ActorSystem("system")
        val counter: ActorRef = system.actorOf(Counter.props, "counter")
        // Asynchronous message sends
        counter ! AddAll(Array(1, 2, 3))
        counter ! AddAll(Array(4, 5))
        counter ! PrintSum()
      }
    }

    // Actor creation properties
    object Counter {
      def props: Props = Props(new Counter)
    }

slide-22
SLIDE 22 Philipp Haller

Actors: important features

  • Actors are isolated
      • Field sum not accessible from outside
      • Ensured by exposing only an ActorRef to clients
      • ActorRef provides an extremely simple interface
  • Messages in actor's mailbox are processed sequentially
      • No concurrency control necessary within an actor
  • Messaging is location-transparent
      • ActorRefs may be remote; can be sent in messages
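Sequential mailbox processing can be illustrated without Akka. The following is a rough sketch using only the JDK, not how Akka is implemented: `MiniActor`, its mailbox, and the scheduling flag are hypothetical names, and the design is one common way to ensure that at most one thread drains a mailbox at a time.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, Executors}
import java.util.concurrent.atomic.AtomicBoolean

object MiniActorDemo {
  private val pool = Executors.newFixedThreadPool(4)

  // Hypothetical stand-in for an actor reference: clients only see `!`.
  final class MiniActor(handler: Any => Unit) {
    private val mailbox = new ConcurrentLinkedQueue[Any]()
    private val scheduled = new AtomicBoolean(false)

    def !(message: Any): Unit = {
      mailbox.add(message)
      trySchedule()
    }

    private def trySchedule(): Unit =
      if (!mailbox.isEmpty && scheduled.compareAndSet(false, true))
        pool.execute(() => drain())

    // At most one drain runs at a time, so the handler never runs concurrently.
    private def drain(): Unit =
      try {
        var msg = mailbox.poll()
        while (msg != null) {
          handler(msg)
          msg = mailbox.poll()
        }
      } finally {
        scheduled.set(false)
        trySchedule() // pick up messages that arrived while finishing
      }
  }

  // Call once: the shared pool is shut down at the end.
  def run(): Int = {
    var sum = 0 // touched only inside the actor's handler, no lock needed
    val done = new CountDownLatch(1)
    val counter = new MiniActor({
      case n: Int => sum += n
      case "done" => done.countDown()
    })
    // Four threads send concurrently; the mailbox serializes the updates.
    val senders = (1 to 4).map { _ =>
      new Thread(() => (1 to 1000).foreach(_ => counter ! 1))
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    counter ! "done"
    done.await()
    pool.shutdown()
    sum
  }
}
```

Although four threads send at once, the unsynchronized `sum` ends up correct because only the handler touches it, one message at a time.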
slide-23
SLIDE 23 Philipp Haller

Resiliency using actors

  • Erlang's approach to fault handling: "let it crash!"
  • Do not:
      • try to avoid failure
      • attempt to repair program state/data in case of failure
  • Do:
      • let faulty actors crash
      • manage crashed actors via supervision
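The "do" side of this list can be sketched in plain Scala. This is a simplified, single-threaded illustration and not Akka's supervision API: `Worker` and `Supervisor` are hypothetical classes, and the only strategy shown is replacing a crashed worker with a fresh one.

```scala
object SupervisionDemo {
  // Hypothetical child: keeps a running total and crashes on negative input.
  final class Worker {
    var total = 0
    def handle(n: Int): Unit = {
      if (n < 0) throw new IllegalArgumentException(s"bad input: $n")
      total += n
    }
  }

  // Hypothetical supervisor: it does not try to repair a failed worker,
  // it replaces it with a fresh instance ("restart").
  final class Supervisor(newWorker: () => Worker) {
    private var worker = newWorker()
    var restarts = 0

    def deliver(n: Int): Unit =
      try worker.handle(n)
      catch {
        case _: Exception =>
          restarts += 1
          worker = newWorker() // old state is dropped with the crashed worker
      }

    def total: Int = worker.total
  }

  // Returns (total held by the current worker, number of restarts).
  def run(): (Int, Int) = {
    val supervisor = new Supervisor(() => new Worker)
    List(1, 2, -1, 10).foreach(supervisor.deliver)
    (supervisor.total, supervisor.restarts)
  }
}
```

The worker contains no defensive code; the decision about what to do after a failure lives entirely in the supervisor.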
slide-24
SLIDE 24 Philipp Haller

Actor supervision: strategy 1

slide-25
SLIDE 25 Philipp Haller

Actor supervision: strategy 2

slide-26
SLIDE 26 Philipp Haller

Actor supervision: strategy 3

slide-27
SLIDE 27 Philipp Haller

Resiliency (continued)

How to restart a fresh actor from some previous state?

  • Supervisor initializes its state, or
  • Fresh actor obtains initial state from elsewhere, or
  • Fresh actor replays received messages from persistent log ➟ event sourcing: Akka Persistence
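The replay option can be sketched as a fold over the logged events. This is an illustration of the idea only, not the Akka Persistence API: the event types are hypothetical and the persistent log is modelled as an in-memory list.

```scala
object ReplayDemo {
  sealed trait Event
  final case class Added(values: List[Int]) extends Event
  case object Reset extends Event

  // State transition: the same function is used live and during replay.
  def applyEvent(sum: Int, event: Event): Int = event match {
    case Added(values) => sum + values.sum
    case Reset         => 0
  }

  // Assumption: an in-memory list stands in for the persistent log.
  val log: List[Event] = List(Added(List(1, 2, 3)), Reset, Added(List(4, 5)))

  // A fresh actor rebuilds its state by folding over the log.
  val recovered: Int = log.foldLeft(0)(applyEvent)
}
```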
slide-28
SLIDE 28 Philipp Haller

Actors in Scala

  • Q: Is all of this built into Scala?
  • A: Not quite.
slide-29
SLIDE 29 Philipp Haller

Deconstructing actors

    def receive = {
      case AddAll(values) =>
        sum += values.reduce((x, y) => x + y)
      case PrintSum() =>
        log.info(s"the sum is: $sum")
    }

  • receive method returns a partial function defined by the block of cases { … }

slide-30
SLIDE 30 Philipp Haller

Deconstructing actors

    object Actor {
      // Type alias for receive blocks
      type Receive = PartialFunction[Any, Unit]
      // ...
    }

    trait Actor {
      def receive: Actor.Receive
      // ...
    }

slide-31
SLIDE 31 Philipp Haller

Partial functions

  • Partial functions have a type PartialFunction[A, B]
  • PartialFunction[A, B] is a subtype of Function1[A, B]

    trait Function1[A, B] {
      def apply(x: A): B
      ..
    }

    trait PartialFunction[A, B] extends Function1[A, B] {
      def isDefinedAt(x: A): Boolean
      def orElse[A1 <: A, B1 >: B](that: PartialFunction[A1, B1]): PartialFunction[A1, B1]
      ..
    }

Simplified!
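A small sketch of the two methods shown here, using the standard library's `PartialFunction`; the value names are illustrative.

```scala
object PartialFunctionDemo {
  // A block of cases is a partial function when the expected type asks for one.
  val ints: PartialFunction[Any, String] = { case n: Int => s"int $n" }
  val strings: PartialFunction[Any, String] = { case s: String => s"string $s" }

  // orElse tries `ints` first and falls back to `strings`.
  val both: PartialFunction[Any, String] = ints.orElse(strings)

  val definedForInt: Boolean = ints.isDefinedAt(1)          // an Int case exists
  val definedForString: Boolean = ints.isDefinedAt("a")     // no String case in `ints`
  val viaFallback: String = both("a")                        // handled by `strings`
  val definedForDouble: Boolean = both.isDefinedAt(1.5)      // neither part matches
}
```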

slide-32
SLIDE 32 Philipp Haller

Pattern matching

The case clauses are just regular pattern matching in Scala:

    {
      case AddAll(values) =>
        sum += values.reduce((x, y) => x + y)
      case PrintSum() =>
        log.info(s"the sum is: $sum")
    }

    val opt: Option[Int] = this.getOption()
    opt match {
      case Some(x) => // full optional object
        // use `x` of type `Int`
      case None => // empty optional object
        // no value available
    }
slide-33
SLIDE 33 Philipp Haller

Deconstructing actors

    counter ! AddAll(Array(1, 2, 3))
    counter ! AddAll(Array(4, 5))
    counter ! PrintSum()

"Aha! Built-in support for messaging!!"

The ! operator is just a method written using infix syntax:

    abstract class ActorRef extends .. {
      def !(message: Any): Unit
      // ..
    }

Simplified! Not actual implementation!
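That `!` needs no language support can be checked with any class that defines a method of that name. A minimal sketch, where `Ref` is a hypothetical class that merely records what it is sent:

```scala
object BangDemo {
  // Hypothetical reference type; only the method name `!` matters here.
  final class Ref {
    private val received = scala.collection.mutable.ListBuffer.empty[Any]
    def !(message: Any): Unit = received += message
    def messages: List[Any] = received.toList
  }

  val ref = new Ref
  ref ! "infix"       // operator notation ...
  ref.!("explicit")   // ... is the same call in ordinary method notation
}
```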

slide-34
SLIDE 34 Philipp Haller

Summary

  • Actors not built into Scala
      • Rely only on shared-memory threads of the JVM
  • Scala as a "growable" language [13]
      • Programming models as libraries
      • Akka actors = domain-specific language (DSL) embedded in Scala
  • Many of the patterns and techniques first implemented in Scala Actors [14]
slide-35
SLIDE 35 Philipp Haller

https://www.lightbend.com/akka-five-year-anniversary
slide-36
SLIDE 36 Philipp Haller

There is more

  • Q: Actors are clearly awesome! All problems solved?
  • A: Not quite.
slide-37
SLIDE 37 Philipp Haller

Example

Image processing pipeline:

[Figure: image data flows through filter 1 and then filter 2; each stage applies a filter]

Pipeline stages run concurrently
slide-38
SLIDE 38 Philipp Haller

Implementation

  • Assumptions:
      • Image data large
      • Main memory expensive
  • Approach for high performance:
      • In-place update of image buffers
      • Pass mutable buffers by-reference
slide-39
SLIDE 39 Philipp Haller

Problem

Easy to produce data races:

  1. Stage 1 sends a reference to a buffer to stage 2
  2. Following the send, both stages have a reference to the same buffer
  3. Stages can concurrently access the buffer
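The three steps can be made concrete with a deliberately sequential sketch. It shows only the aliasing that makes the race possible; with real concurrent stages, steps 2 and 3 would run in no defined order. `Stage2` and `run` are hypothetical names.

```scala
object AliasingDemo {
  // Hypothetical stage 2: keeps the reference it was sent.
  final class Stage2 {
    var buffer: Array[Int] = Array.empty
    def receive(b: Array[Int]): Unit = { buffer = b }
  }

  def run(): Int = {
    val stage2 = new Stage2
    val buffer = Array(1, 2, 3)
    stage2.receive(buffer) // 1. stage 1 "sends" the reference, not a copy
    buffer(0) = 99         // 2. stage 1 still holds it and writes in place
    stage2.buffer(0)       // 3. stage 2 observes stage 1's write
  }
}
```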
slide-40
SLIDE 40 Philipp Haller

Preventing data races

  • Approach: safe transfer of ownership
      • Sending stage loses ownership
      • Compiler prevents sender from accessing objects that have been transferred
  • Advantages:
      • No run-time overhead
      • Safety does not compromise performance
      • Errors caught at compile time
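Plain Scala cannot express this check statically, so the sketch below only approximates the idea at run time: the sender's handle is emptied on transfer and later access fails with an exception. This is an assumption-laden illustration, not the compile-time mechanism described on the slide, and unlike that mechanism it does add run-time overhead. `Owned`, `transfer`, and `open` are hypothetical names.

```scala
object OwnershipDemo {
  // Hypothetical run-time approximation of "the sending stage loses ownership".
  final class Owned[T](private var value: Option[T]) {
    // Hands the content to a new handle and empties this one.
    def transfer(): Owned[T] = {
      val moved = new Owned(value)
      value = None
      moved
    }

    // Access succeeds only while this handle still owns the content.
    def open[R](f: T => R): R = value match {
      case Some(v) => f(v)
      case None    => throw new IllegalStateException("ownership was transferred")
    }
  }

  // Returns (sum seen by the receiver, whether the sender's access failed).
  def run(): (Int, Boolean) = {
    val mine = new Owned(Option(Array(1, 2, 3)))
    val theirs = mine.transfer()
    val receiverSees = theirs.open[Int](_.sum)
    val senderBlocked = scala.util.Try(mine.open[Int](_.sum)).isFailure
    (receiverSees, senderBlocked)
  }
}
```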
slide-41
SLIDE 41 Philipp Haller

Ownership transfer in Scala

  • Active research project: LaCasa [15]
      • LaCasa: Scala extension for affine references
      • "Transferable" references
      • At most one owner per transferable reference
slide-42
SLIDE 42 Philipp Haller

Affine references in LaCasa

  • LaCasa provides affine references by combining two concepts:
      • Access permissions
      • Encapsulated boxes
slide-43
SLIDE 43 Philipp Haller

Access permissions

  • Access to transferable objects controlled by implicit permissions
  • Type member C uniquely identifies box

    CanAccess { type C }
    Box[T] { type C }

slide-44
SLIDE 44 Philipp Haller

Creating boxes and permissions

    class Message {
      var arr: Array[Int] = _
    }

    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      …
    }

LaCasa library:

    sealed trait Packed[+T] {
      val box: Box[T]
      val access: CanAccess { type C = box.C }
    }
slide-45
SLIDE 45 Philipp Haller

Accessing boxes

  • Boxes are encapsulated
  • Boxes must be opened for access

    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      box open { msg =>
        msg.arr = Array(1, 2, 3, 4)
      }
    }

Requires implicit access permission
slide-46
SLIDE 46 Philipp Haller

Consuming permissions

Example: transferring a box from one actor to another consumes its access permission

    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      …
      someActor.send(box) { // make `access` unavailable
        …
      }
    }

Leverage spores [16]

slide-47
SLIDE 47 Philipp Haller

Encapsulation

Problem: not all types safe to transfer!

    class Message {
      var arr: Array[Int] = _
      def leak(): Unit = {
        SomeObject.fld = arr
      }
    }

    object SomeObject {
      var fld: Array[Int] = _
    }
slide-48
SLIDE 48 Philipp Haller

Encapsulation

  • Ensuring absence of data races requires restricting types put into boxes
  • Requirements for “safe” classes:*
      • Methods only access parameters and this
      • Method parameter types are “safe”
      • Methods only instantiate “safe” classes
      • Types of fields are “safe”

“Safe” = conforms to object capability model [17]

* simplified
slide-49
SLIDE 49 Philipp Haller

Object capabilities in Scala

  • How common is object-capability safe code in Scala?
  • Empirical study of over 75,000 SLOC of open-source Scala code:

    Project          Version      SLOC     GitHub stats
    Scala stdlib     2.11.7       33,107   ✭5,795 👦257
    Signal/Collect   8.0.6        10,159   ✭123 👦11
    GeoTrellis       0.10.0-RC2   35,351   ✭400 👦38
      • engine                     3,868
      • raster                    22,291
      • spark                      9,192
slide-50
SLIDE 50 Philipp Haller

Object capabilities in Scala

Results of empirical study:

    Project          #classes/traits   #ocap (%)     #dir. insec. (%)
    Scala stdlib     1,505             644 (43%)     212/861 (25%)
    Signal/Collect   236               159 (67%)     60/77 (78%)
    GeoTrellis
      • engine       190               40 (21%)      124/150 (83%)
      • raster       670               233 (35%)     325/437 (74%)
      • spark        326               101 (31%)     167/225 (74%)
    Total            2,927             1,177 (40%)   888/1,750 (51%)

Immutability inference increases these percentages!
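The totals can be rechecked from the per-project rows. In the sketch below the numbers are copied from the table on this slide; the reading that the "#dir. insec." denominator is the number of classes that are not ocap-safe is inferred from the figures (for example 1,505 − 644 = 861), and the names are illustrative.

```scala
object OcapStats {
  // (project, #classes/traits, #ocap, #directly insecure), from the table.
  val rows: List[(String, Int, Int, Int)] = List(
    ("Scala stdlib", 1505, 644, 212),
    ("Signal/Collect", 236, 159, 60),
    ("GeoTrellis engine", 190, 40, 124),
    ("GeoTrellis raster", 670, 233, 325),
    ("GeoTrellis spark", 326, 101, 167)
  )

  def percent(part: Int, whole: Int): Long = math.round(100.0 * part / whole)

  val total: Int = rows.map(_._2).sum
  val ocap: Int = rows.map(_._3).sum
  val insecure: Int = rows.map(_._4).sum

  // Share of ocap-safe classes among all classes/traits.
  val ocapPercent: Long = percent(ocap, total)
  // Share of directly insecure classes among those that are not ocap-safe.
  val insecurePercent: Long = percent(insecure, total - ocap)
}
```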

slide-51
SLIDE 51 Philipp Haller

Ongoing work

  • Flow-sensitive type checking
      • "Don't indent when consuming permission"
  • Empirical studies
      • How much effort to change existing code?
  • Language support for immutable types [18]
  • Complete mechanization in Coq proof assistant
slide-52
SLIDE 52 Philipp Haller

Conclusion

  • Scala enables powerful libraries for reactive programming
      • Akka actors representative example
      • There are many others: Akka Streams, Spark Streaming, REScala [19] etc.
  • Not all concurrency hazards can be prevented by Scala's current type system.
      • In ongoing research projects, such as LaCasa and Reactive Async [20], we are exploring ways to rule out data races and non-determinism
slide-53
SLIDE 53 Philipp Haller

References (1)

  • [1]: Gérard Berry, 1989. http://www-sop.inria.fr/members/Gerard.Berry/Papers/Berry-IFIP-89.pdf
  • [2]: https://www.reactivemanifesto.org/
  • [3]: http://store.steampowered.com/stats/content/
  • [4]: https://www.itbusinessedge.com/cm/blogs/lawson/the-big-data-software-problem-behind-cerns-higgs-boson-hunt/?cs=50736
  • [5]: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
  • [6]: https://blog.twitter.com/2009/inauguration-day-twitter
  • [7]: http://www.erlang-factory.com/upload/presentations/45/keynote_joearmstrong.pdf
  • [8]: http://static.usenix.org/publications/library/proceedings/usenix02/full_papers/adyahowell/adyahowell_html/
  • [9]: https://www.youtube.com/watch?v=uKfKtXYLG78
  • [10]: Liskov, 1988. Distributed programming in Argus. Communications of the ACM, 31(3), pp. 300-312. https://dl.acm.org/citation.cfm?id=42399
slide-54
SLIDE 54 Philipp Haller

References (2)

  • [11]: Fournet and Gonthier, 1996. The reflexive CHAM and the join-calculus. Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (pp. 372-385). https://dl.acm.org/citation.cfm?id=237805
  • [12]: Hewitt, Bishop, and Steiger, 1973. A universal modular actor formalism for artificial intelligence. Proc. IJCAI. See also https://eighty-twenty.org/2016/10/18/actors-hopl
  • [13]: Guy Steele, 1998. "Growing a Language". OOPSLA keynote. https://www.youtube.com/watch?v=_ahvzDzKdB0
  • [14]: Haller and Odersky, 2007. Actors that unify threads and events. In International Conference on Coordination Languages and Models (pp. 171-190). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-540-72794-1_10
  • [15]: https://github.com/phaller/lacasa
  • [16]: Miller, Haller, and Odersky, 2014. Spores: A type-based foundation for closures in the age of concurrency and distribution. Proc. ECOOP. https://github.com/scalacenter/spores
  • [17]: Mark S. Miller, 2006. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis
  • [18]: https://www.youtube.com/watch?v=IiCt4nZfQfg
  • [19]: http://guidosalva.github.io/REScala/
  • [20]: https://github.com/phaller/reactive-async