Reactive systems architecture: IoT–like media processing

slide-1
SLIDE 1

Reactive systems architecture

slide-2
SLIDE 2

IoT–like media processing

  • Operates stream processing devices
  • Exposes health–checks and pipeline topology
  • Provides a global view of the pipeline
slide-3
SLIDE 3

A distributed system without durable messaging easily grows into a monolith

slide-4
SLIDE 4

Device Device Shadow

slide-5
SLIDE 5

Device Device Shadow

Log file

slide-6
SLIDE 6

Device Device Shadow

DB

slide-7
SLIDE 7

Device Device Shadow

Message queue X

slide-8
SLIDE 8

A distributed system without supervision is binary: working or failed

slide-9
SLIDE 9

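def makeDeviceRequest(request: DeviceRequest): Future[DeviceResponse] = ???

makeDeviceRequest(request).onComplete {
  case Success(dr) => // working!
  case Failure(ex) => // failed!
}

// Failure branch, variant 1: log and retry immediately
  case Failure(ex) => // failed!
    log.error(ex)
    makeDeviceRequest(request)
}

// Failure branch, variant 2: log and retry after a fixed delay
  case Failure(ex) => // failed!
    log.error(ex)
    scheduleOnce(1000L, makeDeviceRequest(request))
}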

👊

slide-10
SLIDE 10

Naïve timeouts cause

slide-11
SLIDE 11
slide-12
SLIDE 12

…other timeouts

slide-13
SLIDE 13

…excessive downstream load when combined with re–tries

slide-14
SLIDE 14

Device Device Shadow

👊

slide-15
SLIDE 15
  • Timeout = 2000 ms
  • Timeout = 1950 ms
  • Timeout = 1900 ms
slide-16
SLIDE 16
  • Retries = 3, Timeout = 2000 ms
  • Retries = 2, Timeout = 1950 ms
  • Retries = 3, Timeout = 1900 ms

Per–attempt budget: 666 ms, 975 ms, 633 ms

👊

📼 ☎ 💿
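
The figures 666, 975 and 633 on this slide are consistent with dividing each hop's timeout by its retry count, so they can be read as the time left for a single attempt. A minimal Scala sketch of that arithmetic follows; `Hop` and `RetryBudget.perAttemptMillis` are hypothetical names introduced for illustration, not part of the original deck.

```scala
// Hypothetical model of one hop in the call chain; not from the slides.
final case class Hop(timeoutMillis: Long, retries: Int)

object RetryBudget {
  // Time available to a single attempt if every retry must fit inside the hop's timeout.
  def perAttemptMillis(hop: Hop): Long = {
    require(hop.retries > 0, "retries must be positive")
    hop.timeoutMillis / hop.retries // integer division, rounds down
  }
}
```

Calling `RetryBudget.perAttemptMillis` on `Hop(2000, 3)`, `Hop(1950, 2)` and `Hop(1900, 3)` shows how quickly a generous-looking timeout shrinks once retries share it.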

slide-17
SLIDE 17
  • Deadline = 2000 ms
  • QoS = …
  • Deadline = 1950 ms
  • QoS = …
  • Deadline = 1900 ms
  • QoS = …
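
One way to read this slide: instead of every hop choosing its own timeout, the request carries a deadline that each hop forwards, slightly reduced. The sketch below assumes that reading; `RequestContext`, `forDownstream` and the 50 ms margin are illustrative assumptions, and the QoS field is left out because the slides do not say what it contains.

```scala
// Hypothetical request context carrying an absolute deadline in milliseconds,
// measured on the same clock as the `now` values passed in.
final case class RequestContext(deadlineMillis: Long) {
  // Time left at `nowMillis`; never negative.
  def remainingMillis(nowMillis: Long): Long =
    math.max(0L, deadlineMillis - nowMillis)

  def isExpired(nowMillis: Long): Boolean = remainingMillis(nowMillis) == 0L
}

object DeadlinePropagation {
  // Forward the caller's deadline minus a safety margin, rather than a fresh timeout.
  // `marginMillis` is an assumed allowance for the response's trip back to the caller.
  def forDownstream(ctx: RequestContext, marginMillis: Long): RequestContext =
    ctx.copy(deadlineMillis = ctx.deadlineMillis - marginMillis)
}
```

Taking the start of the request as time 0, `RequestContext(2000)` forwarded twice with a 50 ms margin gives the 1950 ms and 1900 ms deadlines shown above. A hop that receives an already expired context can refuse the work instead of starting it.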
slide-18
SLIDE 18

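def makeDeviceRequest(request: DeviceRequest): Future[DeviceResponse] = ???

makeDeviceRequest(request).onComplete {
  case Success(dr) => // working!
  case Failure(ex) => // failed!
}

// Failure branch, variant 1: log and retry immediately
  case Failure(ex) => // failed!
    log.error(ex)
    makeDeviceRequest(request)
}

// Failure branch, variant 2: log and retry after a fixed delay
  case Failure(ex) => // failed!
    log.error(ex)
    scheduleOnce(1000L, makeDeviceRequest(request))
}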

👊

slide-19
SLIDE 19

implicit val sys: ActorSystem = …
implicit val mat: ActorMaterializer = …

val base = Uri(…).authority
val pool = Http(sys).cachedHostConnectionPool[String](base.host.address(), base.port)

val correlationId = UUID.randomUUID().toString
val rq = HttpRequest(…)
val _ = Source.single(rq -> correlationId)
  .via(pool)
  .recoverWithRetries(5, …)
  .runForeach(…)

👊

🎊

slide-20
SLIDE 20

makeDeviceRequest(request)

Failed Failed

👊

slide-21
SLIDE 21

A distributed system without back–pressure will fail or will make everything around it fail

slide-22
SLIDE 22

“Let’s just go with the defaults for the thermal exhaust ports.” —Galen Erso

slide-23
SLIDE 23

implicit val sys: ActorSystem = …
implicit val mat: ActorMaterializer = …

val base = Uri(…).authority
val pool = Http(sys).cachedHostConnectionPool[String](base.host.address(), base.port)

val correlationId = UUID.randomUUID().toString
val rq = HttpRequest(…)
val _ = Source.single(rq -> correlationId)
  .via(pool)
  .recoverWithRetries(5, …)
  .runForeach(…)

  • Pool size
  • TCP connect timeout
  • TCP receive timeout
  • Body receive timeout
  • Flow timeout
  • How many retries?
  • Within what time–frame?
  • Idempotent endpoints?
  • nonces, etc.
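
The questions "How many retries?" and "Within what time–frame?" can be answered together by deriving the retry delays from one overall budget. The sketch below is an illustrative assumption, not the deck's implementation: `Backoff.schedule` is a hypothetical helper that doubles the delay each time and stops once the waiting time alone would exceed the budget. It does not account for how long each attempt itself takes.

```scala
object Backoff {
  /** Delays (in ms) to wait before each retry: start at `initialMillis`, double each time,
    * and stop after `maxRetries` delays or when the total wait would exceed `budgetMillis`.
    */
  def schedule(maxRetries: Int, initialMillis: Long, budgetMillis: Long): List[Long] = {
    @annotation.tailrec
    def loop(n: Int, delay: Long, spent: Long, acc: List[Long]): List[Long] =
      if (n >= maxRetries || spent + delay > budgetMillis) acc.reverse
      else loop(n + 1, delay * 2, spent + delay, delay :: acc)

    loop(0, initialMillis, 0L, Nil)
  }
}
```

With five retries requested, a 100 ms initial delay and a 2000 ms budget, only four delays fit; the fifth retry is dropped rather than allowed to overrun the time–frame. Retrying at all still presumes the endpoint is idempotent or protected by a nonce.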
slide-24
SLIDE 24

A distributed system without observability and monitoring is a stack of black boxes

slide-25
SLIDE 25

👊

slide-26
SLIDE 26

A distributed system without robust access control is a ticking time–bomb

slide-27
SLIDE 27

Service A: privateKey, payload, token, publicKeys

Message
  string correlationId = 1;
  string token = 2;
  bytes signature = 3;
  bytes payload = 4;

Service B: privateKey, payload, token, publicKeys, valid?
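
The diagram suggests that the sender signs the message with its private key and the receiver checks the signature against a set of known public keys. A minimal sketch of that idea using only the JDK's `java.security` API follows. The `Envelope` class mirrors the four message fields; `Signing`, the choice of `SHA256withRSA`, and the decision to sign token plus payload are assumptions, since the slides do not specify them.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.{PrivateKey, PublicKey, Signature, SignatureException}

// Hypothetical envelope mirroring the four fields of the message on the slide.
final case class Envelope(correlationId: String, token: String, signature: Array[Byte], payload: Array[Byte])

object Signing {
  // Assumption: the slides do not name a signature algorithm.
  private val Algorithm = "SHA256withRSA"

  // Sign token and payload together so that neither can be swapped independently.
  // A production format would length-prefix the fields to avoid ambiguity.
  def sign(correlationId: String, token: String, payload: Array[Byte], key: PrivateKey): Envelope = {
    val s = Signature.getInstance(Algorithm)
    s.initSign(key)
    s.update(token.getBytes(UTF_8))
    s.update(payload)
    Envelope(correlationId, token, s.sign(), payload)
  }

  // Valid if any of the trusted public keys verifies the signature.
  def isValid(e: Envelope, publicKeys: Seq[PublicKey]): Boolean =
    publicKeys.exists { k =>
      val s = Signature.getInstance(Algorithm)
      s.initVerify(k)
      s.update(e.token.getBytes(UTF_8))
      s.update(e.payload)
      try s.verify(e.signature)
      catch { case _: SignatureException => false } // malformed signature bytes
    }
}
```

Service B would call `Signing.isValid` before acting on the payload and reject the message otherwise. Checking the token's own claims (expiry, scope) is a separate step that this sketch leaves out.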

👊

slide-28
SLIDE 28

A distributed system without chaos testing is going to fail in the most creative ways

slide-29
SLIDE 29

val mb = Array[Byte](8, 1, 12, 3, 65, 66, 67, 99, ..., 99)
X.parseFrom(mb)

Exception in thread "main" java.lang.StackOverflowError
  at … $StreamDecoder.readTag(…:2051)
  at … $StreamDecoder.skipMessage(…:2158)
  at … $StreamDecoder.skipField(…:2090)
  …

X.validate(mb)

val mj = """{"x":""" * 2000
JsonFormat.fromJsonString[X](mj)

Exception in thread "main" java.lang.StackOverflowError
  at … JsonStreamContext.<init>(…:43)
  at … JsonReadContext.<init>(…:58)
  at … JsonReadContext.createChildObjectContext(…:128)
  at … ReaderBasedJsonParser._nextAfterName(…:773)
  at … ReaderBasedJsonParser.nextToken(…:636)
  at … JValueDeserializer.deserialize(…:45)
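
Both failures come from deeply nested input driving a recursive parser out of stack. One possible defence, sketched here as an assumption rather than as what the authors did, is a cheap iterative pre-check of the nesting depth before the text reaches the real parser. `NestingGuard` is a hypothetical helper; it only counts brackets outside string literals and does not validate the JSON.

```scala
object NestingGuard {
  /** Maximum bracket nesting depth of `json`, ignoring brackets inside string literals.
    * Runs in a single loop, so it cannot overflow the stack on hostile input.
    */
  def maxDepth(json: String): Int = {
    var depth = 0
    var max = 0
    var inString = false
    var escaped = false
    json.foreach { c =>
      if (inString) {
        if (escaped) escaped = false          // character after a backslash is taken literally
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' =>
          depth += 1
          if (depth > max) max = depth
        case '}' | ']' =>
          if (depth > 0) depth -= 1
        case _ => ()
      }
    }
    max
  }

  // `limit` is chosen by the caller; any particular value is a judgment call.
  def withinLimit(json: String, limit: Int): Boolean = maxDepth(json) <= limit
}
```

Applied to the `mj` string from the slide, `maxDepth` reports a depth of 2000, so a modest limit rejects it before `JsonFormat.fromJsonString` is ever called. The binary protobuf case needs a different guard, for example a size or recursion limit on the decoder.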

👊

slide-30
SLIDE 30

message DeviceRequest {
  string method = 1;
  string uri = 2;
  map<string, string> headers = 3;
  string entity_content_type = 4;
  bytes entity = 5;

  string schedule_time = 10;
  MisfireStrategy misfire_strategy = 11;

  enum MisfireStrategy {
    BEST_EFFORT = 0;
    FORGET = 1;
  }
}

👊

class DeviceActor extends Actor {
  …
  override def receive: Receive = {
    case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) =>
      val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime))
      context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MILLISECONDS), self, dr)
    case d: DeviceRequest =>
      Source.single(d -> …).via(…).run(…)
  }
}

slide-31
SLIDE 31

class DeviceActor extends Actor {
  …
  override def receive: Receive = {
    case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) =>
      val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime))
      context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MILLISECONDS), self, dr)
    case d: DeviceRequest =>
      Source.single(d -> …).via(…).run(…)
  }
}

message DeviceRequest {
  string method = 1;
  string uri = 2;
  map<string, string> headers = 3;
  string entity_content_type = 4;
  bytes entity = 5;

  string schedule_time = 10;
  MisfireStrategy misfire_strategy = 11;

  enum MisfireStrategy {
    BEST_EFFORT = 0;
    FORGET = 1;
  }
}

"GET" "/" Map.empty "application/json" "e30=" "2018-11-14T11:42:06+00:00" "BEST_EFFORT"

"GET" "/" Map.empty "application/json" "e30=" "2014-10-14T11:42:06+00:00" "BEST_EFFORT"

"GET" "/" Map.empty "application/json" "e30=" "2018-13-14T11:42:06+00:00" "BEST_EFFORT"

"I̗̘̦ ͝ n͇͇͙v̮̫ok̲̫̙͈ i̖͙̭̹̠̞ n̡̻̮̣̺ g̲͈͙̭͙̬͎ ̰t͔̦h̞̲e̢̤ ͍̬̲͖ f̴̘͕̣è͖ẹ̥̩l͖͔͚ i͓͚̦ ͠n͖͍̗͓̳̮ g͍ ̨o͚̪ ͡ f̘̣̬ ̖̘͖̟͙̮ c ҉ ͔ ̫ ͖ ͓ ͇ ͖ ͅ h̵̤̣͚͔ á̗̼͕ͅo̼̣̥s̱͈̺̖̦̻ ͢ " "a/%%30%30" Map.empty "#cmds=({'/bin/echo', #eps})" "4oGmdGVzdOKBpw==" "2017-10-14T11:42:06+00:00" "BEST_EFFORT"

👊
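
The sample requests above include a schedule time in the past and one with month 13, both of which the actor passes straight to `ZonedDateTime.parse` and the scheduler. A minimal sketch of validating the field first, using only `java.time`, follows. The `Schedule` object and its `Decision` cases are hypothetical, and the meaning given to `BEST_EFFORT` (run immediately) and `FORGET` (drop) is an assumption; the slides do not define the misfire strategies.

```scala
import java.time.{Duration, ZonedDateTime}
import java.time.format.DateTimeParseException

object Schedule {
  sealed trait Decision
  final case class FireIn(delay: Duration) extends Decision // schedule time is now or in the future
  case object FireNow extends Decision                      // misfired, assumed BEST_EFFORT behaviour
  case object Drop extends Decision                         // misfired, assumed FORGET behaviour
  final case class Reject(reason: String) extends Decision  // unparseable or unknown input

  // `now` is passed in so the decision is deterministic and testable.
  def decide(scheduleTime: String, misfireStrategy: String, now: ZonedDateTime): Decision =
    try {
      val d = Duration.between(now, ZonedDateTime.parse(scheduleTime))
      if (!d.isNegative) FireIn(d)
      else misfireStrategy match {
        case "BEST_EFFORT" => FireNow
        case "FORGET"      => Drop
        case other         => Reject(s"unknown misfire strategy: $other")
      }
    } catch {
      case e: DateTimeParseException => Reject(e.getMessage)
    }
}
```

An actor could match on the result and only call `scheduleOnce` for `FireIn`, so that a negative duration or a parse failure never reaches the scheduler. The other fields of the hostile fourth request (method, URI, content type) need their own validation.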

slide-32
SLIDE 32

class DeviceActor extends Actor {
  …
  override def receive: Receive = {
    case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) =>
      val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime))
      context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MILLISECONDS), self, dr)
    case d: DeviceRequest =>
      Source.single(d -> …).via(…).run(…)
  }
}

💤

Exception in thread "…" java.time.format.DateTimeParseException: Text '2014-12-10T05:44:06.635Z[😝]' could not be parsed at index 21

💤

Exception in thread "…" java.time.format.DateTimeParseException: Text '2014-12-10T05:44:06.635Z[GMT]' could not be parsed at index 11

💤

GET http://host/foo.action Content-Type: #cmds=({'/bin/echo', #eps})

💤

SIGSEGV (0xb) at pc=0x000000010fda8262, pid=21419, tid=18435 # V [libjvm.dylib+0x3a8262] PhaseIdealLoop::idom_no_update(Node*) const+0x12

https://github.com/minimaxir/big-list-of-naughty-strings 👊

slide-33
SLIDE 33

Do tell another anecdote…

slide-34
SLIDE 34

We measured

  • For every file in every commit in every project…
  • Classification of the kind and quality of code
  • Matching production performance data from PagerDuty
slide-35
SLIDE 35

👊

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38

The biggest impact on production performance comes from…

slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41

Four things successful projects do before breakfast

  • Structured and performance–tested logging
  • Monitoring & [distributed] tracing
  • Performance testing
  • Reactive architecture & code

throughout their commit history

slide-42
SLIDE 42
slide-43
SLIDE 43

Thank you

jan.machacek@disneystreaming.com matthew.squire@disneystreaming.com