Reactive Summit - 22/10/18
Reactive Software with elegance
Reactive design patterns for microservices on multicore
charly.bechara@tredzone.com

Outline
Microservices on multicore
Reactive Multicore Patterns
Modern Software Roadmap
Microservice architecture with the actor model
(diagram: each µService is an actor; µServices communicate by message passing)
Fast data means more inter-communication
(diagram: stream / real-time event processing vs. batch / highly interconnected workflows, plotted against communications and computations)
Microservice architecture
(diagram: µServices built as actors, communicating by message passing)
Microservice architecture + Fast Data
(diagram: fast data brings new interactions between µServices, then more and more interactions)
More microservices should run on the same multicore machine.
Microservice architecture + Fast Data + Multicore
(diagram: µServices deployed across the cores of one machine)
Universal Law of Scalability (Gunther's law): a performance model of a system based on queueing theory, C(N) = N / (1 + σ(N - 1) + κN(N - 1))
σ = 0, κ = 0: perfect scalability (throughput grows linearly with N)
σ >> 0, κ = 0: contention impact (σ)
σ >> 0, κ > 0: coherency impact (κ)
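As a side note, the three regimes can be evaluated directly from the formula; a minimal sketch with illustrative σ and κ values (not figures from the talk):

#include <cstdio>

// Relative capacity predicted by the Universal Scalability Law.
double usl(double n, double sigma, double kappa) {
    return n / (1.0 + sigma * (n - 1.0) + kappa * n * (n - 1.0));
}

int main() {
    for (int n : {1, 4, 16, 64}) {
        std::printf("N=%2d  ideal=%5.1f  contention=%5.1f  contention+coherency=%5.1f\n",
                    n, usl(n, 0.0, 0.0), usl(n, 0.05, 0.0), usl(n, 0.05, 0.001));
    }
}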
From inter-thread communications...
...to inter-core communications
(diagram: threads mapped onto cores, so every inter-thread message becomes an inter-core message)
Inter-core communication means cache coherency (MESI)
(diagram: core 1 ... core N, each with its own registers, L1 I$, L1 D$ and L2$; approximate access latencies assuming a 3 GHz clock)
Registers: 1 cycle (0.3 ns)
L1 cache: 4 cycles (1.3 ns)
L2 cache: 12 cycles (4 ns)
Shared last-level cache: > 30 cycles (10 ns)
Core-to-core transfer through the MESI coherency protocol: > 600 cycles (200 ns)
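To get an intuition for that last number, here is a small, self-contained micro-benchmark sketch (illustrative only, not from the talk; thread placement is left to the OS, so results vary by machine) comparing two threads hammering one shared counter with two threads using separate cache lines:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Two threads incrementing the same counter force its cache line to bounce
// between cores (MESI traffic); padded per-thread counters avoid that.
constexpr long kIters = 10000000;

long long runMs(std::atomic<long>& a, std::atomic<long>& b) {
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < kIters; ++i) a.fetch_add(1); });
    std::thread t2([&] { for (long i = 0; i < kIters; ++i) b.fetch_add(1); });
    t1.join();
    t2.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
}

int main() {
    std::atomic<long> shared{0};
    alignas(64) std::atomic<long> left{0};
    alignas(64) std::atomic<long> right{0};
    std::printf("same cache line     : %lld ms\n", runMs(shared, shared));
    std::printf("separate cache lines: %lld ms\n", runMs(left, right));
}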
Exchange software is pushing performance to hardware limits.
(charts: latency from milliseconds down to microseconds at the 50th and 99.99th percentiles, and throughput in thousands of messages per second)
The Simplx runtime runs on all cores with an event loop of ~300 ns:
No context switching
High core utilization
Lock free
Multicore WITHOUT multithreaded programming?
Very good resources exist, but no multicore-related patterns.
7 patterns to unleash multicore reactivity:
Core-to-core messaging (2 patterns)
Core monitoring (2 patterns)
Core-to-core flow control (1 pattern)
Core-to-cache management (2 patterns)
Inter-core communication: push message
(diagram: the cost of pushing a message from a sender to a destination actor is roughly ~1 µs – 10 µs across servers over a socket, ~500 ns across processes on the same server, ~300 ns across cores)
Pipe pipe(greenActorId);
pipe.push<HelloEvent>();
Intra-core communication: push message
Pushing a message is asynchronous, ~300 ns:
Pipe pipe(greenActorId);
pipe.push<HelloEvent>();
Intra-core communication: x150 speedup with a direct call over a push
Pushing a message is asynchronous (~300 ns); a direct call is synchronous (~2 ns):
Pipe pipe(greenActorId);
pipe.push<HelloEvent>();

ActorReference<GreenActor> target = getLocalReference(greenActorId);
[...]
target->hello();

Optimize calls according to the deployment.
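A minimal, self-contained sketch of that deployment-aware choice (hypothetical helper types, not the Simplx API):

#include <cstdio>
#include <queue>

struct GreenActor { void hello() { std::puts("hello"); } };
struct HelloEvent { GreenActor* target; };

// One outgoing event queue per destination core (simplified to a plain queue).
std::queue<HelloEvent> corePipe;

void sendHello(GreenActor& target, int targetCore, int myCore) {
    if (targetCore == myCore)
        target.hello();                     // synchronous direct call, ~2 ns
    else
        corePipe.push(HelloEvent{&target}); // asynchronous cross-core push, ~300 ns
}

int main() {
    GreenActor green;
    sendHello(green, /*targetCore=*/0, /*myCore=*/0);  // same core: direct call
    sendHello(green, /*targetCore=*/1, /*myCore=*/0);  // other core: queued event
}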
Network optimizations, core optimizations: same fight.
Communication has a cost. In this use case, 3 consumers on another core process the same data, so the producer pushes 3 events; many events mean heavy use of cache coherency (L3).
Let's mutualize inter-core communications: push 1 event to a local router on the destination core, which fans it out to the 3 consumers with 3 direct calls.
WITH pattern vs WITHOUT pattern: Linear improvement
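A minimal, self-contained sketch of the local-router idea (hypothetical types, not the Simplx API): one cross-core event reaches the router, which fans it out to its co-located consumers with cheap direct calls:

#include <cstdio>
#include <vector>

struct Consumer {
    int id;
    void onData(int value) { std::printf("consumer %d got %d\n", id, value); }
};

struct LocalRouter {
    std::vector<Consumer*> consumers;   // consumers living on the same core
    void onEvent(int value) {           // called once per incoming cross-core event
        for (Consumer* c : consumers)
            c->onData(value);           // N direct calls, no extra coherency traffic
    }
};

int main() {
    Consumer a{1}, b{2}, c{3};
    LocalRouter router{{&a, &b, &c}};
    router.onEvent(42);   // 1 event pushed across cores instead of 3
}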
Use case: monitoring the data distribution throughput
We want to know in real time the number of messages received per second, globally and per core.
StartSequence startSequence;
startSequence.addActor<RedActor>(0); // core 0
startSequence.addActor<RedActor>(0); // core 0
startSequence.addActor<RedActor>(1); // core 1
startSequence.addActor<RedActor>(1); // core 1
Simplx simplx(startSequence);
Each core hosts a local monitoring singleton; every message received increments its counter.
struct LocalMonitorActor : Actor {
    [...]
    void newMessage() { ++count; }
};

struct RedActor : Actor {
    [...]
    ReferenceActor monitor;
    RedActor() { monitor = newSingletonActor<LocalMonitorActor>(); }
    void onEvent() { monitor->newMessage(); }
};
Every second, a timer fires and the local monitor informs the service monitoring of the last second's statistics.
struct LocalMonitorActor : Actor, TimerProxy {
    [...]
    LocalMonitorActor() : TimerProxy(*this) { setRepeat(1000); }
    virtual void onTimeout() {
        serviceMonitoringPipe.push<StatsEvent>(count);
        count = 0;
    }
};
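On the receiving side, a minimal sketch of what the service monitoring could do with those per-second statistics, globally and per core (hypothetical types, not the Simplx API):

#include <cstdio>
#include <map>

struct StatsEvent { int coreId; long count; };

struct ServiceMonitor {
    std::map<int, long> perCore;   // last reported msg/s for each core
    void onStats(const StatsEvent& e) {
        perCore[e.coreId] = e.count;
        long total = 0;
        for (auto& kv : perCore) total += kv.second;
        std::printf("core %d: %ld msg/s, global: %ld msg/s\n", e.coreId, e.count, total);
    }
};

int main() {
    ServiceMonitor monitor;
    monitor.onStats({0, 120000});
    monitor.onStats({1, 95000});
}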
Core utilization
Detect overloaded cores before it is too late.
Relying on the CPU usage reported by the OS is not enough: 100% does not mean the runtime is overloaded, and 10% does not tell how much data you can really process.
No push, no event, no work.
When there is nothing to do, the runtime's event loop just spins: in the toy diagram, 20 idle loops in one second correspond to 0% core usage (in reality it is more like 3 million loops per second).
Efficient core usage: with 11 loops in one second, 8 idle loops and 3 working loops, core usage is 60%.
Runtime performance counters help the measurement: a core-usage actor records idleLoop = 0|1 for every loop. With Duration(IdleLoop) = 0.05 s in the toy example:
CoreUsage = (1 - Σ(idleLoop) × Duration(IdleLoop)) × 100 = (1 - 8 × 0.05) × 100 = 60%
In reality the duration of an idle loop is ~300 ns.
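The toy numbers above can be checked directly; a minimal sketch of the formula (0.05 s per idle loop is the toy value from the slide):

#include <cstdio>

int main() {
    const int idleLoops = 8;                 // idle loops observed in the last second
    const double idleLoopDuration = 0.05;    // seconds per idle loop (toy value; ~300 ns in reality)
    const double coreUsage = (1.0 - idleLoops * idleLoopDuration) * 100.0;
    std::printf("core usage = %.0f%%\n", coreUsage);   // prints 60%
}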
A typical trading workflow
(diagram: a data stream feeding the data-processing actors)
What if producers overflow a consumer?
Even if your software cannot be optimized any further, the incoming throughput can still be too high, which implies heavy queuing. Continue? Stop the flow? Merge data? Throttle? Whatever the decision, we first need to detect the issue.
What's happening behind a push?
Local Simplx loops handle the inter-core communication, and the pipe carries a BatchID (e.g. BatchID = 145).
Once the destination reads the data, the BatchID is incremented (145 -> 146).
The BatchID does not increment if the destination core is busy (it stays at 145).
Core-to-core communication at maximum pace:
BatchID batchID(pipe);
pipe.push<Event>();
(…)
if (batchID.hasChanged()) {
    // push again
} else {
    // destination is busy:
    // merge data, start throttling, reject orders…
}
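A self-contained toy sketch of what hasChanged() amounts to underneath (hypothetical, not the Simplx implementation): the destination bumps a counter when it drains a batch, and the producer checks whether that counter moved since its last push:

#include <atomic>
#include <cstdio>

std::atomic<long> batchId{145};

// Destination side: called by the consumer's event loop after reading a batch.
void onBatchConsumed() { batchId.fetch_add(1, std::memory_order_release); }

int main() {
    long seen = batchId.load(std::memory_order_acquire);   // snapshot before pushing
    // ... push events to the destination core here ...
    onBatchConsumed();   // simulate the destination draining the batch
    if (batchId.load(std::memory_order_acquire) != seen)
        std::puts("batch consumed: push again");
    else
        std::puts("destination busy: merge data, throttle, or reject orders");
}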
Demo (Java code): same ID => queuing; latest ID => no queuing.
FIX + execution engine
(diagram: a new order flows in, an acknowledgment flows back)
Almost all tags sent in the new-order request need to be sent back in the acknowledgment. A FIX order can easily weigh ~200 bytes.
Stability depends on the ability to be cache friendly.
At ~200 bytes per order, to stay "in-cache" and get stable performance, one core can store ~1300 open orders in its local order-book storage.
… but FIX orders are huge. Let's cut the order and send only the strict minimum, ~32 bytes: id, price, quantity, type, validity, …
The full FIX order is stored aside and both parts are reconciled later.
With ~32 bytes per order, one core can store ~8000 open orders while staying "in-cache" with stable performance: we have divided by 6 the number of cores needed to be stable.
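A minimal sketch of such a compact record (the field list comes from the slide; the exact types and sizes are assumptions):

#include <cstdint>
#include <cstdio>

// Keep only the fields the matching engine needs in a compact, cache-friendly
// record; the full FIX order stays in a separate, colder store.
struct CompactOrder {
    std::uint64_t id;        // order id
    std::int64_t  price;     // fixed-point price
    std::uint32_t quantity;
    std::uint8_t  type;      // limit, market, ...
    std::uint8_t  validity;  // day, GTC, ...
    std::uint16_t padding;
};
static_assert(sizeof(CompactOrder) <= 32, "stay within ~32 bytes per order");

int main() {
    std::printf("compact order: %zu bytes (vs ~200 bytes for the full FIX order)\n",
                sizeof(CompactOrder));
}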
Routing a message needs an actor directory.
(diagram: a message initiator wants to send a message to A; where is A, what is its address? A router resolves the destination)
Regular routing design is simple: the router maps an incoming key to the @destination.
Multi-scaling communications impact the actor address size:
same core: ActorID
same process, another core: ActorID + CoreID
same server, another process: ActorID + CoreID + EngineID
another server: ActorID + CoreID + EngineID + MachineID
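A purely illustrative layout of such a composite address (the field sizes are assumptions, not the Simplx wire format):

#include <cstdint>
#include <cstdio>

// The further away the destination, the more components are needed to route to it.
struct ActorAddress {
    std::uint32_t actorId;    // enough within one core
    std::uint8_t  coreId;     // added for another core of the same process
    std::uint8_t  engineId;   // added for another process on the same server
    std::uint16_t machineId;  // added for another server
};

int main() {
    std::printf("full address: %zu bytes\n", sizeof(ActorAddress));
}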
The local directory can be huge: a flat table of N records mapping an incoming key K (12 bytes*) to a @destination D (10 bytes).
size = N(K + D) = 50 000 × (12 + 10) ≈ 1.1 MB
*ISIN code = 12 characters
Let's take advantage of core awareness: instead of mapping every incoming key straight to a @destination, the sender's directory maps the incoming key to a coreID, and a small per-core table maps the coreID to the @localrouter and a destination index on that core.
We save about 40% of cache memory:
incoming key -> coreID: 12 bytes + 1 byte, N records
coreID -> @localrouter + destination index: 1 byte + 10 bytes + 1 byte, 256 records
size = 50 000 × (12 + 1) + 256 × (1 + 10 + 1) ≈ 0.65 MB
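The arithmetic can be checked in a few lines (a back-of-the-envelope sketch using the figures above):

#include <cstdio>

int main() {
    const long records = 50000;
    const long flat    = records * (12 + 10);   // incoming key -> @destination
    const long global  = records * (12 + 1);    // incoming key -> coreID
    const long perCore = 256 * (1 + 10 + 1);    // coreID -> @localrouter + index
    std::printf("flat directory      : %ld bytes (~1.1 MB)\n", flat);
    std::printf("core-aware directory: %ld bytes (~0.65 MB)\n", global + perCore);
}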
Henry Petroski
From current multi-core software (with multithreading) to next-generation multi-core software (with the actor model):
Design: concurrent
Develop: monothreaded
Run: parallel
Execute: reactive

charly.bechara@tredzone.com