Reactive Design Patterns for Microservices on Multicore


  1. Reactive Software with elegance. Reactive design patterns for microservices on multicore. Reactive Summit, 22/10/18. charly.bechara@tredzone.com

  2. Outline: Microservices on Multicore; Reactive Multicore Patterns; Modern Software Roadmap.

  3. Part 1: MICROSERVICES ON MULTICORE

  4. Microservices on Multicore. Microservice architecture with the actor model. [Diagram: µServices made of actors communicating via message passing]

  5. Microservices on Multicore. Fast data means more inter-communication. [Diagram: computations shift from batch and stream processing to real-time event processing (fast data), and highly interconnected workflows multiply the communications]

  6. Microservices on Multicore. Microservice architecture. [Diagram: µServices made of actors communicating via message passing]

  7. Microservices on Multicore. Microservice architecture + fast data. [Diagram: new interactions appear between µServices]

  8. Microservices on Multicore. Microservice architecture + fast data. [Diagram: even more interactions between µServices]

  9. Microservices on Multicore. More microservices should run on the same multicore machine.

  10. Microservices on Multicore. Microservice architecture + fast data + multicore machine. [Diagram: µServices mapped onto the cores of one machine]

  11. Microservices on Multicore. Microservice architecture + fast data + multicore machine. The Universal Scalability Law (Gunther's law) is a performance model of a system, based on queueing theory. [Plot: perfect scalability (N); contention impact (σ); coherency impact (κ); curves for σ = 0, κ = 0; σ >> 0, κ = 0; σ >> 0, κ > 0]
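
  The slide plots the law but the formula itself did not survive extraction. For reference, Gunther's Universal Scalability Law gives the relative capacity of a system on N cores as:

```latex
% Universal Scalability Law (Gunther): relative capacity on N cores,
% sigma = contention (serialization) cost, kappa = coherency (crosstalk) cost.
C(N) = \frac{N}{1 + \sigma (N - 1) + \kappa N (N - 1)}
```

  With σ = 0 and κ = 0 this is perfect linear scaling, C(N) = N; contention (σ >> 0) makes throughput plateau, and any coherency cost (κ > 0) eventually makes it decline, which is exactly the inter-core cache traffic targeted by the patterns below.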

  12. Microservices on Multicore. From inter-thread communications... [Diagram: threads exchanging messages inside the machine]

  14. Microservices on Multicore. ...to inter-core communications. [Diagram: messages crossing core boundaries]

  15. Microservices on Multicore. Inter-core communication => cache coherency. Approximate access latencies, assuming a 3 GHz clock:
     Registers: 1 cycle (0.3 ns)
     L1 I$ / L1 D$: 4 cycles (1.3 ns)
     L2$: 12 cycles (4 ns)
     Shared L3$ or LLC: > 30 cycles (10 ns)
     Core-to-core via the MESI coherency protocol: > 600 cycles (200 ns)

  16. Microservices on Multicore. Exchange software is pushing performance to hardware limits:
     Volume: from thousands of msg/s to millions of msg/s
     Velocity: from msec to µsec
     Stability: from the 50th percentile to the 99.99th percentile

  17. Simplx: one thread per core, hence no context switching.

  18. Simplx: actor multitasking on each thread, for high core utilization.

  19. Simplx: one lock-free event loop per core for communications (one loop iteration = ~300 ns). Simplx runs on all cores.
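
  The slides describe this runtime model but show no code for it. Below is a rough, self-contained sketch of the "one pinned thread per core, one lock-free event loop" idea; all names (Event, SpscRing, coreLoop) are illustrative, not Simplx's actual API.

```cpp
// Illustrative sketch (NOT Simplx's API) of one lock-free event loop per core:
// each core's thread drains single-producer/single-consumer rings fed by its
// peer cores and dispatches events synchronously, without locks or syscalls.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

struct Event { std::function<void()> run; };  // work delivered to this core

// One SPSC ring per (sender core, receiver core) pair: lock-free by design.
class SpscRing {
    static constexpr size_t N = 1024;
    Event slots[N];
    std::atomic<size_t> head{0}, tail{0};
public:
    bool push(Event e) {                       // called by the single producer
        size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N) return false;  // full
        slots[t % N] = std::move(e);
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
    bool pop(Event& e) {                       // called by the single consumer
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;      // empty
        e = std::move(slots[h % N]);
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

// The per-core event loop: one pass over empty inboxes is the "idle loop"
// measured by the core usage pattern later in the deck.
void coreLoop(std::vector<SpscRing*>& inboxes, std::atomic<bool>& stop) {
    Event e;
    while (!stop.load(std::memory_order_relaxed))
        for (SpscRing* in : inboxes)
            while (in->pop(e)) e.run();        // dispatch to local actors
}

int main() {
    SpscRing ring;
    std::vector<SpscRing*> inboxes{&ring};
    std::atomic<bool> stop{false};
    std::thread core0(coreLoop, std::ref(inboxes), std::ref(stop));
    ring.push({[] { std::puts("event handled on core 0"); }});
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop = true;
    core0.join();
}
```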

  20. Multicore WITHOUT multithreaded programming?

  21. Microservices on Multicore. Very good resources exist, but none with multicore-related patterns.

  22. Part 2: REACTIVE MULTICORE PATTERNS

  24. Reactive Multicore Patterns. 7 patterns to unleash multicore reactivity:
     Core-to-core messaging (2 patterns)
     Core monitoring (2 patterns)
     Core-to-core flow control (1 pattern)
     Core-to-cache management (2 patterns)

  25. Core-to-core messaging patterns

  26. Pattern #1: the core-aware messaging pattern. Inter-core communication: push a message. [Diagram: sender to a destination core ~500 ns; within the sender's core ~300 ns; to a socket server across the network ~1 µs to 10 µs]

     Pipe pipe = new Pipe(greenActorId);
     pipe.push<HelloEvent>();

  27. Pattern #1: the core-aware messaging pattern. Intra-core communication: push a message, asynchronous, ~300 ns.

     Pipe pipe = new Pipe(greenActorId);
     pipe.push<HelloEvent>();

  28. Pattern #1: the core-aware messaging pattern. Intra-core communication: x150 speedup with a direct call over a push. Optimize calls according to the deployment.

     Push a message, asynchronous, ~300 ns:
     Pipe pipe = new Pipe(greenActorId);
     pipe.push<HelloEvent>();

     Direct call, synchronous, ~2 ns:
     ActorReference<GreenActor> target = getLocalReference(greenActorId);
     [...]
     target->hello();
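
  One way to read the pattern: pick the transport once actor placement is known. The toy below is self-contained and hypothetical (coreOf, the stubbed Pipe and getLocalReference mirror the slide's names but are not Simplx's real API); it only illustrates the decision.

```cpp
// Hypothetical, self-contained sketch of the core-aware messaging choice.
#include <cstdio>

using ActorId = int;
using CoreId  = int;

CoreId coreOf(ActorId id) { return id % 4; }   // toy placement: 4 cores

struct GreenActor {
    void hello() { std::printf("hello (direct call)\n"); }
};
struct HelloEvent {};

// Toy stand-ins for the runtime's local lookup and inter-core pipe.
GreenActor* getLocalReference(ActorId) { static GreenActor a; return &a; }
struct Pipe {
    explicit Pipe(ActorId) {}
    template <typename TEvent> void push() { std::printf("pushed event (inter-core)\n"); }
};

void greetCoreAware(ActorId greenActorId, CoreId myCore) {
    if (coreOf(greenActorId) == myCore) {
        // Same core, same event-loop thread: a ~2 ns synchronous direct call
        // is safe because no other thread can touch the target concurrently.
        getLocalReference(greenActorId)->hello();
    } else {
        // Different core: ~500 ns asynchronous push through the lock-free pipe.
        Pipe pipe(greenActorId);
        pipe.push<HelloEvent>();
    }
}

int main() {
    greetCoreAware(/*greenActorId=*/2, /*myCore=*/2);  // co-located: direct call
    greetCoreAware(/*greenActorId=*/3, /*myCore=*/2);  // remote core: push
}
```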

  29. Pattern #2: the message mutualization pattern. Network optimizations, core optimizations: same fight. In this use case, the 3 red consumers process the same data. [Diagram: one actor pushes the data to 3 consumers on another core]

  30. Pattern #2: the message mutualization pattern. Communication has a cost: many events mean heavy cache-coherency traffic (L3). [Diagram: 3 separate events cross the core boundary, one per consumer]

  31. Pattern #2: the message mutualization pattern. Let's mutualize inter-core communications: the 3 events become 1 event, delivered to a local router that makes 3 direct calls. [Diagram: 1 inter-core event, then a local router fans out 3 direct calls]

  32. Pattern #2: the message mutualization pattern. WITH the pattern vs WITHOUT the pattern: linear improvement. [Benchmark chart]
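
  A minimal, self-contained sketch of the mutualization idea (all names are illustrative): the producer sends one event across cores, and a router living on the consumers' core fans it out with cheap direct calls.

```cpp
// Hypothetical sketch of the message mutualization pattern: one event crosses
// the core boundary, a local router fans it out with direct calls.
#include <cstdio>
#include <vector>

struct Data { int value; };

struct Consumer {
    int id;
    void onData(const Data& d) { std::printf("consumer %d got %d\n", id, d.value); }
};

// Runs on the consumers' core; receives ONE inter-core event per datum.
struct LocalRouter {
    std::vector<Consumer*> consumers;
    void onEvent(const Data& d) {
        for (Consumer* c : consumers)   // ~2 ns direct calls, no coherency traffic
            c->onData(d);
    }
};

int main() {
    Consumer a{1}, b{2}, c{3};
    LocalRouter router{{&a, &b, &c}};
    // In the real pattern this would arrive through the inter-core pipe;
    // here we invoke the handler directly to show the fan-out.
    router.onEvent(Data{42});
}
```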

  33. Core monitoring patterns @ real-time

  34. Pattern #3: the core stats pattern. Use case: monitoring the data distribution throughput. We want to know, in real time, the number of messages received per second, globally and per core. [Diagram: two cores, each hosting actors receiving data]

     StartSequence startSequence;
     startSequence.addActor<RedActor>(0); // core 0
     startSequence.addActor<RedActor>(0); // core 0
     startSequence.addActor<RedActor>(1); // core 1
     startSequence.addActor<RedActor>(1); // core 1
     Simplx simplx(startSequence);

  35. Pattern #3: the core stats pattern. Use case: monitoring the data distribution throughput. Each RedActor reports to a per-core singleton monitor with a direct call that increments a local message counter (1).

     struct LocalMonitorActor : Actor {
         [...]
         void newMessage() { ++count; }  // (1) increase the message counter
     };

     struct RedActor : Actor {
         [...]
         ReferenceActor monitor;
         RedActor() {
             monitor = newSingletonActor<LocalMonitorActor>();  // local singleton monitoring
         }
         void onEvent() { monitor->newMessage(); }
     };

  36. Pattern #3: the core stats pattern. Use case: monitoring the data distribution throughput. A 1-second timer makes the local monitor push the last second's statistics to the monitoring service (1), then reset its counter.

     struct LocalMonitorActor : Actor, TimerProxy {
         [...]
         LocalMonitorActor() : TimerProxy(*this) {
             setRepeat(1000);  // timer fires every second
         }
         virtual void onTimeout() {
             // (1) inform monitoring of the last second's statistics
             serviceMonitoringPipe.push<StatsEvent>(count);
             count = 0;
         }
     };
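
  Pieced together and stripped of the framework, the mechanics look like the self-contained toy below. It replaces TimerProxy with a plain sleeping thread, and uses an atomic counter only because the toy spans two threads; in the real pattern the timer fires on the same core thread as the counter, so a plain counter suffices.

```cpp
// Self-contained toy of the core stats pattern (not Simplx's API): a local
// per-core counter incremented on every message, drained once per second.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct LocalMonitor {
    std::atomic<long> count{0};  // atomic only because this toy uses two threads
    void newMessage() { count.fetch_add(1, std::memory_order_relaxed); }
    long drain() { return count.exchange(0, std::memory_order_relaxed); }
};

int main() {
    LocalMonitor monitor;
    std::atomic<bool> stop{false};

    // Stand-in for the core's actors: every loop iteration "receives" a message.
    std::thread worker([&] {
        while (!stop.load(std::memory_order_relaxed)) monitor.newMessage();
    });

    // Stand-in for TimerProxy with setRepeat(1000): report msg/s each second.
    for (int sec = 0; sec < 3; ++sec) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::printf("core 0: %ld msg/s\n", monitor.drain());
    }
    stop = true;
    worker.join();
}
```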

  37. Pattern #4: the core usage pattern. Core utilization: detect overloaded cores before it is too late. Relying on the CPU usage reported by the OS is not enough: 100% does not mean the runtime is overloaded, and 10% does not tell you how much data you can really process.

  38. Pattern #4: the core usage pattern. No push, no event, no work. [Diagram: an idle event loop shown as 20 loops in a second, 0% core usage; in reality it is more like 3 million loops per second]

  39. Pattern #4: the core usage pattern. Efficient core usage. [Diagram: same 1-second window, now 11 loops of which 3 are working loops and the rest idle loops, giving 60% core usage]

  40. Pattern #4: the core usage pattern. Runtime performance counters help the measurement: a core-usage actor flags each event-loop iteration as idle or working. With Duration(IdleLoop) = 0.05 s in this illustration:

     CoreUsage = 1 - Σ(idleLoop) × 0.05, with idleLoop = 0 | 1 per loop

  Over the illustrated second there are 11 loops, 8 idle and 3 working, so CoreUsage = 1 - 8 × 0.05 = 60%. (In reality an idle loop lasts ~300 ns, not 0.05 s.)
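
  In code, the measurement reduces to counting idle iterations over a window; only idle loops have a fixed, known duration, which is why usage is not simply the fraction of working loops. The sketch below uses hypothetical names and reproduces the slide's arithmetic.

```cpp
// Hypothetical sketch of the core usage formula:
// usage = 1 - (idleLoops * idleLoopDuration) / windowDuration.
#include <cstdio>

struct CoreUsage {
    long idleLoops = 0;
    void onIdleLoop() { ++idleLoops; }  // called by the event loop when no event was found
    // windowSec: observation window; idleLoopSec: duration of one idle loop.
    double compute(double windowSec, double idleLoopSec) const {
        return 1.0 - (idleLoops * idleLoopSec) / windowSec;
    }
};

int main() {
    // The slide's illustration: 8 idle loops of 0.05 s each in a 1 s window.
    CoreUsage cu;
    for (int i = 0; i < 8; ++i) cu.onIdleLoop();
    std::printf("core usage: %.0f%%\n", 100.0 * cu.compute(1.0, 0.05));  // prints 60%
    // In the real runtime an idle loop is ~300 ns, so the same window holds
    // millions of loops instead of 11.
}
```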

  41. Demo: real-time core monitoring of a typical trading workflow. [Diagram: data stream feeding data processing]

  42. Core-to-core flow control patterns

  43. Pattern #5: the queuing prevention pattern. What if producers overflow a consumer? Even when your software cannot be optimized any further, the incoming throughput can still be too high, causing heavy queuing. Continue? Stop the flow? Merge data? Throttle? Whatever the decision, we first need to detect the issue.

  44. Pattern #5: the queuing prevention pattern. What's happening behind a push? [Diagram]

  45. Pattern #5: the queuing prevention pattern. The local Simplx loops handle the inter-core communication. [Diagram: Batch ID = 145]

  46. Pattern #5: the queuing prevention pattern. Once the destination reads the data, the BatchID is incremented. [Diagram: Batch ID goes from 145 to 146]

  47. Pattern #5: the queuing prevention pattern. The BatchID does not increment if the destination core is busy. [Diagram: Batch ID stays at 145]

  48. Pattern #5: the queuing prevention pattern. Core-to-core communication at max pace:

     BatchID batchID(pipe);
     pipe.push<Event>();
     (…)
     if (batchID.hasChanged()) {
         // push again
     } else {
         // destination is busy:
         // merge data, start throttling, reject orders…
     }
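
  A self-contained toy of the same idea (BatchWatch is illustrative, not Simplx's BatchID API): the producer samples a counter that only the consumer advances, so "unchanged" means the previous batch is still queued and it is time to merge, throttle, or reject.

```cpp
// Toy illustration of BatchID-based flow control: lock-free back-pressure
// detection by watching a consumer-advanced counter.
#include <atomic>
#include <cstdio>

std::atomic<long> consumerBatchId{145};  // incremented by the consumer when it drains a batch

struct BatchWatch {
    long seen;
    BatchWatch() : seen(consumerBatchId.load()) {}
    bool hasChanged() {
        long now = consumerBatchId.load();
        bool changed = (now != seen);
        seen = now;
        return changed;
    }
};

int main() {
    BatchWatch watch;
    // ... pipe.push<Event>() would happen here ...
    if (watch.hasChanged()) {
        std::printf("consumer drained the batch: push again\n");
    } else {
        std::printf("consumer busy: merge, throttle, or reject\n");
    }
    consumerBatchId.fetch_add(1);  // simulate the consumer draining a batch
    std::printf("after drain, hasChanged() = %s\n", watch.hasChanged() ? "true" : "false");
}
```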

  49. Pattern #5: the queuing prevention pattern. Demo (Java code): same ID => queuing; latest ID => no queuing.

  50. Core-to-cache management patterns

  51. Pattern #6: the cache-aware split pattern. FIX + execution engine. [Diagram: a new order arrives]

  52. Pattern #6: the cache-aware split pattern. FIX + execution engine: a FIX order can easily reach ~200 bytes, and almost all tags sent in the new-order request need to be sent back in the acknowledgment. [Diagram: new order in, acknowledgment out]

  53. Pattern #6: the cache-aware split pattern. Stability depends on the ability to be cache friendly. At ~200 bytes per order, one core can store ~1300 open orders and stay "in-cache" with stable performance, while a single order book can hold anywhere from 1 to 10,000 open orders. [Diagram: local storage of the order book]
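
  As a back-of-the-envelope check of the ~1300 figure (assuming the budget is a typical 256 KB per-core L2 cache, an assumption the slide does not state):

```latex
% Open orders fitting a hypothetical 256 KB per-core cache budget at 200 B each
\frac{256 \times 1024\ \text{B}}{200\ \text{B/order}} \approx 1310\ \text{orders}
```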
