Low Latency Trading Architecture, Sam Adams, QCon London, March 2017

  1. Low Latency Trading Architecture Sam Adams QCon London, March, 2017

  2. don't panic sell buy GBP/USD don't panic

  3. Typical day:
     1,000s of active clients
     100,000s of trades
     100,000,000s of orders placed (very bursty: spikes of 100s / ms)
     1,000,000,000s of market data updates sent

  4. End-to-end latency:
     50%: 80 µs
     99%: 150 µs
     99.99%: 500 µs
     Max: 4 ms (*)

  5. System Architecture Building low latency applications

  6. Instructions and execution reports: *latency sensitive*. Market data: *throughput matters*.

  7. The Disruptor

  8. High performance inter-thread messaging (producer → consumer)

  9. ArrayBlockingQueue vs Disruptor

     public class ArrayBlockingQueue<E> {
         final Object[] items;
         int takeIndex;
         int putIndex;
         int count;
         /** Main lock guarding all access */
         final ReentrantLock lock;
     }

     → locking & contention

  10. ArrayBlockingQueue vs Disruptor

      public class ArrayBlockingQueue<E> {
          final Object[] items;
          int takeIndex;
          int putIndex;
          int count;
          /** Main lock guarding all access */
          final ReentrantLock lock;
      }

      → locking & contention

      vs

      public class RingBuffer<E> implements DataProvider<E> {
          // ...
          final long indexMask;
          final Object[] entries;
          final Sequence cursor;
          // ...
      }

      public class BatchEventProcessor<E> {
          final DataProvider<E> dataProvider;
          final Sequence sequence;
      }

      → single writers

  11. Producer: claimed -1, published -1. Consumer: consumed -1, waiting for 0.

  12. Producer claims slot 0: claimed 0, published -1. Consumer: consumed -1, waiting for 0.

  13. Producer publishes slot 0: claimed 0, published 0. Consumer: consumed -1, waiting for 0.

  14. Slot 0 is available: claimed 0, published 0. Consumer: consumed -1, processing 0.

  15. Producer: claimed 0, published 0. Consumer: consumed 0, waiting for 1.

  16. Producer claims and publishes slots 1-3: claimed 3, published 3. Consumer: consumed 0, waiting for 1.

  17. Slots 1-3 are available: claimed 3, published 3. Consumer: consumed 0, processing 1, 2, 3.

  18. Producer: claimed 3, published 3. Consumer: consumed 3, waiting for 4.
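
The slot lifecycle walked through above can be condensed into code. Below is a minimal single-producer, single-consumer ring buffer showing the claimed/published/consumed sequence protocol; this is a simplified sketch for illustration, not the actual Disruptor implementation (which has pluggable wait strategies and supports multiple producers and consumer graphs).

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-producer / single-consumer ring buffer illustrating the
// claim -> publish -> consume sequence protocol from the slides.
// Simplified sketch, not the real Disruptor.
class MiniRing {
    final long[] entries;
    final int mask;                                  // size must be a power of two
    long claimed = -1;                               // producer-local: last claimed slot
    final AtomicLong published = new AtomicLong(-1); // visible to the consumer
    final AtomicLong consumed = new AtomicLong(-1);  // visible to the producer

    MiniRing(int sizePowerOfTwo) {
        entries = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Producer: claim the next slot, spinning if the buffer would wrap.
    long claim() {
        long next = claimed + 1;
        while (next - consumed.get() > entries.length) {
            Thread.onSpinWait(); // wait for the consumer to free a slot
        }
        claimed = next;
        return next;
    }

    // Producer: write the value, then make the slot visible.
    void publish(long slot, long value) {
        entries[(int) (slot & mask)] = value;
        published.set(slot); // ordered store: value is written before it is published
    }

    // Consumer: wait for the next published slot and read it.
    long consume() {
        long next = consumed.get() + 1;
        while (published.get() < next) {
            Thread.onSpinWait();
        }
        long value = entries[(int) (next & mask)];
        consumed.set(next);
        return value;
    }
}
```

Because the producer is the only writer of `claimed`/`published` and the consumer the only writer of `consumed`, no slot is ever contended: each counter has a single writer, which is the property the slides contrast with ArrayBlockingQueue's shared lock.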

  19. Supports dependency graphs between consumers

  20. Messaging

  21. Asynchronous pub/sub messaging:
      - UDP multicast: low latency, scalable, unreliable
      - Services publish / subscribe to topics
      - Topic = unique multicast group
      - Informatica UMS (aka 29West LBM) provides *some reliability*

  22. Asynchronous pub/sub messaging:
      - Push based
      - If you miss a message, it is gone
      - Late-join: no history

  23. Event:
          long sequence
          byte operationIndex
          byte[] data
          int length

      javassist-generated proxies to interfaces:

      public interface TradingInstructions {
          void placeOrder(PlaceOrderInstruction instruction);
          void cancelOrder(CancelOrderInstruction instruction);
      }

      See GeneratedRingBufferProxyGenerator in disruptor-proxy for the inter-thread version: https://github.com/LMAX-Exchange/disruptor-proxy

  24. Event:
          long sequence
          byte operationIndex
          byte[] data
          int length

      Publisher proxy:

      public void placeOrder(PlaceOrderInstruction arg0) {
          // ...
          event.initialise(sequence, 1); // operation index
          marshaller.encode(arg0, event.outputStream());
          // ...
      }

      See GeneratedRingBufferProxyGenerator in disruptor-proxy for the inter-thread version: https://github.com/LMAX-Exchange/disruptor-proxy

  25. Event:
          long sequence
          byte operationIndex
          byte[] data
          int length

      Subscriber proxy:

      Invoker[] invokers;
      TradingInstructions implementation;

      public void onEvent(Event event) {
          Invoker invoker = invokers[event.getOperationIndex()];
          invoker.invoke(event.getInputStream(), implementation);
      }

      public void invoke(InputStream input, TradingInstructions implementation) {
          PlaceOrderInstruction arg0 = marshaller.decode(input);
          implementation.placeOrder(arg0);
      }

      See GeneratedRingBufferProxyGenerator in disruptor-proxy for the inter-thread version: https://github.com/LMAX-Exchange/disruptor-proxy
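
A stripped-down, hand-written version of this dispatch pattern may make it clearer. The sketch below uses a hypothetical two-method interface and String payloads in place of the marshalled byte streams, purely to keep it self-contained; the real proxies are generated by javassist and marshal to binary.

```java
import java.util.function.BiConsumer;

// Hand-written illustration of the generated publisher/subscriber proxies:
// each method on the interface gets an operation index, and the subscriber
// side dispatches on that index. Hypothetical sketch, not generated code.
interface Instructions {
    void placeOrder(String order);
    void cancelOrder(String orderId);
}

class Event {
    int operationIndex;
    String payload; // the real event carries marshalled bytes
}

class SubscriberProxy {
    final Instructions impl;
    // invokers[i] decodes the payload and calls method i on the implementation
    final BiConsumer<String, Instructions>[] invokers;

    @SuppressWarnings("unchecked")
    SubscriberProxy(Instructions impl) {
        this.impl = impl;
        this.invokers = new BiConsumer[] {
            (payload, target) -> target.placeOrder(payload),  // operation index 0
            (payload, target) -> target.cancelOrder(payload)  // operation index 1
        };
    }

    void onEvent(Event event) {
        invokers[event.operationIndex].accept(event.payload, impl);
    }
}
```

The array lookup by operation index replaces any reflective method dispatch, which is why the generated proxies stay cheap on the hot path.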

  26. Matching Engine

  27. For speed: hold all working state in memory. To remove contention: single-threaded.

  28. Don’t block business logic: buffer for outbound I/O

  29. Don’t block network thread: buffer incoming events

  30. All state in volatile memory: Save on shutdown / Load on startup

  31. Recover from unclean shutdown: journal incoming events to disk, replay on startup
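
The journal-then-apply discipline can be illustrated with a toy engine. This is a hypothetical sketch using an in-memory list to stand in for the on-disk journal; the real engine journals binary events to disk before they reach business logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of journal-and-replay recovery: every incoming event is
// appended to a journal before it mutates state, so after an unclean
// shutdown the state can be rebuilt by replaying the journal from the start.
// Hypothetical sketch; a List stands in for the on-disk journal.
class JournalledEngine {
    long balance = 0;                               // all working state in memory
    final List<Long> journal = new ArrayList<>();   // stands in for the disk journal

    void onEvent(long delta) {
        journal.add(delta);   // journal first...
        balance += delta;     // ...then apply to in-memory state
    }

    // Recovery: replay the journal through the same deterministic logic.
    static JournalledEngine recover(List<Long> journal) {
        JournalledEngine fresh = new JournalledEngine();
        for (long delta : journal) fresh.onEvent(delta);
        return fresh;
    }
}
```

Replay only reconstructs the same state because the business logic is deterministic, which is exactly why the determinism rules on the following slides matter.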

  32. Replicate events to hot-standby for resiliency Manual fail-over (also to offsite DR)

  33. Holding all your state in memory:
      - No database, no roll-back
      - Up-front validation is critical
      - Never throw exceptions: the result is inconsistent state

  34. The system must be deterministic:
      - All operations event-sourced
      - Time sourced from events
      - Collections must be ordered
      - No local configuration
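
The "time sourced from events" rule can be made concrete with a small sketch. This is a hypothetical illustration: the primary stamps the clock into each event once, and business logic only ever reads that stamp, so a replay or a replica processing the same events computes the same result on any machine, at any later time.

```java
// Determinism sketch: the primary stamps wall-clock time into each event,
// and all business logic reads time from the event rather than calling
// System.currentTimeMillis() locally. Hypothetical illustration of the
// "time sourced from events" rule; names are invented.
class TimedEvent {
    final long timestampMillis; // stamped once, at the primary
    final long amount;

    TimedEvent(long timestampMillis, long amount) {
        this.timestampMillis = timestampMillis;
        this.amount = amount;
    }
}

class DeterministicLogic {
    static final long CUTOFF_MILLIS = 1_000L;

    // Uses event time, never the local clock: replaying the same events
    // anywhere gives the same answer.
    static boolean acceptedBeforeCutoff(TimedEvent event) {
        return event.timestampMillis < CUTOFF_MILLIS;
    }
}
```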

  35. Determinism bugs are really nasty, but only an issue if we have to fail over or replay: the primary is the source of truth.

  36. Gateways

  37. Same principles:
      - Non-blocking / message passing
      - Minimise shared state

  38. Stream Processing

  39. [Diagram] Matching Engine → Order Book

  40. [Diagram] The Matching Engine publishes an event stream (Order Added, Order Cancelled, Trade, ...) from which consumers such as the Order Book and an All Orders[] view are derived.

  41. [Diagram] The same event stream also feeds an Event Store and further downstream consumers: Market Analysis, Order Book Image, AML Alerts.

  42. Where latency doesn't matter (Event Store, Market Analysis, Order Book Image, AML Alerts):
      - How big are the bursts?
      - Buffers are your friend
      - Does data loss matter?

  43. More Reliable Messaging

  44. Handling buffer wraps:
      - 'Better never than late': reset & late-join
      - Persistent data loss: recover from the event store, journal replay and gap-fill
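
Gap-fill starts with gap detection, which is just sequence-number accounting on the subscriber side. A hypothetical sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Subscriber-side gap detection: each message carries a sequence number, and
// anything between the last seen sequence and the new one is a gap to be
// recovered from the event store or by journal replay. Hypothetical sketch.
class GapDetector {
    long lastSeen = -1;

    // Returns the sequences missed before this message, if any.
    List<Long> onMessage(long sequence) {
        List<Long> missed = new ArrayList<>();
        for (long s = lastSeen + 1; s < sequence; s++) missed.add(s);
        if (sequence > lastSeen) lastSeen = sequence;
        return missed;
    }
}
```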

  45. Low latency applications: mechanical sympathy

  46. [sam@box ~]$ lstopo

  47. [lstopo output] Machine (126GB)
      Main memory: NUMANode P#0 (63GB), Socket P#0
      L3 cache: 30MB (shared)
      L2 caches: 4 x 256KB; L1d caches: 4 x 32KB; L1i caches: 4 x 32KB
      CPU cores / hyper-threads: Core P#0 (PU P#0, P#24), Core P#1 (PU P#2, P#26), Core P#2 (PU P#4, P#28), Core P#3 (PU P#6, P#30)

  48. CPUs are faster than memory. From the Intel Performance Analysis Guide:
      L1 cache hit: 4 cycles
      L2 cache hit: 10 cycles
      Local L3 cache hit: ~40-75 cycles
      Remote L3 cache hit: ~100-300 cycles
      Local DRAM: ~60 ns
      Remote DRAM: ~100 ns

  49. Memory system optimised for: Temporal locality Spatial locality Equidistant locality

  50. References vs primitives: Long[] vs long[]
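
The difference: a `long[]` stores 8-byte values contiguously, while a `Long[]` stores references to boxes scattered across the heap, costing an extra dereference (and likely a cache miss) per element. A quick sketch of the traversal each implies:

```java
// long[] : one contiguous block of 8-byte values -> sequential, prefetch-friendly.
// Long[] : an array of references, each pointing at a separately allocated box
//          elsewhere on the heap -> an extra dereference (and likely a cache
//          miss) per element.
class ArraySum {
    static long sumPrimitive(long[] values) {
        long sum = 0;
        for (long v : values) sum += v;  // reads values straight out of the array
        return sum;
    }

    static long sumBoxed(Long[] values) {
        long sum = 0;
        for (Long v : values) sum += v;  // each iteration chases a reference
        return sum;                      // and unboxes the Long
    }
}
```

Both return the same result; only the memory traffic differs, which is exactly the spatial-locality point from the previous slide.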

  51. Calculations with money:
      - double: inexact
      - BigDecimal: expensive
      Fixed-point arithmetic with long:

      public class Cash {
          long value;
      }

      But I want type-safety...

  52. Prices, precision 6dp: 1250000L → 1.250000
      Quantities, precision 2dp: 1520L → 15.20

      long price1 = 1250000L;
      long quantity1 = 1520L;
      long price2 = quantity1; // BUG

  53. With Type Annotations & the Units Checker (https://checkerframework.org/):

      Prices, precision 6dp: 1250000L → 1.250000
      Quantities, precision 2dp: 1520L → 15.20

      @Price long price1 = 1250000L;
      @Qty long quantity1 = 1520L;
      @Price long price2 = quantity1; // Compilation error
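
Fixed-point arithmetic with mixed precisions needs explicit rescaling. Below is a hypothetical helper (not from the talk) assuming the slides' conventions of 6dp prices and 2dp quantities:

```java
// Fixed-point money sketch: prices carry 6 decimal places, quantities 2,
// both stored as long. Notional = price * quantity must be rescaled to the
// output precision. Hypothetical helper, not production code.
class FixedPoint {
    static final long PRICE_SCALE = 1_000_000L; // 6dp
    static final long QTY_SCALE = 100L;         // 2dp

    // Notional in 2dp: (price/1e6) * (qty/1e2) rescaled to 2dp
    //                = price * qty / 1e6
    static long notional2dp(long price6dp, long qty2dp) {
        return Math.multiplyExact(price6dp, qty2dp) / PRICE_SCALE;
    }
}
```

For example, price 1.250000 (1250000L) times quantity 15.20 (1520L) gives a notional of 19.00, i.e. 1900L at 2dp. `Math.multiplyExact` makes overflow fail loudly instead of silently wrapping.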

  54. java.util vs fastutil: Map<Long, X> vs LongMap<X>

      public class HashMap<K,V> {
          Node<K,V>[] table;
      }
      static class Node<K,V> {
          K key;
          V value;
          Node<K,V> next;
      }

      vs

      public class Long2ObjectOpenHashMap<V> {
          long[] keys;
          V[] values;
      }

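
The point of the primitive-keyed layout is to avoid boxing every `long` key into a `Long` and chasing a chain of `Node` objects. Below is a toy open-addressing map with long keys, a simplified sketch of the idea behind fastutil's `Long2ObjectOpenHashMap`, not its actual implementation (no resizing, identity hash, capacity must be a power of two larger than the entry count):

```java
// Toy open-addressing map with primitive long keys: keys and values live in
// flat arrays, so a lookup probes contiguous memory instead of boxing the
// key and walking Node references. Simplified sketch of the idea behind
// fastutil's Long2ObjectOpenHashMap; no resizing, so the capacity (a power
// of two) must stay larger than the number of entries.
class LongMap<V> {
    final long[] keys;
    final Object[] values;
    final boolean[] used;
    final int mask;

    LongMap(int capacityPowerOfTwo) {
        keys = new long[capacityPowerOfTwo];
        values = new Object[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    void put(long key, V value) {
        int i = (int) (key & mask);
        while (used[i] && keys[i] != key) i = (i + 1) & mask; // linear probing
        keys[i] = key;
        values[i] = value;
        used[i] = true;
    }

    @SuppressWarnings("unchecked")
    V get(long key) {
        int i = (int) (key & mask);
        while (used[i]) {
            if (keys[i] == key) return (V) values[i];
            i = (i + 1) & mask;
        }
        return null;
    }
}
```

Linear probing also means a failed first probe usually hits the next entry in the same cache line, unlike a chained map where each hop is a fresh pointer dereference.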
  56. False sharing: revisit the Disruptor

      public class ArrayBlockingQueue<E> {
          final Object[] items;
          int takeIndex;
          int putIndex;
          int count;
          /** Main lock guarding all access */
          final ReentrantLock lock;
      }

  57. False sharing: revisit the Disruptor

      public class RingBuffer {
          // ...
          final Object[] entries;
          final Sequence cursor;
          // ...
      }

      public class Sequence {
          long p1, p2, p3, p4, p5, p6, p7;
          long value;
          long p9, p10, p11, p12, p13, p14, p15;
      }

  58. False sharing: revisit the Disruptor. Java 8:

      public class RingBuffer {
          // ...
          final Object[] entries;
          final Sequence cursor;
          // ...
      }

      public class Sequence {
          @Contended
          long value;
      }
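
The manual padding in the pre-@Contended `Sequence` can be shown on its own: without padding, two hot counters can land on the same 64-byte cache line, so writers on different cores invalidate each other's line even though they never touch the same field. A sketch of the padded layout (assuming a 64-byte cache line):

```java
// Manual cache-line padding, in the style of the Disruptor's Sequence class
// before @Contended existed: the seven longs on each side of 'value' push any
// neighbouring hot field onto a different 64-byte cache line, so two threads
// updating two different PaddedCounter instances do not false-share.
class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7;        // left padding
    volatile long value;                    // the only field anyone writes
    long p9, p10, p11, p12, p13, p14, p15;  // right padding

    void increment() { value++; }

    long get() { return value; }

    // Reading the padding fields is a common trick to stop the JIT from
    // deciding they are dead and eliding them from the object layout.
    long preventElision() {
        return p1 + p2 + p3 + p4 + p5 + p6 + p7
             + p9 + p10 + p11 + p12 + p13 + p14 + p15;
    }
}
```

Java 8's `@Contended` (with `-XX:-RestrictContended` for application code) asks the JVM to do this padding for you, as the slide above shows.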

  59. Removing Jitter: GC & Scheduling

  60. GC options:
      - Zero garbage
      - Massive heap, GC when convenient
      - Commercial JVM: Azul Zing
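
"Zero garbage" in practice means reusing objects on the steady-state path so the allocator, and hence the collector, is never exercised per message. A minimal object-pool sketch (a hypothetical illustration, not the talk's actual code):

```java
import java.util.ArrayDeque;

// "Zero garbage" sketch: pre-allocate events up front and recycle them, so
// the steady-state message path performs no allocation and generates no
// garbage for the collector. Simplified, single-threaded illustration.
class EventPool {
    static class PooledEvent {
        long price;
        long quantity;
        void clear() { price = 0; quantity = 0; }
    }

    private final ArrayDeque<PooledEvent> free = new ArrayDeque<>();

    EventPool(int size) {
        // All allocation happens here, once, at startup.
        for (int i = 0; i < size; i++) free.push(new PooledEvent());
    }

    PooledEvent acquire() { return free.pop(); }       // no allocation

    void release(PooledEvent e) { e.clear(); free.push(e); } // recycle

    int available() { return free.size(); }
}
```

The Disruptor's pre-allocated ring buffer entries are this same idea baked into the messaging layer: events are allocated once and overwritten in place.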

  63. Avoiding scheduling jitter: at the OS level and the JVM level
