Aeron High-Performance Open Source Message Transport Martin - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Aeron

High-Performance Open Source Message Transport

Martin Thompson - @mjpt777

slide-2
SLIDE 2
slide-3
SLIDE 3
  • 1. Why build another Product?
  • 2. What Features are really needed?
  • 3. How does one Design for this?
  • 4. What did we Learn on the way?
  • 5. What’s the Roadmap?
slide-4
SLIDE 4
  • 1. Why build another product?

slide-5
SLIDE 5

Not Invented Here!

slide-6
SLIDE 6

There’s a story here...

slide-7
SLIDE 7

[Diagram. Components: Clients, Gateways, Matching or Trading Engines]

slide-8
SLIDE 8

But many others could benefit

slide-9
SLIDE 9

Feature Bloat & Complexity

slide-10
SLIDE 10

Not Fast Enough

slide-11
SLIDE 11

Low-Latency is key

slide-12
SLIDE 12

We are in a new world

Multi-core, Multi-socket, Cloud...

slide-13
SLIDE 13

We are in a new world

UDP, IPC, InfiniBand, RDMA, PCI-e

Multi-core, Multi-socket, Cloud...

slide-14
SLIDE 14

Aeron is trying a new approach

slide-15
SLIDE 15

The Team

Todd Montgomery Richard Warburton Martin Thompson

slide-16
SLIDE 16
  • 2. What features are really needed?

slide-17
SLIDE 17

Messaging

[Diagram. Components: Publishers, Subscribers, Channel, Stream]

slide-18
SLIDE 18

A library, not a framework, on which other abstractions and applications can be built

slide-19
SLIDE 19

Composable Design

slide-20
SLIDE 20

OSI layer 4 Transport for message oriented streams

slide-21
SLIDE 21

OSI Layer 4 (Transport) Services

  • 1. Connection Oriented Communication
  • 2. Reliability
  • 3. Flow Control
  • 4. Congestion Avoidance/Control
  • 5. Multiplexing
slide-22
SLIDE 22

Connection Oriented Communication

slide-23
SLIDE 23

Reliability

slide-24
SLIDE 24

Flow Control

slide-25
SLIDE 25

Congestion Avoidance/Control

slide-26
SLIDE 26

Multiplexing

slide-27
SLIDE 27

Multi-Everything World!

slide-28
SLIDE 28

Multi-Everything World

[Diagram. Components: Publishers, Subscribers, Channel, Stream]

slide-29
SLIDE 29

Endpoints that scale

slide-30
SLIDE 30
  • 3. How does one design for this?

slide-31
SLIDE 31

Design Principles

  • 1. Garbage free in steady state running
  • 2. Smart Batching in the message path
  • 3. Wait-free algos in the message path
  • 4. Non-blocking IO in the message path
  • 5. No exceptional cases in message path
  • 6. Apply the Single Writer Principle
  • 7. Prefer unshared state
  • 8. Avoid unnecessary data copies
slide-32
SLIDE 32

It’s all about 3 things

slide-33
SLIDE 33

It’s all about 3 things

  • 1. System Architecture
slide-34
SLIDE 34

It’s all about 3 things

  • 1. System Architecture
  • 2. Data Structures
slide-35
SLIDE 35

It’s all about 3 things

  • 1. System Architecture
  • 2. Data Structures
  • 3. Protocols of Interaction
slide-36
SLIDE 36

Architecture

[Diagram. Components: Publisher, Subscriber. Legend: IPC Log Buffer]

slide-37
SLIDE 37

Architecture

[Diagram. Components: Publisher, Subscriber, Sender, Receiver. Legend: Media (UDP, InfiniBand, PCI-e 3.0), IPC Log Buffer]

slide-38
SLIDE 38

Architecture

[Diagram. Components: Publisher, Subscriber, Conductor, Sender, Receiver, Admin Events. Legend: Media (UDP, InfiniBand, PCI-e 3.0), IPC Log Buffer, Function/Method Call, Volatile Fields & Queues]

slide-39
SLIDE 39

Architecture

[Diagram. Components: Client (Publisher, Subscriber, Conductor), Media Driver (Conductor, Sender, Receiver), Admin Events. Legend: Media (UDP, InfiniBand, PCI-e 3.0), IPC Log Buffer, IPC Ring/Broadcast Buffer, Function/Method Call, Volatile Fields & Queues]

slide-40
SLIDE 40

Data Structures

  • Maps
  • IPC Ring Buffers
  • IPC Broadcast Buffers
  • ITC Queues
  • Dynamic Arrays
  • Log Buffers
slide-41
SLIDE 41

What does Aeron do?

Creates a replicated persistent log of messages

slide-42
SLIDE 42

How would you design a log?

slide-43
SLIDE 43

[Diagram: an empty File with a Tail marker]

slide-44
SLIDE 44

[Diagram: File containing Message 1 with its Header; Tail marker]

slide-45
SLIDE 45

[Diagram: File containing Messages 1 and 2, each with its Header; Tail marker]

slide-47
SLIDE 47

[Diagram: File containing Messages 1 and 2 with their Headers, plus Message 3 with no Header yet; Tail marker]

slide-48
SLIDE 48

[Diagram: File containing Messages 1, 2, and 3, each with its Header; Tail marker]

slide-49
SLIDE 49

Persistent data structures can be safe to read without locks
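The preceding slides show a message body landing in the file before its header. A minimal sketch of that ordering follows, assuming a header that holds only a 4-byte frame length and a zero-filled buffer; the class and method names are hypothetical and this is not Aeron's implementation. Because the header is written last, a reader that sees a non-zero length can take the frame without a lock; a truly concurrent version would also need an ordered write for the header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical single-writer append-only log; a sketch, not Aeron's implementation. */
final class AppendOnlyLogSketch {
    // Assumption: the header is just a 4-byte frame length (header + body).
    static final int HEADER_LENGTH = 4;

    private final ByteBuffer buffer;
    private int tail = 0; // next free byte; only the single writer moves it

    AppendOnlyLogSketch(int capacity) {
        // A fresh heap buffer is zero-filled, so an unwritten header reads as 0.
        buffer = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Appends a message and returns its offset, or -1 if it does not fit. */
    int append(byte[] message) {
        int frameLength = HEADER_LENGTH + message.length;
        if (tail + frameLength > buffer.capacity()) {
            return -1;
        }
        int offset = tail;
        tail += frameLength;
        // Step 1: copy the body into the claimed space.
        for (int i = 0; i < message.length; i++) {
            buffer.put(offset + HEADER_LENGTH + i, message[i]);
        }
        // Step 2: write the header last; a non-zero length marks the frame complete.
        // A concurrent version would need an ordered (release) write here.
        buffer.putInt(offset, frameLength);
        return offset;
    }

    /** Returns the body of the frame starting at offset, or null if none is complete there. */
    byte[] read(int offset) {
        if (offset < 0 || offset + HEADER_LENGTH > buffer.capacity()) {
            return null;
        }
        int frameLength = buffer.getInt(offset);
        if (frameLength == 0) {
            return null; // header not written yet
        }
        byte[] body = new byte[frameLength - HEADER_LENGTH];
        for (int i = 0; i < body.length; i++) {
            body[i] = buffer.get(offset + HEADER_LENGTH + i);
        }
        return body;
    }

    public static void main(String[] args) {
        AppendOnlyLogSketch log = new AppendOnlyLogSketch(64);
        int offset = log.append(new byte[] {1, 2, 3});
        System.out.println("offset=" + offset + " length=" + log.read(offset).length);
    }
}
```

A reader is expected to start at offset 0 and step forward by each frame length; calling read with an offset inside a frame body is outside what this sketch handles.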

slide-50
SLIDE 50

One big file that goes on forever?

slide-51
SLIDE 51

No!!!

Page faults, page cache churn, VM pressure, ...

slide-52
SLIDE 52

[Diagram: the log split into three sections marked Active, Dirty, and Clean, holding Header + Message frames, with a Tail marker]

slide-53
SLIDE 53

How do we stay “wait-free”?

slide-54
SLIDE 54

[Diagram: File containing Messages 1 to 3 with their Headers and a Tail marker; Message X and Message Y are both to be appended]

slide-57
SLIDE 57

[Diagram: File with Messages 1 to 3 and their Headers, Message X, and Message Y; one further Header has been written; Tail marker]

slide-58
SLIDE 58

[Diagram: File with Messages 1 to 3 and their Headers, Message X, and Message Y with a Header; Padding has been added; Tail marker]

slide-59
SLIDE 59

[Diagram: the first File holds Messages 1 to 3 and Message Y with Headers, followed by Padding; Message X sits in a second File; Tail marker]

slide-60
SLIDE 60

[Diagram: two Files holding Messages 1 to 3, Message X, and Message Y; Padding ends the first File and a Header has been written in the second; Tail marker]
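One way to read the sequence above: each writer claims space by atomically adding its frame length to the tail, and the writer whose claim runs past the end of the file writes padding and moves to the next file. The sketch below illustrates that idea only. The class, the `Kind` outcomes, and the fixed file length are assumptions, and nothing here is taken from Aeron's source. `AtomicLong.getAndAdd` needs no lock or retry loop in user code, and on x86-64 it is typically compiled to a single atomic instruction.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of claiming space in a fixed-length log file with one atomic add. */
final class TailClaimSketch {
    enum Kind { CLAIMED, PAD_AND_ROTATE, ROTATE }

    static final class Claim {
        final Kind kind;
        final int offset; // start of the claimed space or of the padding; -1 for ROTATE

        Claim(Kind kind, int offset) {
            this.kind = kind;
            this.offset = offset;
        }
    }

    private final int fileLength;
    private final AtomicLong tail = new AtomicLong();

    TailClaimSketch(int fileLength) {
        this.fileLength = fileLength;
    }

    Claim claim(int frameLength) {
        // Each caller receives a distinct range from a single atomic add.
        long offset = tail.getAndAdd(frameLength);
        if (offset + frameLength <= fileLength) {
            return new Claim(Kind.CLAIMED, (int) offset);
        }
        if (offset < fileLength) {
            // This caller straddles the end: it pads [offset, fileLength) and uses the next file.
            return new Claim(Kind.PAD_AND_ROTATE, (int) offset);
        }
        // The file was already exhausted when this caller arrived: go to the next file.
        return new Claim(Kind.ROTATE, -1);
    }

    public static void main(String[] args) {
        TailClaimSketch file = new TailClaimSketch(64);
        for (int length : new int[] {40, 40, 8}) {
            Claim claim = file.claim(length);
            System.out.println(claim.kind + " at " + claim.offset);
        }
    }
}
```

Exactly one caller can observe the straddling case for a given file, which is why the padding is written once without any coordination between writers.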

slide-61
SLIDE 61

What’s in a header?

slide-62
SLIDE 62

Data Message Header

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |B|E|   Flags   |             Type              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+
|R|                        Frame Length                         |
+-+-------------------------------------------------------------+
|R|                         Term Offset                         |
+-+-------------------------------------------------------------+
|                          Session ID                           |
+---------------------------------------------------------------+
|                           Stream ID                           |
+---------------------------------------------------------------+
|                            Term ID                            |
+---------------------------------------------------------------+
|                        Encoded Message                      ...
...                                                             |
+---------------------------------------------------------------+
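The layout above could be written and read with absolute `ByteBuffer` accesses. The sketch below follows the field order and widths on the slide. Little-endian byte order, the flag values 0x80 and 0x40 (a reading of where B and E sit in the flags byte), treating R as the top bit of its word, and every class and method name are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch of the data header layout drawn on the slide; byte order is an assumption. */
final class DataHeaderSketch {
    static final int VERSION_OFFSET = 0;      // 8 bits
    static final int FLAGS_OFFSET = 1;        // 8 bits: B, E, then six further flag bits
    static final int TYPE_OFFSET = 2;         // 16 bits
    static final int FRAME_LENGTH_OFFSET = 4; // R bit + 31-bit length
    static final int TERM_OFFSET_OFFSET = 8;  // R bit + 31-bit offset
    static final int SESSION_ID_OFFSET = 12;
    static final int STREAM_ID_OFFSET = 16;
    static final int TERM_ID_OFFSET = 20;
    static final int HEADER_LENGTH = 24;

    static final int BEGIN_FLAG = 0x80; // assumed value for B
    static final int END_FLAG = 0x40;   // assumed value for E

    static ByteBuffer encode(int version, int flags, int type, int frameLength,
                             int termOffset, int sessionId, int streamId, int termId) {
        ByteBuffer header = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        header.put(VERSION_OFFSET, (byte) version);
        header.put(FLAGS_OFFSET, (byte) flags);
        header.putShort(TYPE_OFFSET, (short) type);
        header.putInt(FRAME_LENGTH_OFFSET, frameLength & 0x7FFF_FFFF); // keep the R bit clear
        header.putInt(TERM_OFFSET_OFFSET, termOffset & 0x7FFF_FFFF);   // keep the R bit clear
        header.putInt(SESSION_ID_OFFSET, sessionId);
        header.putInt(STREAM_ID_OFFSET, streamId);
        header.putInt(TERM_ID_OFFSET, termId);
        return header;
    }

    // Java has no unsigned byte or short, so narrow fields are masked when read back.
    static int flags(ByteBuffer header) {
        return header.get(FLAGS_OFFSET) & 0xFF;
    }

    static int type(ByteBuffer header) {
        return header.getShort(TYPE_OFFSET) & 0xFFFF;
    }

    static int frameLength(ByteBuffer header) {
        return header.getInt(FRAME_LENGTH_OFFSET);
    }

    static int termOffset(ByteBuffer header) {
        return header.getInt(TERM_OFFSET_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer header = encode(0, BEGIN_FLAG | END_FLAG, 1, 64, 128, 7, 10, 3);
        System.out.printf("flags=0x%02X frameLength=%d%n", flags(header), frameLength(header));
    }
}
```

The type value 1 passed in `main` is a placeholder; the slide does not give the type codes.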

slide-63
SLIDE 63

Unique identification of a byte within each stream across time (streamId, sessionId, termId, termOffset)
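Within one (streamId, sessionId) pair, the remaining two fields can be folded into a single monotonically increasing byte position. The sketch below shows one way to do that, assuming every term has the same length and that the stream records the term id it started with; `initialTermId`, `termLength`, and the class name are assumptions that do not appear on the slide.

```java
/** Hypothetical mapping between (termId, termOffset) and an absolute stream position. */
final class StreamPositionSketch {
    static long position(int termId, int termOffset, int initialTermId, int termLength) {
        // int subtraction first, so the count stays correct even if term ids wrap around.
        long termCount = termId - initialTermId;
        return termCount * termLength + termOffset;
    }

    static int termId(long position, int initialTermId, int termLength) {
        return initialTermId + (int) (position / termLength);
    }

    static int termOffset(long position, int termLength) {
        return (int) (position % termLength);
    }

    public static void main(String[] args) {
        int termLength = 64 * 1024;
        long position = position(7, 128, 5, termLength);
        System.out.println("position=" + position
            + " termId=" + termId(position, 5, termLength)
            + " termOffset=" + termOffset(position, termLength));
    }
}
```

With a term length of 65536 and an initial term id of 5, term 7 at offset 128 is two full terms plus 128 bytes into the stream, that is 2 × 65536 + 128 = 131200.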

slide-64
SLIDE 64

How do we replicate a log?

slide-65
SLIDE 65

We need a protocol of messages

slide-66
SLIDE 66

Sender Receiver

slide-67
SLIDE 67

[Diagram: Sender sends a Setup message to the Receiver]

slide-68
SLIDE 68

[Diagram: Receiver sends a Status message to the Sender]

slide-69
SLIDE 69

[Diagram: Sender sends Data messages to the Receiver; Receiver sends a Status message to the Sender]

slide-70
SLIDE 70

[Diagram: Sender sends Data messages and a Heartbeat to the Receiver]

slide-71
SLIDE 71

[Diagram: Sender sends Data messages and a Heartbeat to the Receiver; Receiver sends a NAK to the Sender]

slide-72
SLIDE 72

How are message streams reassembled?

slide-73
SLIDE 73

[Diagram: an empty File with Completed and High Water Mark markers]

slide-74
SLIDE 74

[Diagram: File containing Message 1 with its Header; Completed and High Water Mark markers]

slide-75
SLIDE 75

[Diagram: File containing Messages 1 and 3 with their Headers, Message 2 not yet present; Completed and High Water Mark markers]

slide-76
SLIDE 76

[Diagram: File containing Messages 1, 2, and 3 with their Headers; Completed and High Water Mark markers]
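The reassembly slides can be modelled with two counters: Completed advances only over contiguous frames, while High Water Mark records the furthest byte seen. A gap exists whenever the two differ, which is the condition under which a receiver would send a NAK. The sketch below is an illustration under simplifying assumptions (one term, frame lengths tracked in an int array, no payload copying); the class and its methods are hypothetical.

```java
/** Hypothetical receiver-side bookkeeping for out-of-order frames within one term. */
final class ReassemblySketch {
    // frameLengthAt[offset] is the length of the frame starting there; 0 means not received.
    private final int[] frameLengthAt;
    private int completed = 0;
    private int highWaterMark = 0;

    ReassemblySketch(int termLength) {
        frameLengthAt = new int[termLength];
    }

    void onData(int termOffset, int length) {
        if (length <= 0 || termOffset < 0 || termOffset + length > frameLengthAt.length) {
            throw new IllegalArgumentException("frame outside term");
        }
        frameLengthAt[termOffset] = length;
        highWaterMark = Math.max(highWaterMark, termOffset + length);
        // Advance over every frame that is now contiguous with what was already complete.
        while (completed < frameLengthAt.length && frameLengthAt[completed] != 0) {
            completed += frameLengthAt[completed];
        }
    }

    int completed() {
        return completed;
    }

    int highWaterMark() {
        return highWaterMark;
    }

    boolean hasGap() {
        return completed < highWaterMark;
    }

    /** Length of the first missing range, which is what a NAK would ask the sender to resend. */
    int gapLength() {
        int end = completed;
        while (end < highWaterMark && frameLengthAt[end] == 0) {
            end++;
        }
        return end - completed;
    }

    public static void main(String[] args) {
        ReassemblySketch receiver = new ReassemblySketch(64);
        receiver.onData(0, 16);  // Message 1
        receiver.onData(32, 16); // Message 3 arrives before Message 2
        System.out.println("gap=" + receiver.hasGap() + " length=" + receiver.gapLength());
        receiver.onData(16, 16); // Message 2 fills the gap
        System.out.println("completed=" + receiver.completed());
    }
}
```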

slide-77
SLIDE 77

What if a gap is never filled?

slide-78
SLIDE 78

How do we know what is consumed?

slide-79
SLIDE 79

Publishers, Senders, Receivers, and Subscribers all keep position counters

slide-80
SLIDE 80

Counters are the key to flow control and monitoring
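As a rough illustration of how position counters can drive flow control, the sketch below lets a sender advance only up to the slowest receiver's reported position plus a window. This is one possible policy and an assumption, not a description of Aeron's strategy; the names `senderLimit`, `canSend`, and `receiverWindow` are hypothetical.

```java
/** Hypothetical flow-control check built only from position counters. */
final class FlowControlSketch {
    /** The furthest position the sender may reach: slowest receiver plus its window. */
    static long senderLimit(long[] receiverPositions, long receiverWindow) {
        if (receiverPositions.length == 0) {
            throw new IllegalArgumentException("no receivers");
        }
        long slowest = Long.MAX_VALUE;
        for (long position : receiverPositions) {
            slowest = Math.min(slowest, position);
        }
        return slowest + receiverWindow;
    }

    /** True if sending frameLength more bytes keeps the sender within the limit. */
    static boolean canSend(long senderPosition, int frameLength, long limit) {
        return senderPosition + frameLength <= limit;
    }

    public static void main(String[] args) {
        long limit = senderLimit(new long[] {1000, 800, 1200}, 256);
        System.out.println("limit=" + limit + " canSend=" + canSend(1000, 56, limit));
    }
}
```

The same counters double as monitoring data: the distance between a sender position and the slowest receiver position shows how far behind that receiver is.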

slide-81
SLIDE 81

Protocols can be more subtle than you think…

slide-82
SLIDE 82

What about “Self similar behaviour”?

slide-83
SLIDE 83
  • 4. What did we learn on the way?
slide-84
SLIDE 84

Humans suck at estimation!!!

slide-85
SLIDE 85

Building distributed systems is Hard!

slide-86
SLIDE 86

We have more defensive code than feature code

slide-87
SLIDE 87

This does not mean the code is riddled with exception handlers – Yuk!!!

slide-88
SLIDE 88

Building distributed systems is Rewarding!

slide-89
SLIDE 89

Monitoring and Debugging

slide-90
SLIDE 90

Loss, throughput, and buffer size are all strongly related!!!

slide-91
SLIDE 91

Pro Tip:

Know your OS network parameters and how to tune them
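One parameter worth checking is the socket receive buffer, because the operating system may grant a different size from the one requested. The sketch below asks for a size on a UDP channel and reads back what was applied, using the standard `DatagramChannel` and `StandardSocketOptions` APIs; the class name and the 2 MiB figure are arbitrary choices for illustration. On Linux the request is capped by the `net.core.rmem_max` sysctl, and the value read back is doubled by the kernel for bookkeeping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

/** Requests a UDP receive buffer size and reports what the OS actually applied. */
final class SocketBufferCheck {
    static int grantedReceiveBuffer(int requestedBytes) {
        try (DatagramChannel channel = DatagramChannel.open()) {
            channel.setOption(StandardSocketOptions.SO_RCVBUF, requestedBytes);
            return channel.getOption(StandardSocketOptions.SO_RCVBUF);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int requested = 2 * 1024 * 1024;
        int granted = grantedReceiveBuffer(requested);
        System.out.printf("requested=%d granted=%d%n", requested, granted);
    }
}
```

If the granted value is far below the request, the OS limit needs raising before a larger buffer can reduce loss under bursts.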

slide-92
SLIDE 92

We can track application consumption – No need for the Disruptor

slide-93
SLIDE 93

Some parts of Java really suck!

slide-94
SLIDE 94

Some parts of Java really suck!

Unsigned Types?

slide-95
SLIDE 95

Some parts of Java really suck!

Unsigned Types? NIO (most of) - Locks

slide-96
SLIDE 96

Some parts of Java really suck!

Unsigned Types? NIO (most of) - Locks Off-heap, PAUSE, Signals, etc.

slide-97
SLIDE 97

Some parts of Java really suck!

Unsigned Types? String Encoding NIO (most of) - Locks Off-heap, PAUSE, Signals, etc.

slide-98
SLIDE 98

Some parts of Java really suck!

Unsigned Types? String Encoding NIO (most of) - Locks Off-heap, PAUSE, Signals, etc. Managing External Resources

slide-99
SLIDE 99

Some parts of Java really suck!

Unsigned Types? Off-heap, PAUSE, Signals, etc. Selectors - GC String Encoding NIO (most of) - Locks Managing External Resources

slide-100
SLIDE 100

Bytes!!!

public void main(final String[] args)
{
    byte a = 0b0000_0001;
    byte b = 0b0000_0010;
    byte flags = a | b; // does not compile: a | b is promoted to int
    System.out.printf("flags=%s\n", Integer.toBinaryString(flags));
}
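The snippet on the slide does not compile as written: Java promotes both operands of `|` to `int`, so the result cannot be assigned to a `byte` without a cast. A version that compiles might look like the sketch below; the class and helper names are hypothetical. The `& 0xFF` mask is there because `Integer.toBinaryString` would otherwise sign-extend a byte whose top bit is set.

```java
/** A compiling variant of the slide's byte-flag example. */
final class ByteFlagsSketch {
    static byte combine(byte a, byte b) {
        // a | b has type int after binary numeric promotion, so narrow it explicitly.
        return (byte) (a | b);
    }

    static String toBinary(byte flags) {
        // Mask to the low 8 bits so a negative byte is not printed as 32 bits.
        return Integer.toBinaryString(flags & 0xFF);
    }

    public static void main(String[] args) {
        byte a = 0b0000_0001;
        byte b = 0b0000_0010;
        System.out.printf("flags=%s%n", toBinary(combine(a, b)));
    }
}
```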

slide-103
SLIDE 103

Some parts of Java are really nice!

slide-104
SLIDE 104

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram

slide-105
SLIDE 105

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Lambdas & Method Handles

slide-106
SLIDE 106

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Bytecode Instrumentation Lambdas & Method Handles

slide-107
SLIDE 107

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Bytecode Instrumentation Unsafe!!! + Java 8 Lambdas & Method Handles

slide-108
SLIDE 108

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Bytecode Instrumentation Lambdas & Method Handles The Optimiser Unsafe!!! + Java 8

slide-109
SLIDE 109

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Bytecode Instrumentation Lambdas & Method Handles The Optimiser – Love/Hate Unsafe!!! + Java 8

slide-110
SLIDE 110

Some parts of Java are really nice!

Tooling – IDEs, Gradle, HdrHistogram Bytecode Instrumentation Garbage Collection!!! Lambdas & Method Handles Unsafe!!! + Java 8 The Optimiser – Love/Hate

slide-111
SLIDE 111
  • 5. What’s the Roadmap?
slide-112
SLIDE 112

We are major feature complete!

slide-113
SLIDE 113

Just finished Profiling and Tuning

slide-114
SLIDE 114

Things are looking very good

slide-115
SLIDE 115

20 Million 40 byte messages per second!!!

slide-116
SLIDE 116

[Chart: Latency Distribution (µs); log-scale axis from 1 to 10000, value labels from 7.0 to 37.9]

slide-117
SLIDE 117
slide-118
SLIDE 118

C++ Port coming next

slide-119
SLIDE 119

Then IPC and InfiniBand

slide-120
SLIDE 120

Have discussed FPGA implementations with 3rd Parties

slide-121
SLIDE 121

In closing…

slide-122
SLIDE 122
slide-123
SLIDE 123

Where can I find it?

https://github.com/real-logic/Aeron

slide-124
SLIDE 124

Blog: http://mechanical-sympathy.blogspot.com/
Twitter: @mjpt777

“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.”

- Albert Einstein

Questions?