SLIDE 1

A Quest for Predictable Latency

Adventures in Java Concurrency

Martin Thompson - @mjpt777

SLIDE 2

SLIDE 3

SLIDE 4

SLIDE 5

If a system does not respond in a timely manner then it is effectively unavailable

SLIDE 6

1. It’s all about the Blocking
2. What do we mean by Latency
3. Adventures with Locks and Queues
4. Some Alternative FIFOs
5. Where can we go Next

SLIDE 7

1. It’s all about the Blocking

SLIDE 8

What is preventing progress?

SLIDE 9

A thread is blocked when it cannot make progress

SLIDE 10

There are two major causes of blocking

SLIDE 11

Blocking

Systemic Pauses
JVM Safepoints (GC, etc.)
Transparent Huge Pages (Linux)
Hardware (C-States, SMIs)

SLIDE 12

Blocking

Concurrent Algorithms
Notifying Completion
Mutual Exclusion (Contention)
Synchronisation / Rendezvous

Systemic Pauses
JVM Safepoints (GC, etc.)
Transparent Huge Pages (Linux)
Hardware (C-States, SMIs)

SLIDE 13

Concurrent Algorithms

Locks
Call into kernel on contention
Always blocking
Difficult to get right

Atomic/CAS Instructions
Execute in user space
Can be non-blocking
Very difficult to get right
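The contrast between the two approaches can be sketched with a shared counter. This is a minimal illustration, not code from the talk: the class and method names (CounterSketch, LockedCounter, CasCounter, runConcurrently) are hypothetical. The locked version may park a waiting thread when the lock is contended; the CAS version retries in user space when its compare-and-set fails.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: names are illustrative, not taken from the slides.
final class CounterSketch {
    /** Lock-based counter: a contended lock() may park the calling thread. */
    static final class LockedCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private long value; // guarded by lock

        long increment() {
            lock.lock();
            try {
                return ++value;
            } finally {
                lock.unlock();
            }
        }

        long get() {
            lock.lock();
            try {
                return value;
            } finally {
                lock.unlock();
            }
        }
    }

    /** CAS-based counter: a failed compareAndSet is simply retried in user space. */
    static final class CasCounter {
        private final AtomicLong value = new AtomicLong();

        long increment() {
            long current;
            do {
                current = value.get();
            } while (!value.compareAndSet(current, current + 1));
            return current + 1;
        }

        long get() {
            return value.get();
        }
    }

    /** Runs the task on the given number of threads and waits for all of them. */
    static void runConcurrently(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Both counters give the same result; what differs is what happens to a thread that loses the race.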

SLIDE 14

2. What do we mean by Latency

SLIDE 15

Queuing Theory

SLIDE 16

Queuing Theory

Service Time

SLIDE 17

Queuing Theory

Latent/Wait Time Service Time

SLIDE 18

Queuing Theory

Response Time = Latent/Wait Time + Service Time

SLIDE 19

Queuing Theory

Enqueue -> Latent/Wait Time -> Dequeue -> Service Time

Response Time = Latent/Wait Time + Service Time
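The terms on this slide can be made concrete with three timestamps per message. The sketch below is an assumption about how one might record them; the class LatencyBreakdown and its field names are hypothetical, and in practice the timestamps would come from something like System.nanoTime().

```java
// Hypothetical sketch: splits one message's response time into wait and service time.
final class LatencyBreakdown {
    private final long enqueueNs;  // when the message was placed on the queue
    private final long dequeueNs;  // when the consumer took it off the queue
    private final long completeNs; // when the consumer finished processing it

    LatencyBreakdown(long enqueueNs, long dequeueNs, long completeNs) {
        this.enqueueNs = enqueueNs;
        this.dequeueNs = dequeueNs;
        this.completeNs = completeNs;
    }

    /** Latent/wait time: how long the message sat in the queue. */
    long waitTimeNs() {
        return dequeueNs - enqueueNs;
    }

    /** Service time: how long the processing itself took. */
    long serviceTimeNs() {
        return completeNs - dequeueNs;
    }

    /** Response time: what the caller experiences, wait plus service. */
    long responseTimeNs() {
        return completeNs - enqueueNs;
    }
}
```

For example, a message enqueued at 100 ns, dequeued at 400 ns and completed at 450 ns has a wait time of 300 ns and a service time of 50 ns, so most of its 350 ns response time was spent queued.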

SLIDE 20

Queuing Theory

[Plot: Response Time (y-axis, 0.0 to 12.0) against Utilisation (x-axis, 0.0 to 1.0)]
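The slide does not state a formula. A curve of this shape is what the standard single-server (M/M/1) queue result gives, so the sketch below assumes that model: mean response time = service time / (1 - utilisation). The class and method names are hypothetical.

```java
// Hypothetical sketch, assuming an M/M/1 queue; the slide itself gives no formula.
final class QueueingSketch {
    /** Mean response time: serviceTime / (1 - utilisation), for 0 <= utilisation < 1. */
    static double responseTime(double serviceTime, double utilisation) {
        if (utilisation < 0.0 || utilisation >= 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1)");
        }
        return serviceTime / (1.0 - utilisation);
    }

    /** Mean wait time: the part of the response time spent queued. */
    static double waitTime(double serviceTime, double utilisation) {
        return responseTime(serviceTime, utilisation) - serviceTime;
    }
}
```

With a service time of 1.0, the response time is 2.0 at 50% utilisation and roughly 10.0 at 90%, which is why running a queue-fed resource near full utilisation is so damaging to latency.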

SLIDE 21

Amdahl’s Law

[Plot: Speedup (y-axis, 2 to 20) against Processors (x-axis, 1 to 1024, doubling at each tick); series: Amdahl]
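Amdahl's Law can be written as speedup = 1 / ((1 - p) + p / n), where p is the parallelisable fraction of the work and n is the processor count. The sketch below is illustrative: the class name is hypothetical, and the value p = 0.95 used in the note after it is an assumption chosen only because the plot's speedup axis tops out at 20; the slide does not state it.

```java
// Hypothetical sketch of Amdahl's Law; parameter names are illustrative.
final class AmdahlSketch {
    /**
     * @param parallelFraction fraction of the work that can run in parallel, in [0, 1]
     * @param processors       number of processors, at least 1
     */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= parallelFraction <= 1 and processors >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / processors);
    }
}
```

With p = 0.95 the speedup can never exceed 1 / 0.05 = 20, however many processors are added; at 1024 processors it is still a little under 20.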

SLIDE 22

Universal Scalability Law

[Plot: Speedup (y-axis, 2 to 20) against Processors (x-axis, 1 to 1024, doubling at each tick); series: Amdahl, USL]
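The Universal Scalability Law adds a coherency penalty to the contention term: C(N) = N / (1 + α(N - 1) + βN(N - 1)). The sketch below is illustrative; the class name and the coefficient values in the note after it are assumptions, not figures from the slide.

```java
// Hypothetical sketch of the Universal Scalability Law; coefficients are illustrative.
final class UslSketch {
    /**
     * @param contention contention coefficient (alpha), the serialised fraction
     * @param coherency  coherency coefficient (beta), the cost of keeping processors in agreement
     * @param processors number of processors, at least 1
     */
    static double speedup(double contention, double coherency, int processors) {
        if (contention < 0.0 || coherency < 0.0 || processors < 1) {
            throw new IllegalArgumentException("coefficients must be >= 0 and processors >= 1");
        }
        double n = processors;
        return n / (1.0 + contention * (n - 1.0) + coherency * n * (n - 1.0));
    }
}
```

With the coherency coefficient at zero this reduces to Amdahl's Law. With any positive coherency coefficient the curve peaks and then falls: for example, with α = 0.05 and β = 0.0001 the speedup at 1024 processors is lower than at 64.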

SLIDE 23

3. Adventures with Locks and Queues?

SLIDE 24

Some queue implementations spend more time queueing to be enqueued than in queue time!!!

SLIDE 25

The evils of Blocking

SLIDE 26

Queue.put() && Queue.take()
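A minimal sketch of the blocking API, for readers who have not used it: put() blocks while the queue is full and take() blocks while it is empty. The class and method names are hypothetical. ArrayBlockingQueue is used here because, in OpenJDK, it is built from a single ReentrantLock and two Conditions, which ties in with the condition variable discussion that follows.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one producer and one consumer using the blocking API.
final class BlockingQueueSketch {
    /** Sends 1..messages through a small bounded queue and returns the sum received. */
    static long sumViaBlockingQueue(int messages) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(8);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= messages; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long sum = 0;
        try {
            for (int i = 0; i < messages; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Every time either side blocks, the other side has to signal it awake, and that wake-up cost is what the round-trip measurements on the following slides are about.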

SLIDE 27

Condition Variables

SLIDE 28

Condition Variable RTT

[Diagram: one thread signals Ping to an Echo Service, which signals Pong back; each round trip is recorded in a Histogram]
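A round-trip test of this kind could be sketched as below. This is not the benchmark from the talk: the class and method names are hypothetical, and a plain long[] stands in for the histogram so the example has no dependencies. The two boolean flags guard against lost signals and spurious wake-ups.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a ping-pong round trip over two condition variables.
final class ConditionPingPong {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition pingCondition = lock.newCondition();
    private final Condition pongCondition = lock.newCondition();
    private boolean pingPending; // guarded by lock
    private boolean pongPending; // guarded by lock

    /** Echo side: wait for a ping, then answer with a pong. */
    private void echoOnce() throws InterruptedException {
        lock.lock();
        try {
            while (!pingPending) {
                pingCondition.await();
            }
            pingPending = false;
            pongPending = true;
            pongCondition.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Ping side: signal a ping, then wait for the matching pong. */
    private void pingAndAwaitPong() throws InterruptedException {
        lock.lock();
        try {
            pingPending = true;
            pingCondition.signal();
            while (!pongPending) {
                pongCondition.await();
            }
            pongPending = false;
        } finally {
            lock.unlock();
        }
    }

    /** Returns one round-trip time in nanoseconds per iteration. */
    static long[] measureRoundTripsNs(int iterations) {
        ConditionPingPong pingPong = new ConditionPingPong();
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < iterations; i++) {
                    pingPong.echoOnce();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.setDaemon(true);
        echo.start();

        long[] roundTripsNs = new long[iterations];
        try {
            for (int i = 0; i < iterations; i++) {
                long start = System.nanoTime();
                pingPong.pingAndAwaitPong();
                roundTripsNs[i] = System.nanoTime() - start;
            }
            echo.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return roundTripsNs;
    }
}
```

The numbers this produces depend heavily on the machine, the operating system and JVM warm-up, so they should not be expected to match the figures on the slides.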

SLIDE 29

[Plot: histogram of condition variable RTT, axis in µs]

SLIDE 30

[Plot: histogram of condition variable RTT, axis in µs]

Max = 5525.503µs

SLIDE 31

Bad News

That’s the best case scenario!

SLIDE 32

Are non-blocking APIs any better?

SLIDE 33

Queue.offer() && Queue.poll()
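A minimal sketch of the non-blocking API: offer() returns false instead of waiting when a bounded queue is full, and poll() returns null instead of waiting when it is empty, so the caller decides how to back off. The class and method names are hypothetical. Note that a non-blocking API does not imply a lock-free implementation: in OpenJDK, ArrayBlockingQueue.offer() and poll() still acquire the queue's internal lock.

```java
import java.util.Queue;

// Hypothetical sketch: helpers that use only the non-blocking Queue methods.
final class NonBlockingQueueSketch {
    /** Offers 0, 1, 2, ... until the queue rejects an element; returns how many were accepted. */
    static int offerUntilFull(Queue<Integer> boundedQueue) {
        int accepted = 0;
        while (boundedQueue.offer(accepted)) {
            accepted++;
        }
        return accepted;
    }

    /** Polls until the queue reports empty; returns how many elements were removed. */
    static int drain(Queue<Integer> queue) {
        int removed = 0;
        while (queue.poll() != null) {
            removed++;
        }
        return removed;
    }

    /** Sends 1..messages from a producer thread, spinning on offer/poll; returns the sum received. */
    static long transferSum(Queue<Integer> queue, int messages) {
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= messages; i++) {
                while (!queue.offer(i)) {
                    Thread.yield(); // queue full: the caller chooses the back-off policy
                }
            }
        });
        producer.start();

        long sum = 0;
        int received = 0;
        while (received < messages) {
            Integer value = queue.poll();
            if (value == null) {
                Thread.yield(); // queue empty: nothing to do yet
                continue;
            }
            sum += value;
            received++;
        }
        return sum;
    }
}
```

The helpers accept any java.util.Queue, so the same code can be run against ArrayBlockingQueue, LinkedBlockingQueue or ConcurrentLinkedQueue (the last is unbounded, so offerUntilFull should only be used with a bounded queue).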

SLIDE 34

Context for the Benchmarks

https://github.com/real-logic/benchmarks
Java 8u60
Ubuntu 15.04 – “performance” mode
Intel i7-3632QM – 2.2 GHz (Ivy Bridge)

SLIDE 35

Burst Length = 1: RTT (ns)

Queue                    Prod #      Mean       99%
Baseline (Wait-free)          1       167       189
ArrayBlockingQueue            1       645     1,210
                              2     1,984    11,648
                              3     5,257    19,680
LinkedBlockingQueue           1       461       740
                              2     1,197     7,320
                              3     2,010    15,152
ConcurrentLinkedQueue         1       281       361
                              2       381       559
                              3       444       705

SLIDE 36

Burst Length = 100: RTT (ns)

Queue                    Prod #      Mean       99%
Baseline (Wait-free)          1       721       982
ArrayBlockingQueue            1    30,346    41,728
                              2    35,631    65,008
                              3    50,271    90,240
LinkedBlockingQueue           1    31,422    41,600
                              2    45,096    94,208
                              3    89,820   180,224
ConcurrentLinkedQueue         1    12,916    15,792
                              2    25,132    35,136
                              3    39,462    56,768

SLIDE 38

Backpressure?
Size methods?
Flow Rates?
Garbage?
Fan out?

SLIDE 39

4. Some Alternative FIFOs

SLIDE 40

Inter-Thread FIFOs

SLIDE 41

Disruptor 1.0 – 2.0

[Diagram: Reference Ring Buffer with claimSequence (<<CAS>>), cursor and gating]

Influences: Lamport + Network Cards

SLIDE 47

long expectedSequence = claimedSequence - 1;
while (cursor != expectedSequence)
{
    // busy spin
}
cursor = claimedSequence;
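The fragment above can be placed in a runnable context as follows. This is a sketch under stated assumptions, not the Disruptor's actual code: the class SerialisedPublishSketch and its methods are hypothetical, the claim uses AtomicLong.incrementAndGet() as a stand-in for the step the diagram marks as a CAS, and gating against slow consumers is omitted, so the buffer must be large enough never to wrap.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: multi-producer claim, then in-order publication via the cursor.
final class SerialisedPublishSketch {
    private final AtomicLong claimSequence = new AtomicLong(-1); // last claimed sequence
    private volatile long cursor = -1;                            // last published sequence
    private final long[] ringBuffer;
    private final int mask;

    SerialisedPublishSketch(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        ringBuffer = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Claims the next slot, writes the value, then publishes in sequence order. */
    long publish(long value) {
        long claimedSequence = claimSequence.incrementAndGet(); // atomic claim
        ringBuffer[(int) (claimedSequence & mask)] = value;     // no gating: assumes no wrap

        long expectedSequence = claimedSequence - 1;
        while (cursor != expectedSequence) {
            // busy spin until every earlier claim has been published
        }
        cursor = claimedSequence;
        return claimedSequence;
    }

    long cursor() {
        return cursor;
    }

    /** Sums every published slot; only meaningful while the buffer has not wrapped. */
    long sumPublished() {
        long published = cursor;
        long sum = 0;
        for (long sequence = 0; sequence <= published; sequence++) {
            sum += ringBuffer[(int) (sequence & mask)];
        }
        return sum;
    }

    /** Starts the given number of producers, each publishing the value 1 perProducer times. */
    static SerialisedPublishSketch runProducers(int capacity, int producers, int perProducer) {
        SerialisedPublishSketch sketch = new SerialisedPublishSketch(capacity);
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perProducer; j++) {
                    sketch.publish(1L);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch;
    }
}
```

One consequence follows directly from the loop: a producer that has claimed a slot cannot publish until every earlier claimant has published, so a producer that is descheduled between claiming and publishing holds up all producers behind it.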

SLIDE 49

Disruptor 3.0

[Diagram: Reference Ring Buffer with cursor (<<CAS>>), gating, gatingCache and available]