slide-1
SLIDE 1

Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication

Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang and Timothy M. Jones

Computer Laboratory

UKMAC 2016, Edinburgh

slide 1 of 30 http://www.cl.cam.ac.uk/~km647/

slide-2
SLIDE 2

Outline

  • Background:
      • Lamport’s queue
      • Multi-section queue
  • Lynx queue
  • Performance evaluation

slide 2 of 30

slide-7
SLIDE 7

Lamport’s Queue Bottlenecks

(Diagram: queue buffer with dequeue_ptr and enqueue_ptr.)

Enqueue spin loop: while (next_enqueue_ptr == dequeue_ptr) { ; }

Performance degradation due to:

  • Frequent thread synchronisation
  • Cache ping-pong

slide 3 of 30
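The spin condition above can be illustrated with a minimal single-producer/single-consumer ring buffer in the style of Lamport's queue. This is a sketch, not the authors' code: the class name `LamportQueue`, the `try_enqueue`/`try_dequeue` methods, the use of indices instead of raw pointers, and the chosen memory orders are all assumptions made for the example. It returns `false` where the slide's version would spin.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical SPSC ring buffer in the style of Lamport's queue.
// One slot is always left empty so that "full" and "empty" differ.
template <std::size_t N>
class LamportQueue {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer thread only. Returns false when the queue is full.
    bool try_enqueue(std::uint64_t value) {
        const std::size_t tail = enqueue_idx_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        // The producer reads the consumer's index on every call. This shared
        // access is what makes the cache line bounce between the two cores.
        if (next == dequeue_idx_.load(std::memory_order_acquire)) {
            return false;  // full: next_enqueue_ptr == dequeue_ptr
        }
        buffer_[tail] = value;
        enqueue_idx_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool try_dequeue(std::uint64_t& out) {
        const std::size_t head = dequeue_idx_.load(std::memory_order_relaxed);
        if (head == enqueue_idx_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = buffer_[head];
        dequeue_idx_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::uint64_t buffer_[N] = {};
    std::atomic<std::size_t> enqueue_idx_{0};
    std::atomic<std::size_t> dequeue_idx_{0};
};
```

Every operation touches both indices, so each enqueue and dequeue is a synchronisation point, which matches the two bottlenecks listed on the slide.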

slide-9
SLIDE 9

Cache Ping-Pong

(Diagram: core 1 and core 2, each with its own L1 and L2 cache, and an L3 cache; the diagram is labelled with dequeue_ptr and enqueue_ptr.)

Enqueue spin loop: while (next_enqueue_ptr == dequeue_ptr) { ; }

  • Queue pointers ping-pong across the cache hierarchy

slide 4 of 30

slide-10
SLIDE 10

Cache Ping-Pong

(Diagram: core 1 and core 2, each with its own L1 and L2 cache, and an L3 cache; the diagram is labelled with dequeue_ptr and enqueue_ptr.)

Dequeue spin loop: while (next_dequeue_ptr == enqueue_ptr) { ; }

  • Queue pointers ping-pong across the cache hierarchy

slide 5 of 30

slide-12
SLIDE 12

Multi-Section Queue (MSQ): state-of-the-art

(Diagram: queue split into section 1 and section 2, with dequeue_ptr and enqueue_ptr.)

  • Each section is exclusively used by one thread

slide 6 of 30

slide-14
SLIDE 14

Multi-Section Queue (MSQ): state-of-the-art

(Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr.)

  • Enqueue thread cannot access section 1 because dequeue thread still uses it
  • Enqueue thread waits (spins) at the end of section 2

slide 7 of 30

slide-15
SLIDE 15

Multi-Section Queue (MSQ): state-of-the-art

(Diagram: queue split into section 1 and section 2, with dequeue_ptr and enqueue_ptr.)

  • Dequeue thread reached the end of section 1

slide 8 of 30

slide-16
SLIDE 16

Multi-Section Queue (MSQ): state-of-the-art

(Diagram: queue split into section 1 and section 2, with dequeue_ptr and enqueue_ptr.)

  • Dequeue thread reached the end of section 1
  • Enqueue thread enters section 1

slide 9 of 30

slide-19
SLIDE 19

Multi-Section Queue (MSQ): state-of-the-art

(Diagram: queue split into section 1 and section 2, with dequeue_ptr and enqueue_ptr.)

Performance optimisations:

  • Infrequent boundary checks (less frequent synchronisation)
  • Reduced cache ping-pong

slide 10 of 30
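The section hand-over described on the MSQ slides can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the MSQ implementation: the class name `MultiSectionQueue`, the section counters `published_` and `consumed_`, and the non-blocking `try_` interface are all hypothetical. The point it shows is that shared state is read or written only at section boundaries.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical multi-section SPSC queue. Each thread owns one section at a
// time and synchronises only when it crosses a section boundary.
template <std::size_t SectionSlots, std::size_t Sections = 2>
class MultiSectionQueue {
public:
    // Producer thread only. Returns false where a blocking version would spin.
    bool try_enqueue(std::uint64_t value) {
        if (enq_count_ % SectionSlots == 0) {            // boundary check
            const std::size_t section = enq_count_ / SectionSlots;
            // Slow path: the only place the producer reads shared state.
            if (section - consumed_.load(std::memory_order_acquire) >= Sections) {
                return false;                            // next section still in use
            }
        }
        buffer_[enq_count_ % (SectionSlots * Sections)] = value;
        ++enq_count_;
        if (enq_count_ % SectionSlots == 0) {            // section completed
            published_.store(enq_count_ / SectionSlots, std::memory_order_release);
        }
        return true;
    }

    // Consumer thread only. It may enter a section only after the producer
    // has left it, so a partially filled section is not visible yet.
    bool try_dequeue(std::uint64_t& out) {
        if (deq_count_ % SectionSlots == 0) {            // boundary check
            const std::size_t section = deq_count_ / SectionSlots;
            if (published_.load(std::memory_order_acquire) <= section) {
                return false;                            // section not handed over
            }
        }
        out = buffer_[deq_count_ % (SectionSlots * Sections)];
        ++deq_count_;
        if (deq_count_ % SectionSlots == 0) {            // section fully drained
            consumed_.store(deq_count_ / SectionSlots, std::memory_order_release);
        }
        return true;
    }

private:
    std::uint64_t buffer_[SectionSlots * Sections] = {};
    std::size_t enq_count_ = 0;                 // producer-local
    std::size_t deq_count_ = 0;                 // consumer-local
    std::atomic<std::size_t> published_{0};     // sections completed by the producer
    std::atomic<std::size_t> consumed_{0};      // sections drained by the consumer
};
```

Inside a section the fast path touches only thread-local counters, which is where the reduced synchronisation and reduced cache ping-pong come from. A real queue would also need a way to flush a partially filled section; that is left out here.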

slide-27
SLIDE 27

MSQ Control-Flow Graph and Internals

(Diagram: control-flow graphs of the dequeue function, basic blocks 1 to 5, and the enqueue function, basic blocks 1 to 6. Labels on the enqueue graph: enqueue, synchronisation code, checks if next section is free, spin loop, update local variables, update shared variable, join basic-block.)

slide 11 of 30

slide-28
SLIDE 28

MSQ Control-Flow Graph and Internals

(Diagram: queue with section 1 and section 2, labelled with dequeue_ptr, enqueue_ptr and synchronisation code.)

slide 12 of 30

slide-35
SLIDE 35

MSQ Control-Flow Graph and Internals

(Diagram: control-flow graph of the enqueue function, basic blocks 1 to 6, labelled with enqueue and synchronisation code.)

Assembly shown on the slide, with its annotations:

lea  rax, [rdx+8]            ; incr pointer
mov  QWORD PTR [rdx], rcx    ; store
mov  rdx, rax                ; compiler’s copy
and  rdx, ROTATE_MASK        ; rotate pointer
test eax, SECTION_MASK       ; end of section
jne  .L2                     ; skip sync code

slide 13 of 30
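The six instructions on this slide can be read as the following C++ fast path. This is a sketch under assumptions: the slide only names `ROTATE_MASK` and `SECTION_MASK`, so the mask values, the buffer sizes, the alignment trick and the names `enqueue_fast_path` and `EnqueueStep` are hypothetical choices made so the example runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes: a 4 KiB queue made of two 2 KiB sections.
constexpr std::uintptr_t kQueueBytes   = 4096;
constexpr std::uintptr_t kSectionBytes = kQueueBytes / 2;
// Assumed mask definitions (the slide does not give their values).
constexpr std::uintptr_t kRotateMask   = ~kQueueBytes;        // clears the overflow bit
constexpr std::uintptr_t kSectionMask  = kSectionBytes - 1;   // offset inside a section

// Aligned to twice its size, so bit 12 of every valid slot address is zero
// and is set only when the pointer runs one past the end of the buffer.
alignas(2 * kQueueBytes) static std::uint64_t g_queue[kQueueBytes / sizeof(std::uint64_t)];

struct EnqueueStep {
    std::uintptr_t next;  // pointer to use for the following enqueue
    bool needs_sync;      // true when a section boundary was reached
};

inline EnqueueStep enqueue_fast_path(std::uintptr_t ptr, std::uint64_t value) {
    const std::uintptr_t incremented = ptr + sizeof(std::uint64_t);  // lea  rax, [rdx+8]
    *reinterpret_cast<std::uint64_t*>(ptr) = value;                  // mov  QWORD PTR [rdx], rcx
    const std::uintptr_t rotated = incremented & kRotateMask;        // mov rdx, rax ; and rdx, ROTATE_MASK
    const bool at_boundary = (incremented & kSectionMask) == 0;      // test eax, SECTION_MASK
    return {rotated, at_boundary};       // jne .L2 skips the sync code when this is false
}
```

Only the store and the pointer increment do useful work; the rotation and the section test run on every enqueue even though they matter only at the end of the buffer or of a section.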

slide-37
SLIDE 37

Optimal Queue

(Diagram: queue with dequeue_ptr and enqueue_ptr.)

Optimal queue features:

  • infinite size
  • 2 instructions overhead:
      1. pointer increment
      2. store into the queue

slide 14 of 30

slide-38
SLIDE 38

Lynx: Just 2 instructions overhead

(Diagram: control-flow graph of the enqueue function, basic blocks 1 to 6, labelled with enqueue and synchronisation code.)

Lynx takes part of enqueue (the boundary checks) and all the synchronisation overhead off the critical path.

slide 15 of 30

slide-41
SLIDE 41

Lynx(1): H/W triggered Synchronisation

(Diagram: control-flow graph of the enqueue function, basic blocks 1 to 6, labelled with enqueue and synchronisation code.)

lea  rax, [rdx+8]
mov  QWORD PTR [rdx], rcx
mov  rdx, rax
and  rdx, ROTATE_MASK
test eax, SECTION_MASK
jne  .L2

slide 16 of 30

slide-44
SLIDE 44

Lynx(1): H/W triggered Synchronisation

(Diagram: section 1 and section 2, with two SSRZs.)

  • A red zone is a part of memory that can be neither read nor written
  • SSRZ: Section Synchronisation Red-Zone

slide 17 of 30
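On a POSIX system, a red zone of this kind can be set up with `mmap` and `mprotect(..., PROT_NONE)`. The sketch below is an assumption about how such a layout might look, not the Lynx source: the struct `QueueMapping`, the function `map_queue`, the one-page red zones and the layout [section 1][red zone][section 2][red zone] are all hypothetical.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical layout: [section 1][red zone][section 2][red zone].
struct QueueMapping {
    unsigned char* base = nullptr;
    std::size_t section_bytes = 0;
    std::size_t red_bytes = 0;  // one page per red zone

    unsigned char* section(int i) const { return base + i * (section_bytes + red_bytes); }
    unsigned char* red_zone(int i) const { return section(i) + section_bytes; }
    std::size_t total_bytes() const { return 2 * (section_bytes + red_bytes); }
};

// Maps the queue and makes both red zones inaccessible.
// Returns false if the mapping or the protection change fails.
inline bool map_queue(std::size_t pages_per_section, QueueMapping& out) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages_per_section == 0) return false;

    QueueMapping q;
    q.red_bytes = static_cast<std::size_t>(page);
    q.section_bytes = pages_per_section * q.red_bytes;

    void* mem = mmap(nullptr, q.total_bytes(), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    q.base = static_cast<unsigned char*>(mem);

    for (int i = 0; i < 2; ++i) {
        // Any load or store into this page now raises a fault (SIGSEGV on Linux).
        if (mprotect(q.red_zone(i), q.red_bytes, PROT_NONE) != 0) {
            munmap(mem, q.total_bytes());
            return false;
        }
    }
    out = q;
    return true;
}
```

With this layout, a thread that runs off the end of its section touches a protected page, and the memory-protection hardware raises the exception without any explicit boundary check in the enqueue or dequeue code.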

slide-52
SLIDE 52

Lynx(1): H/W triggered Synchronisation

(Diagram: section 1 and section 2 with two SSRZs, dequeue_ptr and enqueue_ptr.)

Lynx’s handler checks:

  • whether the SIGSEGV is from the queue or the system
  • which thread raised the exception
  • if the thread is in section 1 or 2
  • if the next section is free

slide 18 of 30
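The four checks listed above can be sketched as a pure decision function. Everything here is hypothetical (the names `QueueState`, `Role`, `Action`, `faulting_section` and `decide`, and the layout [section 1][red zone][section 2][red zone]); in a real handler the faulting address would come from `siginfo_t::si_addr`, which is available when the handler is installed through `sigaction` with `SA_SIGINFO`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Role { Enqueue, Dequeue, Other };  // which thread raised the fault
enum class Action { ForwardToSystem, WaitForNextSection, EnterNextSection };

// Assumed layout: [section 1][red zone][section 2][red zone].
struct QueueState {
    std::uintptr_t base = 0;
    std::size_t section_bytes = 0;
    std::size_t red_bytes = 0;
    bool section_free[2] = {true, true};
};

// Returns 0 or 1 for the section whose red zone contains addr, or -1 when
// the address does not belong to the queue at all.
inline int faulting_section(const QueueState& q, std::uintptr_t addr) {
    for (int i = 0; i < 2; ++i) {
        const std::uintptr_t red =
            q.base + i * (q.section_bytes + q.red_bytes) + q.section_bytes;
        if (addr >= red && addr < red + q.red_bytes) return i;
    }
    return -1;
}

inline Action decide(const QueueState& q, std::uintptr_t fault_addr, Role role) {
    // Checks 1 and 3: is the fault from the queue, and at the end of which section?
    const int section = faulting_section(q, fault_addr);
    // Check 2: only the enqueue and dequeue threads are handled here.
    if (section < 0 || role == Role::Other) return Action::ForwardToSystem;
    // Check 4: is the next section free?
    const int next = 1 - section;
    return q.section_free[next] ? Action::EnterNextSection
                                : Action::WaitForNextSection;
}
```

Keeping the decision separate from the signal handler itself makes it testable without raising a real fault; how Lynx actually waits and moves the red zone is not specified on the slide and is not modelled here.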

slide-53
SLIDE 53

Lynx(1): H/W triggered Synchronisation

(Diagram: section 1 and section 2 with two SSRZs, dequeue_ptr and enqueue_ptr.)

  • The dequeue thread still uses the first section
  • The enqueue thread waits at the end of the second section and adds a new red zone
  • The new red zone is part of the synchronisation and is added temporarily

slide 19 of 30

slide-54
SLIDE 54

Lynx(1): H/W triggered Synchronisation

(Diagram: section 1 and section 2 with two SSRZs, enqueue_ptr and dequeue_ptr.)

  • The dequeue thread has finished with the first section
  • The enqueue thread removes the second red zone and enters the first section

slide 20 of 30

slide-56
SLIDE 56

Lynx(2): H/W triggered Pointer Rotation

(Diagram: control-flow graph of the enqueue function, basic blocks 1 to 6, labelled with enqueue and synchronisation code.)

lea  rax, [rdx+8]
mov  QWORD PTR [rdx], rcx
mov  rdx, rax
and  rdx, ROTATE_MASK
test eax, SECTION_MASK
jne  .L2

slide 21 of 30

slide-61
SLIDE 61

Lynx(2): H/W triggered Pointer Rotation

(Diagram: section 1 and section 2 with two SSRZs and one PRRZ, dequeue_ptr and enqueue_ptr.)

  • SSRZ: Section Synchronisation Red-Zone
  • PRRZ: Pointer Rotation Red-Zone

slide 22 of 30

slide-64
SLIDE 64

Lynx(2): H/W triggered Pointer Rotation

(Diagram: section 1 and section 2 with two SSRZs and one PRRZ, dequeue_ptr and enqueue_ptr.)

Two types of red-zones:

  1. moving red-zone: SSRZ (Section Synchronisation Red-Zone)
  2. fixed red-zone: PRRZ (Pointer Rotation Red-Zone)

slide 23 of 30

slide-68
SLIDE 68

Experimental Setup

  • Implementation in C++ with inline assembly
  • Evaluation on a wide range of machines: from embedded SoCs to server CPUs
  • Throughput experiments for a wide range of queue sizes
  • Absolute throughput performance in GB/s

slide 24 of 30
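For reference, an absolute throughput figure in GB/s can be derived from an item count and a wall-clock time as sketched below. The function name is hypothetical, and the sketch assumes 1 GB = 10^9 bytes; the slides do not say which definition of GB the measurements use.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in GB/s for `items` transfers of `bytes_per_item` bytes each,
// completed in `seconds`. Assumes 1 GB = 1e9 bytes.
inline double throughput_gb_per_s(std::uint64_t items,
                                  std::uint64_t bytes_per_item,
                                  double seconds) {
    return static_cast<double>(items) * static_cast<double>(bytes_per_item)
           / seconds / 1e9;
}
```

For 64-bit transfers `bytes_per_item` is 8, so one billion enqueues completed in two seconds correspond to 4 GB/s.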

slide-69
SLIDE 69

Throughput (GB/s) on Intel core-i5

(Plot: throughput for 64-bit memory instructions on a Core-i5 4570. x-axis: queue size, 64KB to 256MB. y-axis: GB/s, 2 to 14. Series: MSQ-mov and Lynx-mov.)

slide 25 of 30

slide-70
SLIDE 70

Breakdown of Lynx Overheads

(Plot: breakdown of execution time. x-axis: queue size, 64KB to 256MB. y-axis: % execution time, 10 to 100. Legend as extracted: real kernel sync handler other.)

slide 26 of 30

slide-71
SLIDE 71

Throughput (GB/s) on Various Machines

(Four plots of throughput for 64-bit memory instructions, each with x-axis queue size, 64KB to 256MB, y-axis GB/s, and the series MSQ-mov and Lynx-mov:)

  • Xeon E5-2667v2 (y-axis 2 to 14 GB/s)
  • Opteron 6376 (y-axis 1 to 6 GB/s)
  • Core-i3 2367M (y-axis 0.0 to 5.0 GB/s)
  • Celeron J1900 (y-axis 0.0 to 4.0 GB/s)

slide 27 of 30

slide-72
SLIDE 72

Real World Applications on Intel Xeon

(Diagrams of three applications, SRMT, SD3 and NetworkAnalyser, labelled with their threads and queues, each with a bar chart comparing MSQ and Lynx. Two charts cover BT, CG, EP, IS, LU, MG, SP and Geo; the third covers 1T to 6T and Geo.)

  • The best queue configuration with Lynx is better than the best with MSQ

slide 28 of 30

slide-76
SLIDE 76

Conclusion

  • Proposed Lynx: a lock-free SP/SC software queue with just 2 instructions overhead
  • Relies on existing commodity H/W and O/S support for memory protection
  • The overhead of synchronisation and boundary checking is moved to the exception handler
  • Throughput increases by up to 57%

slide 29 of 30

slide-77
SLIDE 77

Source Code

https://www.cl.cam.ac.uk/~km647/papers/lynx/lynxQ.tar.bz2

or

https://www.repository.cam.ac.uk/handle/1810/254651

slide 30 of 30