Read-Copy-Update for HelenOS

Martin Děcký
decky@d3s.mff.cuni.cz

http://d3s.mff.cuni.cz
CHARLES UNIVERSITY IN PRAGUE
Faculty of Mathematics and Physics

Martin Děcký, FOSDEM 2014, February 2nd 2014, Read-Copy-Update for HelenOS

Introduction

HelenOS
    Microkernel multiserver operating system
    Relying on asynchronous IPC mechanism
        Major motivation for scalable concurrent algorithms and data structures

Martin Děcký
    Researcher in computer science (operating systems)
    Not an expert on concurrent algorithms
        But very lucky to be able to cooperate with hugely talented people in this area

In a Nutshell

Asynchronous IPC = Communicating parties may access the communication facilities concurrently
    → The state of the shared communication facilities needs to be protected by explicit synchronization means

Asynchronous IPC = Communicating parties have to access the communication facilities concurrently
    ← In order to offset the overhead of the communication by doing other useful work while waiting for a reply

Asynchronous IPC → Communication facilities have to use concurrency-friendly (non-blocking) synchronization means
    ← In order to avoid limiting the achievable degree of concurrency

Basic Synchronization Taxonomy

For accessing shared data structures

Mutual exclusion synchronization
    Temporal separation of scheduling entities
    Typical means
        Disabling preemption, Dekker's algorithm, direct use of atomic test-and-set operations, etc.
    Typical mechanisms
        Locks, semaphores, condition variables, etc.
    [+] Relatively intuitive semantics, well-known characteristics
    [-] Overhead, restriction of concurrency, deadlocks
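
As an illustration of the "direct use of atomic test-and-set operations" listed above, here is a minimal spinlock sketch in C11. The names `struct spinlock`, `spin_lock`, `spin_trylock` and `spin_unlock` are made up for this example and are not the HelenOS API; the sketch only shows how temporal separation can be built on a single test-and-set flag.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spinlock type for illustration only. */
struct spinlock {
    atomic_flag held;
};

/* Busy-wait until the flag was observed clear and has been set by us. */
static void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* another party holds the lock: this is the restriction of concurrency */
}

/* Single attempt; returns 1 if the lock was acquired, 0 otherwise. */
static int spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Release the lock and publish the writes made while holding it. */
static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every reader and writer pays for the atomic read-modify-write even when nobody else is contending, which is the overhead the slide refers to.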

Mutual Exclusion Synchronization

Basic Synchronization Taxonomy

Non-blocking synchronization
    Replace temporal separation by sophisticated means that guarantee logical consistency
    Typical means
        Atomic writes, direct use of atomic read-modify-write operations, etc.
    Typical mechanisms
        Transactional memory, hazard pointers, Read-Copy-Update, etc.
    [+] Reasonable (almost no) overhead and restriction of concurrency in favorable cases, guarantee of progress
    [-] Less intuitive semantics, sometimes non-trivial characteristics, non-favorable cases, livelocks

Non-blocking Synchronization

Wait-freedom
    Guaranteed system-wide progress and starvation-freedom (all operations are finitely bounded)
    Wait-free algorithms always exist [1], but the performance of general methods is usually inferior to blocking algorithms
    Wait-free queue by Kogan & Petrank [2]

Lock-freedom
    Guaranteed system-wide progress, but individual threads can starve
    Four phases: Data operation, assisting obstruction, aborting obstruction, waiting

Obstruction-freedom
    Guaranteed single thread progress if isolated for a bounded time (obstructing threads need to be suspended)
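
To make the lock-freedom guarantee concrete, the following is a small sketch of a compare-and-swap retry loop in C11. It is not taken from the talk: `struct stack`, `push` and `take_all` are illustrative names. A failed compare-and-swap means that some other thread succeeded (system-wide progress), yet nothing bounds how often one particular thread has to retry (it can starve).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
    int value;
    struct item *next;
};

struct stack {
    struct item *_Atomic top;
};

/* Lock-free push: retry until our item is installed as the new top. */
static void push(struct stack *s, struct item *it)
{
    struct item *old = atomic_load_explicit(&s->top, memory_order_relaxed);
    do {
        it->next = old; /* on CAS failure, old is refreshed with the current top */
    } while (!atomic_compare_exchange_weak_explicit(&s->top, &old, it,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole chain with one atomic exchange (bounded, no retry loop). */
static struct item *take_all(struct stack *s)
{
    return atomic_exchange_explicit(&s->top, (struct item *)NULL,
                                    memory_order_acquire);
}
```

The sketch deliberately avoids a single-element pop, because that operation runs into the ABA problem and safe memory reclamation, which is exactly where mechanisms such as hazard pointers or RCU come in.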

From Means to Mechanism

Synchronization means
    Individual instance of usage
    E.g. non-blocking list implementation using atomic pointer writes

Synchronization mechanism
    Generic reusable pattern
    E.g. non-blocking list implementation using Read-Copy-Update

What Is Read-Copy-Update

Non-blocking synchronization mechanism
    Targeting synchronization of read-mostly pointer-based data structures with immutable values
        Favorable case: R/W ratio of ~ 10:1 (but even 1:1 is achievable)
    Unlimited number of readers without blocking (not waiting for other readers or writers)
    Little overhead on the reader side (smaller than taking an uncontended lock)
    Readers have to tolerate “stale” data and late updates
    Readers have to observe “safe” access patterns
    Synchronization among writers out of scope of the mechanism
    Optional provisions for asynchronous reclamation

What Is Read-Copy-Update (2)

Read-side critical section
    Delimited by read_lock() and read_unlock() operations (non-blocking)
    Protected data can be referenced only inside the critical section

Safe access() methods for reading pointers
    Avoiding unsafe compiler optimizations (reloading the pointer)
    Not necessary for reading values

Quiescent state (a thread outside a critical section)
Grace period (all threads pass through a quiescent state)

What Is Read-Copy-Update (3)

Synchronous write-side update
    Atomically unlinking an old element
    Calling synchronize() operation
        Blocks until a grace period elapses (all readers pass a quiescent state, no longer referencing the unlinked data)
        Possibility to reclaim or free the unlinked data
    Inserting a new element using safe assign() operation
        Avoiding unsafe compiler optimizations and store reordering on weakly ordered architectures

Synchronous Update Example

[Figure: linked list head → v0 → v1 shown in three steps; after step III the list is head → v2 → v1]

I. Atomic pointer update to remove the element with v0 from the list
II. Blocking on synchronize(). During the grace period preexisting readers can still access the “stale” element with v0
III. No reader can reference the element with v0 anymore; it can be reclaimed. New element with v2 can be atomically inserted
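
The three steps can be sketched in C11 roughly as follows. This is a single-writer illustration under stated assumptions, not the HelenOS code: `rcu_access` and `rcu_assign` are hypothetical counterparts of the access() and assign() operations, implemented here with acquire loads and release stores, and `synchronize_stub` merely counts calls where a real synchronize() would block until a grace period elapses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *_Atomic next;
};

/* Stand-in for synchronize(): a real one blocks for a grace period. */
static int grace_periods_waited;

static void synchronize_stub(void)
{
    grace_periods_waited++;
}

/* access(): one atomic load, so the compiler cannot reload the pointer. */
static struct node *rcu_access(struct node *_Atomic *slot)
{
    return atomic_load_explicit(slot, memory_order_acquire);
}

/* assign(): release store, so the element is initialized before it is visible. */
static void rcu_assign(struct node *_Atomic *slot, struct node *n)
{
    atomic_store_explicit(slot, n, memory_order_release);
}

/* Replace the first element; writers are assumed to be serialized elsewhere.
 * Returns 0 on success, -1 on an empty list or allocation failure. */
static int replace_first(struct node *_Atomic *head, int new_value)
{
    struct node *old = rcu_access(head);
    if (old == NULL)
        return -1;
    struct node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    struct node *rest = rcu_access(&old->next);

    rcu_assign(head, rest);   /* I.   unlink the old element */
    synchronize_stub();       /* II.  wait for preexisting readers */
    free(old);                /* III. nobody can reference it anymore */

    fresh->value = new_value;
    atomic_init(&fresh->next, rest);
    rcu_assign(head, fresh);  /* III. publish the new element */
    return 0;
}
```

Between steps I and III new readers see the list without v0 or v2, while preexisting readers may still be looking at v0; both views are consistent, which is the "stale data" tolerance required of readers.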

What Is Read-Copy-Update (4)

Asynchronous write-side update
    Using a call() operation
        Non-blocking operation registering a callback
        Callback is executed after a grace period elapses
    Using a barrier() operation
        Waiting for all queued asynchronous callbacks
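
A single-threaded model of the callback queue may help to picture call(). All names here (`rcu_call`, `run_callbacks_after_grace_period`, `rcu_pending`, the fixed-size queue) are assumptions made for this sketch; a real implementation keeps per-CPU queues and a detector thread, and barrier() would block until `rcu_pending()` drops to zero.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 16

typedef void (*rcu_func_t)(void *);

struct rcu_callback {
    rcu_func_t func;
    void *arg;
};

static struct rcu_callback queue[MAX_CALLBACKS];
static size_t queued;

/* call(): only registers the callback and returns immediately.
 * Returns 0 on success, -1 if the illustrative queue is full. */
static int rcu_call(rcu_func_t func, void *arg)
{
    if (queued == MAX_CALLBACKS)
        return -1;
    queue[queued].func = func;
    queue[queued].arg = arg;
    queued++;
    return 0;
}

/* Invoked by the (hypothetical) detector once a grace period has elapsed.
 * Callbacks must not call rcu_call() themselves in this simplified model. */
static void run_callbacks_after_grace_period(void)
{
    for (size_t i = 0; i < queued; i++)
        queue[i].func(queue[i].arg);
    queued = 0;
}

/* Number of callbacks a barrier() would still have to wait for. */
static size_t rcu_pending(void)
{
    return queued;
}

/* Example callback: counts reclaimed elements instead of freeing memory. */
static void count_reclaimed(void *arg)
{
    ++*(int *)arg;
}
```

The point of the design is that the writer never blocks: the cost of waiting for the grace period is moved to whoever processes the queue.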

Grace Period Detection

Cornerstone of any RCU algorithm
    Implicit trade-off between precision and overhead
        Any extension of a grace period is also a grace period
    Long (imprecise) grace periods
        Blocking synchronous writers for a longer time
        Increasing memory usage due to unreclaimed elements
    Short (precise) grace periods
        Increasing overhead on the reader side (need for memory barriers, atomic operations, other heavy-weight operations, etc.)
    Usual compromise
        Identifying naturally occurring quiescent states for the given RCU algorithm
            Context switches, exceptions (timer ticks), etc.

The Big Picture ...

Motivation for RCU in HelenOS

Foundation for a scalable concurrent data structure
Developing a microkernel-specific RCU algorithm
    Specific requirements, constraints and use cases
    Last well-known RCU implementation for a microkernel in 2003 (K42)

Credits

AP-RCU
    Non-intrusive, portable RCU algorithm
    Developed and implemented by Andrej Podzimek for UTS (OpenSolaris) [3] [4]

AH-RCU
    Inspired by AP-RCU and several other RCU algorithms
    Developed and implemented by Adam Hraška for SPARTAN (HelenOS) [7]
    Foundation for the Concurrent Hash Table in HelenOS [8]
    Additional variants (preemptible AP-RCU, user space RCU)

HelenOS requirements

The RCU algorithm must not impose design concepts of legacy systems on HelenOS
    E.g. a specific way in which the timer interrupt handler is implemented

The kernel space RCU algorithm must support
    Read-side critical sections in interrupt and exception handlers
    Asynchronous reclamation (call()) in interrupt and exception handlers
    Read-side critical sections with preemption enabled (not affecting scheduling latency)

HelenOS requirements (2)

Concurrent Hash Table implementation
    Growing and shrinking
    Interrupt and non-maskable interrupt tolerant
        Suitable for a global page hash table
    Concurrent reads with low overhead
    Concurrent inserts and deletes

AH-RCU

Basic characteristics
    Kernel space algorithm
    Read-side critical sections are preemptible (without loss of performance)
        Multiple read-side critical sections within a time slice
        Expensive operations when a thread was preempted do not do much harm
    Support for asynchronous reclamation in interrupt and exception handlers
    No reliance on periodic timer

AH-RCU (2)

Grace period detection
    Test if all CPUs passed a quiescent state
        Sending an interprocessor interrupt (IPI) to each CPU
            If the interrupt handler detects a nesting count of 0, it issues a memory barrier (representing a natural quiescent state)
        Avoid sending IPI if context switch is detected
    Detect any preempted readers holding up the current grace period
        Sleep and wait for the last preempted reader holding up the grace period to wake the detector thread

AH-RCU (3)

Advantages
    Low overhead and preemptible read-side critical section, suitable for exception handlers
    No regular sampling

Disadvantages
    Polling CPUs using interprocessor interrupts might be disruptive in large systems

HelenOS Concurrent Hash Table

Basic characteristics
    Inspired by Triplett's relativistic hash table [5] and Michael's lock-free lists [6]
    Hash collisions resolved using separate RCU-protected bucket lists
    Buckets organized as lock-free lists without hazard pointers
        RCU still protects against accessing invalid pointers and the ABA problem
    Concurrent lookups and concurrent modifications
        Tolerance for nested concurrent modifications from interrupt and exception handlers
    Growing and shrinking using background resizing by a factor of 2
        Concurrent with lookups and updates
        Requires four grace periods
    Deferred element freeing using RCU call()

Enough talk! Show me the (pseudo)code!

AP-RCU Reader Side

    read_lock():
        disable_preemption()
        check_qs()
        cpu.nesting_cnt++

    read_unlock():
        cpu.nesting_cnt--
        check_qs()
        enable_preemption()

    check_qs():
        if (cpu.nesting_cnt == 0) {
            if (cpu.last_seen_gp != cur_gp) {
                gp = cur_gp
                memory_barrier()
                cpu.last_seen_gp = gp
            }
        }

Note: Writer forces a context switch on CPUs where no read-side critical section was observed for a while.
Note: Except memory_barrier() only inexpensive operations.

The first reader to notice the start of a new grace period on each CPU announces its quiescent state.

Once all CPUs announce a quiescent state or perform a context switch (a naturally occurring quiescent state due to disabled preemption), the grace period ends.
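
The reader-side pseudocode above could be rendered in C11 along the following lines. This is only a sketch under assumptions: the per-CPU state is passed explicitly as a `struct cpu_state` (a hypothetical type), preemption control is left out because it is kernel-specific, and a sequentially consistent fence stands in for memory_barrier().

```c
#include <assert.h>
#include <stdatomic.h>

/* Global grace period counter, advanced by the writer in synchronize(). */
static atomic_ulong cur_gp;

/* Hypothetical per-CPU state, passed explicitly in this sketch. */
struct cpu_state {
    unsigned long nesting_cnt;   /* only touched by the owning CPU */
    atomic_ulong last_seen_gp;   /* read by the writer */
};

/* Announce a quiescent state if a new grace period has started. */
static void check_qs(struct cpu_state *cpu)
{
    if (cpu->nesting_cnt != 0)
        return; /* inside a read-side critical section: not quiescent */

    unsigned long gp = atomic_load_explicit(&cur_gp, memory_order_relaxed);
    if (atomic_load_explicit(&cpu->last_seen_gp, memory_order_relaxed) != gp) {
        atomic_thread_fence(memory_order_seq_cst); /* the only expensive step */
        atomic_store_explicit(&cpu->last_seen_gp, gp, memory_order_relaxed);
    }
}

static void read_lock(struct cpu_state *cpu)
{
    /* disable_preemption() would go here in a kernel */
    check_qs(cpu);
    cpu->nesting_cnt++;
}

static void read_unlock(struct cpu_state *cpu)
{
    cpu->nesting_cnt--;
    check_qs(cpu);
    /* enable_preemption() would go here in a kernel */
}
```

In the common case, when no new grace period has started, both operations reduce to a counter update and two relaxed loads, which matches the note that only memory_barrier() is expensive.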

Preemptible AP-RCU Reader Side

    read_lock():
        disable_preemption()
        if (thread.nesting_cnt == 0)
            record_qs()
        thread.nesting_cnt++
        enable_preemption()

    read_unlock():
        disable_preemption()
        if (--thread.nesting_cnt == 0) {
            record_qs()
            if ((thread.was_preempted) || (cpu.is_delaying_gp))
                signal_unlock()
        }
        enable_preemption()

    record_qs():
        if (cpu.last_seen_gp != cur_gp) {
            gp = cur_gp
            memory_barrier()
            cpu.last_seen_gp = gp
        }

    signal_unlock():
        if (atomic_exchange(cpu.is_delaying_gp, false) == true)
            remaining_readers_semaphore.up()

        if (atomic_exchange(thread.was_preempted, false) == true) {
            preempt_mutex.lock()
            preempted_list.remove(thread)
            if ((is_empty(cpu.cur_preempted)) && (preempted_blocking_gp))
                remaining_readers_semaphore.up()
            preempt_mutex.unlock()
        }

AH-RCU Reader Side

    read_lock():
        thread.nesting_cnt++
        compiler_barrier()

    read_unlock():
        compiler_barrier()
        thread.nesting_cnt--
        if (thread.nesting_cnt == was_preempted)
            preempted_unlock()

    preempted_unlock():
        // avoid race between thread and interrupt handler
        if (atomic_exchange(thread.nesting_cnt, 0) == was_preempted) {
            preempt_lock.lock()
            preempted_list.remove(thread)
            if ((is_empty(cpu.cur_preempted)) && (detection_waiting))
                detection_semaphore.up()    // notify the detector thread about the grace period
            preempt_lock.unlock()
        }

Note: Except preempted_unlock() only inexpensive operations.
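
The comparison `thread.nesting_cnt == was_preempted` suggests that the nesting counter and the "was preempted" flag share one word. The C11 sketch below assumes one possible encoding (flag in bit 0, nesting depth counted in steps of 2); this encoding, the type `struct thread_state` and the helper `mark_preempted` are assumptions for illustration and are not confirmed by the slides. The slow path is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>

#define WAS_PREEMPTED 1u /* assumed: flag kept in bit 0 of the counter */
#define NESTING_INC   2u /* assumed: one nesting level */

/* Hypothetical per-thread state for this sketch. */
struct thread_state {
    atomic_uint nesting_cnt;
    unsigned slow_path_calls; /* stands in for list removal and detector wake-up */
};

/* Slow path: taken only by the outermost unlock of a preempted reader. */
static void preempted_unlock(struct thread_state *t)
{
    /* the exchange avoids a race between the thread and an interrupt handler */
    if (atomic_exchange(&t->nesting_cnt, 0u) == WAS_PREEMPTED)
        t->slow_path_calls++;
}

static void read_lock(struct thread_state *t)
{
    atomic_fetch_add_explicit(&t->nesting_cnt, NESTING_INC, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
}

static void read_unlock(struct thread_state *t)
{
    atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
    unsigned now = atomic_fetch_sub_explicit(&t->nesting_cnt, NESTING_INC,
                                             memory_order_relaxed) - NESTING_INC;
    if (now == WAS_PREEMPTED) /* depth 0 and the flag is set */
        preempted_unlock(t);
}

/* What the scheduler would do when it preempts a reader (assumption). */
static void mark_preempted(struct thread_state *t)
{
    atomic_fetch_or(&t->nesting_cnt, WAS_PREEMPTED);
}
```

With this layout a single comparison on the fast path answers both questions at once: whether this was the outermost unlock, and whether the thread was preempted inside the critical section.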

AP-RCU Writer Side

    synchronize():
        memory_barrier()
        mutex.lock()

        cur_gp++                         // start a new grace period
        reader_cpus = []                 // gather CPUs potentially in read-side CS

        for each cpu in cpus {
            if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp)) {
                cpu.last_ctx_switch_cnt = cpu.ctx_switch_cnt
                reader_cpus += cpu
            }
        }

        wait(1ms)                        // longest acceptable grace period duration (tunable)

        for each cpu in reader_cpus {    // enforce a quiescent state
            if ((!cpu.idle) && (cpu.last_seen_gp != cur_gp) &&
                (cpu.last_ctx_switch_cnt == cpu.ctx_switch_cnt))
                cpu.ctx_switch_force_wait()
        }

        mutex.unlock()

AH-RCU Writer Side

    detector_thread:
        forever {
            wait_for_callbacks()
            // run callbacks added before the current grace period
            execute_callbacks()
            // push callbacks registered since last processing to the queue
            advance_callbacks()
            wait_for_gp_end()
        }

AH-RCU Writer Side (2)

    wait_for_gp_end():
        gp_mutex.lock()

        if (completed_gp != cur_gp) {
            // a grace period is already in progress
            wait_for_gp_end_signal()
            goto out
        } else {
            // start a new grace period
            preempt_lock.lock()
            cur_gp++
            preempt_lock.unlock()
        }

        gp_mutex.unlock()
        wait_for_readers()
        gp_mutex.lock()
        completed_gp = cur_gp

    out:
        gp_mutex.unlock()

Evaluation

Read-side critical section scalability: Traversal of a five-element list
The list is protected as a whole, it is only read, never modified.

[Figure: list traversals / second vs. number of threads (1 to 5); series: ideal, ah-rcu, pap-rcu, spinlock]

Evaluation (2)

Write-side overhead: Different ratios of updates vs. lookups
Five-element list, four threads running in parallel. Updates are always synchronized by a spinlock.

[Figure: operations / second vs. % of updates; series: ah-rcu + spinlock, pap-rcu + spinlock, spinlock]

Evaluation (3)

Read-side scalability vs. write-side overhead: Crossover point
Data points from previous figure with low fraction of updates are discarded.

[Figure: operations / second vs. % of updates; series: ah-rcu + spinlock, pap-rcu + spinlock, spinlock]

Evaluation (4)

Concurrent hash table lookup scalability
128 buckets, average load factor of 4 elements per bucket, 50 % of lookups for hitting keys, 50 % of lookups for missing keys (each thread used a separate list). The resize condition was checked, but never executed.

[Figure: lookups / second vs. number of threads (1 to 5); series: cht + ah-rcu, ht + bucket spinlocks, ht + global spinlock]

Evaluation (5)

Concurrent hash table update overhead: Different ratios of concurrent updates vs. lookups
Four threads running in parallel.

[Figure: operations / second vs. % of updates; series: cht + ah-rcu, ht + bucket spinlocks, ht + global spinlock]

Conclusion

Novel scalable algorithms
    Preemptible AP-RCU for HelenOS
    Preemptible AH-RCU for HelenOS
    Resizeable Concurrent Hash Table for HelenOS
        Suitable as a basic data structure for asynchronous HelenOS IPC
        Suitable for other kernel uses (e.g. global page table)

Thorough evaluation
    Promising behavior

Q&A

www.helenos.org

References

[1] Herlihy M. P.: Impossibility and universality results for wait-free synchronization, in Proceedings of 7th Annual ACM Symposium on Principles of Distributed Computing, ACM, 1988
[2] Kogan A., Petrank E.: Wait-free queues with multiple enqueuers and dequeuers, in Proceedings of 16th ACM Symposium on Principles and Practice of Parallel Programming, ACM, 2011
[3] Podzimek A., Děcký M., Bulej L., Tůma P.: A Non-Intrusive Read-Copy-Update for UTS, in Proceedings of 18th IEEE International Conference on Parallel and Distributed Systems, IEEE, 2012, http://d3s.mff.cuni.cz/publications/download/PodzimekDeckyBulejTuma-ICPADS-2012.pdf
[4] http://d3s.mff.cuni.cz/software/rcu/rcu.patch
[5] Triplett J., McKenney P. E., Walpole J.: Resizable, scalable, concurrent hash tables via relativistic programming, in Proceedings of the 2011 USENIX Annual Technical Conference, ACM, 2011
[6] Michael M. M.: High performance dynamic lock-free hash tables and list-based sets, in Proceedings of 14th Annual ACM Symposium on Parallel Algorithms and Architectures, ACM, 2002
[7] Hraška A.: Read-Copy-Update for HelenOS, master thesis, Charles University in Prague, 2013, http://www.helenos.org/doc/theses/ah-thesis.pdf
[8] https://code.launchpad.net/~adam-hraska+lp/helenos/cht-bench