

SLIDE 1

Relativistic Red-Black Trees

Philip W. Howard, Jonathan Walpole, October 2013.

  • Presented by Kendall Stewart

CS510 Concurrent Systems, Spring 2014

SLIDE 2

The Story so Far

  • Locking is slow
  • Non-blocking algorithms are complicated
  • Memory barriers are necessary on most systems
  • RCU solves many issues by providing full read-side concurrency and a simple API
  • But so far, we’ve just seen it in the context of the Linux kernel. Does it generalize? How?

SLIDE 3

Relativistic Programming

Read-Copy Update → Relativistic Programming

  • spin_lock / spin_unlock → write-lock / write-unlock
  • rcu_read_lock / rcu_read_unlock → start-read / end-read
  • rcu_assign_pointer → rp-publish
  • rcu_dereference → rp-read
  • synchronize_rcu → wait-for-readers
  • call_rcu(kfree, …) → rp-free

SLIDE 4

Relativistic Programming

  • Captures the important idea of RCU: insert delays to restrict the order of causally dependent events, while letting independent events proceed concurrently.
  • Use publish / subscribe semantics (with memory barriers and compiler directives) to prevent hardware or software reordering of causally dependent writes
  • Wait for existing readers to preserve causal ordering within non-atomic operations (e.g. complex changes to a data structure)

  • What’s all this talk of causality about?
SLIDE 5

Relativism and Causality

W1 and W2 happen at the same time. R1 observes W1 before W2. R2 observes W2 before W1.

SLIDE 6

Relativism and Causality

W1 causes W2. R1 and R2 must both observe W1 before W2.

If we want a consistent ordering, W2 can wait until after W1 occurs:

SLIDE 7

Relativism and Causality

  • Causal relationships are easy to reason about for the same reason that sequential programs are easy to reason about — memory invariance is achieved by an implicit causal relationship between all program statements.
  • But not all concurrent relationships are causal. Enforcing causality where it isn’t needed requires unnecessary delays, which defeat concurrency.

  • How do we figure out when causality is necessary?
SLIDE 8

Atomic Operations

Consider a simple deletion from a (singly) linked list. It takes effect atomically, by using rp-publish:

D = C.next
rp-publish(B.next, D)
rp-free(C)

No ordering issues; no inconsistent state.
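The sequence above can be modeled in a short single-threaded sketch. The rp-* names are the slides’ hypothetical primitives, stubbed here as plain operations (a real implementation adds memory barriers and deferred reclamation):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def rp_publish(node, attr, target):
    # Real RP: a store preceded by a write barrier. Modeled as assignment.
    setattr(node, attr, target)

def rp_free(node):
    # Real RP: reclamation is deferred until pre-existing readers finish.
    pass

# Build A -> B -> C -> D, then unlink C as on the slide.
d = Node("D"); c = Node("C", d); b = Node("B", c); a = Node("A", b)
D = c.next                 # D = C.next
rp_publish(b, "next", D)   # rp-publish(B.next, D)
rp_free(c)                 # rp-free(C)
assert a.next is b and b.next is d   # readers now skip C
```

Because the unlink is a single pointer store, a concurrent reader sees the list either with or without C, but never a broken intermediate state.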

SLIDE 9

Complex Operations

What about a more complex operation, like a move? If we could lock down the list, we could do it all in-place. But since we’ve got concurrent readers, we’ll have to make a copy of C to swing in.

SLIDE 10

Complex Operations

C’ = copy(C)
C’.next = B
rp-publish(A.next, C’)
rp-publish(B.next, D)
rp-free(C)

Simple, right? But wait…

SLIDE 11

Complex Operations

What if a reader was at B the whole time? That reader will miss C!


C’ = copy(C)
C’.next = B
rp-publish(A.next, C’)
wait-for-readers()
rp-publish(B.next, D)
rp-free(C)
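A sketch of the same move, with wait-for-readers stubbed out (in a real relativistic implementation it blocks until all pre-existing readers have finished; the rp-* names are the slides’ hypothetical primitives, not a real API):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def rp_publish(node, attr, target):
    setattr(node, attr, target)  # real RP: write barrier + store

def wait_for_readers():
    pass  # real RP: block until all pre-existing readers finish

def rp_free(node):
    pass  # real RP: deferred reclamation

def move_before(a, b, c, d):
    """Move C from between B and D to between A and B."""
    c2 = Node(c.value)            # C' = copy(C)
    c2.next = b                   # C'.next = B  (set before publishing)
    rp_publish(a, "next", c2)     # C' becomes reachable atomically
    wait_for_readers()            # a reader still at B can still reach C
    rp_publish(b, "next", d)      # now unlink the original C
    rp_free(c)
    return c2

d = Node("D"); c = Node("C", d); b = Node("B", c); a = Node("A", b)
c2 = move_before(a, b, c, d)
assert [n.value for n in (a.next, a.next.next, a.next.next.next)] == ["C", "B", "D"]
```

The wait between the two publishes is the delay that enforces the causal ordering: C may not disappear from its old position until its copy is reachable from the new one.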

SLIDE 12

Complex Operations

  • What happened?
  • The insertion of C’ and the deletion of C were causally related: C could not be removed until its copy was in place, because otherwise some readers might have missed the value of C.
  • Therefore, we needed to enforce an ordering by inserting a delay (waiting for existing readers).
  • But some readers can still see the value of C twice! Once at C’, and once at C before the removal takes effect.
  • Whether or not this is okay depends on the semantics of the abstract data type implemented by the list.
  • If duplicates are okay, we do not need to wait for readers if we are moving C to a position later in the list — some readers would simply be guaranteed to see the value of C twice.
  • The insertion and deletion are still causally related, but the semantics of the traversal pattern guarantee an ordering for free.

SLIDE 13

Complex Complexity

  • That’s a lot of head-scratching for a really simple data structure.
  • Is relativistic programming really generalizable?
  • As data structure complexity increases, does applying RP get harder, or stay about the same?
  • Let’s try applying it to a complicated data structure, say, a Red-Black Tree.

SLIDE 14

Red-Black Trees

  • A self-balancing binary search tree.
  • Most commonly used for implementing a sorted “map” data structure (i.e., a table of <key, value> pairs, sorted by key).
  • Supports O(log N) inserts, lookups, and deletions.
SLIDE 15

Red-Black Trees

  • Invariants:
  • Standard BST (< to the left, >= to the right).
  • Each node has a color (red or black).
  • Both children of a red node are black.
  • Every path from the root to a leaf has the same number of black nodes.
  • Maintaining these invariants involves performing restructuring operations (rotations) during insertion and deletion.
  • Red-black trees are difficult to parallelize for this reason!
  • Previous attempts have involved global locking (slow, as you might expect) and fine-grained locking (susceptible to deadlock)
  • A good test case for applying RP!
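The invariants above can be captured in a small checker; this is an illustrative sketch (the RBNode layout is an assumption, not the paper’s code):

```python
class RBNode:
    def __init__(self, key, color="black", left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def black_height(node):
    """Verify the red-black invariants below node and return its black
    height. Empty subtrees (None) count as black leaves of height 1."""
    if node is None:
        return 1
    # Both children of a red node must be black (None counts as black).
    if node.color == "red":
        for child in (node.left, node.right):
            assert child is None or child.color == "black", "red-red violation"
    # Standard BST order: < to the left, >= to the right.
    assert node.left is None or node.left.key < node.key
    assert node.right is None or node.right.key >= node.key
    lh, rh = black_height(node.left), black_height(node.right)
    assert lh == rh, "unequal black counts on root-to-leaf paths"
    return lh + (1 if node.color == "black" else 0)

# A tiny valid tree: black 10 with red children 5 and 15.
root = RBNode(10, "black", RBNode(5, "red"), RBNode(15, "red"))
assert black_height(root) == 2
```

Rotations exist precisely to restore the red-red and black-count conditions that this checker tests, which is why they complicate concurrent updates.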
SLIDE 16

Relativistic Red-Black Trees

  • Where do we need to be careful with ordering?
  • Operations to scrutinize:
  • Read-side:
  • Lookup
  • Traversal (more on this towards the end)
  • Write-side:
  • Insertion
  • Deletion
  • Restructure (occurs during insertion and deletion)
SLIDE 17

Lookups

  • Lookups do not require reading the color of a node, or chasing its parent pointer.
  • The ADT being implemented is a single map, which means that each key is associated with exactly one value — so readers can stop searching once they find the key they’re looking for.
  • Implications for Readers:
  • They can proceed at full speed with only “start-read” and “end-read”
  • Implications for Updaters:
  • Changes to parent pointers or color will not affect readers.
  • Having temporary duplicate nodes is okay, so long as we ensure that all potential readers can find at least one of the copies.
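A relativistic lookup needs only start-read/end-read around an ordinary BST descent. In this single-threaded sketch, rp-read stands in for a dependency-ordered dereference and the stubs do nothing (assumptions for illustration, not the paper’s code):

```python
class RBNode:
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def start_read(): pass          # real RP: register this reader
def end_read():   pass          # real RP: deregister this reader
def rp_read(p):   return p      # real RP: dereference with ordering

def lookup(root, key):
    """Full-speed read path: no locks, no color or parent-pointer reads."""
    start_read()
    node = rp_read(root)
    while node is not None and node.key != key:
        node = rp_read(node.left if key < node.key else node.right)
    value = node.value if node is not None else None
    end_read()
    return value

root = RBNode(10, "a", RBNode(5, "b"), RBNode(15, "c"))
assert lookup(root, 5) == "b" and lookup(root, 7) is None
```

Because the search stops at the first matching key, a temporary duplicate elsewhere in the tree is invisible to this reader.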

SLIDE 18

Insertions

  • New nodes are always inserted at a leaf position
  • Readers will either see it or not, depending on the ordering of the updater’s rp-publish and the reader’s rp-read.
  • No chance to observe an inconsistent state.
  • However, insertion may leave the tree unbalanced, requiring restructuring!
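A sketch of the insert path (rebalancing omitted; names and stubs are illustrative assumptions, not the paper’s code): the node is fully initialized before the single rp-publish that makes it reachable, so readers see either the old tree or the new one, never a half-built node.

```python
class RBNode:
    def __init__(self, key, value, left=None, right=None, color="red"):
        self.key, self.value, self.color = key, value, color
        self.left, self.right = left, right

def rp_publish(node, attr, target):
    setattr(node, attr, target)  # real RP: write barrier + store

def insert(root, key, value):
    """Insert at a leaf position; restructuring is not shown."""
    new = RBNode(key, value)              # fully built before publication
    parent = root
    while True:
        side = "left" if key < parent.key else "right"
        child = getattr(parent, side)
        if child is None:
            rp_publish(parent, side, new) # the one reader-visible step
            return new
        parent = child

root = RBNode(10, "a", color="black")
insert(root, 5, "b"); insert(root, 15, "c"); insert(root, 7, "d")
assert root.left.key == 5 and root.left.right.key == 7
```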

SLIDE 19

Deletions

  • Deleting a leaf is just like insertion — readers either see the update or don’t (c.f. the linked list removal we saw earlier)
  • Deleting an interior node is more complicated — therefore we will swap the interior node with its in-order successor (which must be a left-leaf), and then remove the leaf node.
  • A chance for special-case optimization arises if the in-order successor is the immediate right child of the node to be removed.
  • Deletion also raises the spectre of restructuring!
SLIDE 20

General Internal Delete

To remove B:

  • 1. Identify B’s successor (C) and make a copy (C’).
  • 2. Replace B with C’.
  • 3. Defer collection of B.
  • 4. Remove C and defer its collection.
SLIDE 21

General Internal Delete

Oh, right! A reader looking for C might miss it.

  • 1. Identify B’s successor (C) and make a copy (C’).
  • 2. Replace B with C’.
  • 3. Defer collection of B.
  • 4. Wait for existing readers.
  • 5. Remove C and defer its collection.
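The five steps can be sketched as follows (a single-threaded model; the node layout and helper stubs are assumptions for illustration, not the paper’s code):

```python
class RBNode:
    def __init__(self, key, value, left=None, right=None, color="black"):
        self.key, self.value, self.color = key, value, color
        self.left, self.right = left, right

def rp_publish(node, attr, target):
    setattr(node, attr, target)  # real RP: write barrier + store

def wait_for_readers():
    pass  # real RP: block until pre-existing readers finish

def rp_free(node):
    pass  # real RP: deferred reclamation

def delete_internal(parent, side, b):
    # 1. Identify B's successor C (leftmost node of B's right subtree)
    #    and make a copy C' that takes over B's position.
    cp, c = b, b.right
    while c.left is not None:
        cp, c = c, c.left
    c2 = RBNode(c.key, c.value, b.left, b.right, b.color)
    rp_publish(parent, side, c2)   # 2. Replace B with C'.
    rp_free(b)                     # 3. Defer collection of B.
    wait_for_readers()             # 4. Readers searching for C now find C'.
    # 5. Remove C (a left-leaf) and defer its collection.
    if cp is b:
        rp_publish(c2, "right", c.right)  # C was B's immediate right child
    else:
        rp_publish(cp, "left", c.right)
    rp_free(c)

root = RBNode(10, "r", RBNode(5, "b", RBNode(3, "x"),
              RBNode(8, "y", RBNode(6, "s"))))
delete_internal(root, "left", root.left)   # delete key 5
assert root.left.key == 6 and root.left.right.left is None
```

The wait in step 4 is the same causal delay as in the linked-list move: the original C may not vanish until every reader can reach its copy C’.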


SLIDE 22

General Internal Delete

  • 1. Identify and copy successor.
  • 2. Replace B with C’.
  • 3. Defer collection of B.
  • 4. Wait for existing readers.
  • 5. Remove C and defer its collection.

SLIDE 23

Special Case

To remove B, where next node is right child:

  • No copy is necessary, but A is still temporarily duplicated.
  • Why don’t we have to use wait-for-readers()?
  • Same reason as moving a node to a later position in a linked list: traversal ordering.
SLIDE 24

Diagonal Restructure

SLIDE 25

Zig Restructure

SLIDE 26

Read-side (Lookup) Performance

SLIDE 27

Single-writer Performance

SLIDE 28

Multi-writer Performance

  • Possible synchronization mechanisms:
  • Global locking — same as Linux kernel RCU
  • Fine-grained locking — susceptible to deadlock
  • Non-blocking algorithms — usually complex
  • Software Transactional Memory — to be discussed next week!
  • Used for comparison here:
  • swissTM — STM applied to all operations
  • RP-STM — Relativistic reads, transactional writes
  • ccavl — Non-blocking AVL tree (separate data structure)
  • rp — Relativistic reads, global locking for writes
  • rpavl — AVL tree with relativistic reads, global locking for writes
SLIDE 29

Multi-writer Performance

SLIDE 30

Linearizability

  • Can be a valuable property for some applications, and makes proofs of correctness easier — but it is not a pre-condition for correctness!
  • These linearizability arguments rely on the fact that temporary duplicate nodes are okay — so these are properties of relativistic red-black trees implementing a particular ADT, not of relativistic red-black trees in general.
  • Lookups: take effect at the rp-read used to get to the current node
  • Insertions: take effect at the rp-publish used to swing in the new node
  • Deletions: take effect at the rp-publish used to make the node unreachable — wait-for-readers ensures no inconsistent state is visible
  • Traversals: may not be linearizable!
SLIDE 31

Traversals

  • More complex than single lookups, because they require going up and down the tree
  • Current update algorithms assume that readers don’t chase parent pointers, and therefore don’t take traversal patterns into account — this means that traversing readers may miss some nodes, or report duplicates
  • Three possible solutions:
  • Treat traversals as atomic — use reader-writer locking instead of relativistic reads for traversals
  • Conduct an O(N log N) traversal by executing a relativistic lookup of every node
  • Write more complex update operations to support relativistic O(N) traversals
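The O(N log N) option can be sketched as N independent relativistic lookups, each a fresh descent from the root. The next_key helper is an assumption about how “a lookup of every node” might be realized, and the rp stubs are single-threaded models:

```python
class RBNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def start_read(): pass      # real RP: register reader
def end_read():   pass      # real RP: deregister reader
def rp_read(p):   return p  # real RP: ordered dereference

def next_key(root, k):
    """Smallest key strictly greater than k: one O(log N) lookup."""
    start_read()
    best, node = None, rp_read(root)
    while node is not None:
        if node.key > k:
            best, node = node.key, rp_read(node.left)
        else:
            node = rp_read(node.right)
    end_read()
    return best

def traverse(root):
    # N next-key lookups: O(N log N) total, but each lookup is an
    # ordinary top-down relativistic read, so the update algorithms
    # need no changes to support it.
    out, k = [], float("-inf")
    while (k := next_key(root, k)) is not None:
        out.append(k)
    return out

root = RBNode(10, RBNode(5, RBNode(3), RBNode(7)), RBNode(15))
assert traverse(root) == [3, 5, 7, 10, 15]
```

Each lookup is individually consistent, but keys inserted or deleted between lookups may or may not appear, which is why such a traversal is not a linearizable snapshot.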

SLIDE 32

Traversals

O(N) algorithm clearly provides faster traversals:

SLIDE 33

Traversals

But slower updates:

SLIDE 34

Traversals

  • Which method to use depends on the situation:
  • Use reader-writer locking when a linearizable snapshot is necessary
  • Use the O(N log N) algorithm when updates need to be fast even in the presence of traversals
  • Use the O(N) algorithm for maximum traversal speed
SLIDE 35

Conclusions

  • That wasn’t much more difficult to reason about than the linked list traversal!
  • Read performance is nearly at full unsynchronized speed, and scales linearly
  • Relativistic programming looks promising as a generalizable approach
  • Unanswered questions:
  • This is still just a case study — can we prove RP is generalizable?
  • Can a compiler do this automatically?
  • What would it take? What kind of analysis is needed?
  • What kind of hints would the programmer have to give?
SLIDE 36

References

  • Jonathan Walpole, “Relativistic Red Black Trees”. Slides. Spring 2013. http://www.cs.pdx.edu/~walpole/class/cs510/spring2013/slides/14.pptx