Implementation techniques for libraries of transactional concurrent data types - PowerPoint PPT Presentation

SLIDE 1

Implementation techniques for libraries of transactional concurrent data types

Liuba Shrira Brandeis University

SLIDE 2

Or: Type-Specific Concurrency Control and STM

SLIDE 3

Where Modern STMs Fail

[Figure: two threads ask “I’d like a unique ID please” and receive 2011 and 2012 in turn — the non-transactional case.]

SLIDE 4

Where Modern STMs Fail

[Figure: the same two requests in the transactional case end in “Write conflict! OMG!”]

SLIDE 5

It’s not the STM’s problem, really

Unique ID generator, two specifications:
  • Successive integers: concurrent ops conflict
  • Unique IDs: concurrent ops commute
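The slide’s contrast can be sketched in Java (a hypothetical illustration, not code from the talk): a generator specified as “successive integers” forces every pair of concurrent calls through one shared counter, while a generator specified as “any unused ID” can hand each thread its own stripe, so concurrent calls touch disjoint state and commute.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: two specifications of a unique-ID generator.
public class UniqueIds {
    // Spec 1: "successive integers". Every call reads and writes the same
    // counter, so two concurrent transactional calls always conflict.
    private static final AtomicLong counter = new AtomicLong();
    public static long successive() { return counter.incrementAndGet(); }

    // Spec 2: "any unused ID". Each thread draws from its own stripe
    // (Escrow-style reservation), so concurrent calls commute.
    private static final AtomicLong nextStripe = new AtomicLong();
    private static final ThreadLocal<long[]> local =
        ThreadLocal.withInitial(() -> new long[] { nextStripe.getAndIncrement(), 0 });

    public static long anyUnused() {
        long[] s = local.get();              // s[0] = stripe id, s[1] = count
        return s[0] * 1_000_000L + s[1]++;   // assumes < 1e6 IDs per thread
    }
}
```

Both satisfy “unique IDs”; only the second leaves room for concurrency.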

SLIDE 6

Relaxed Atomicity

WTTM 2012

Early release, open-nested, eventual, elastic, ε-serializability, etc. Popular in the 80s & 90s … in DB and distributed systems. Mostly forgotten … except for snapshot isolation.

SLIDE 7

Exploit Type-Specific Concurrency Control

Also from the 80s … commutativity … non-determinism. For example: Escrow … Exo-leasing … TM raises different questions.

SLIDE 8

Heart of the Problem

Confusion between thread-level and transaction-level synchronization. Needless entanglement kills concurrency. Relaxed consistency models are all about more entanglement.

SLIDE 10

50 Shades of Synchronization

[Spectrum, from short-lived, fine-grained to long-lived, coarse-grained: atomic instruction (CAS), hardware transaction, critical sections, software transaction.]

SLIDE 11

Transactional Boosting

A method for transforming highly concurrent linearizable objects … into highly concurrent transactional black-box objects.
SLIDE 12

Concurrent Objects

[Timeline figure: overlapping calls q.enq(x), q.enq(y), q.deq(x), q.deq(y) on a shared queue.]

SLIDE 13

Linearizability

[Timeline figure: the overlapping calls q.enq(x), q.enq(y), q.deq(x), q.deq(y) are mapped to an equivalent sequential order.]

SLIDE 14

Linearizable Objects

[Figure: threads access a linearizable object through thread-level synchronization.]

SLIDE 15

Transactional Boosting

[Figure: transactions use transaction-level synchronization layered over thread-level synchronization.]

SLIDE 16

Disentangled Run-Time

  • Library: abstract locks, inverse logs
  • Your favorite fine-grained algorithms
  • HW transactions

SLIDE 17

Disentangled Reasoning

  • Commutativity & inverses
  • Linearizability, e.g., rely-guarantee …

SLIDE 18

One Implementation

[Figure: transactions acquire abstract locks over a black-box linearizable data object; undo logs record inverses, e.g. rem(x) to undo add(x).]

SLIDE 19

Let’s look at some code

  • Example 1: Transactional Set
  • implemented by boosting a ConcurrentSkipList object, using LockKey for synchronization
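The code slides themselves were not captured in this transcript, so here is a hedged Java sketch of what boosting a set might look like (class and method names are mine, not the talk’s LockKey code): a ConcurrentSkipListSet wrapped with per-key abstract locks held until commit, plus an undo log of inverse operations replayed on abort.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of transactional boosting, single transaction per thread.
public class BoostedSet {
    private final ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
    private final ConcurrentHashMap<Integer, ReentrantLock> keyLocks = new ConcurrentHashMap<>();
    private final ThreadLocal<Deque<Runnable>> undo = ThreadLocal.withInitial(ArrayDeque::new);
    private final ThreadLocal<List<ReentrantLock>> held = ThreadLocal.withInitial(ArrayList::new);

    private void lockKey(int x) {        // abstract lock: one lock per key value
        ReentrantLock l = keyLocks.computeIfAbsent(x, k -> new ReentrantLock());
        if (!l.isHeldByCurrentThread()) { l.lock(); held.get().add(l); }
    }
    public boolean add(int x) {
        lockKey(x);
        boolean added = set.add(x);
        if (added) undo.get().push(() -> set.remove(x));  // inverse of a successful add
        return added;
    }
    public boolean remove(int x) {
        lockKey(x);
        boolean removed = set.remove(x);
        if (removed) undo.get().push(() -> set.add(x));   // inverse of a successful remove
        return removed;
    }
    public boolean contains(int x) { lockKey(x); return set.contains(x); }

    public void commit() { undo.get().clear(); release(); }
    public void abort() {
        Deque<Runnable> u = undo.get();
        while (!u.isEmpty()) u.pop().run();               // replay inverses, newest first
        release();
    }
    private void release() { held.get().forEach(ReentrantLock::unlock); held.get().clear(); }
}
```

The underlying skip list stays a black box: the boosting layer only needs to know that calls on distinct keys commute and that add/remove invert each other.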
SLIDE 20
SLIDE 21
SLIDE 22

More examples:

  • Transactional Priority Queue, Pipelining, UniqueID …
  • implemented by boosting concurrent objects from Java concurrency packages
SLIDE 23

Performance of boosting

SLIDE 24

What’s the Catch?

  • Concurrent calls must commute: different orders yield the same state and return values (actually, it’s all about left/right movers)
  • Methods must have inverses: applied immediately after, the inverse restores the state


SLIDE 26

Boosting

  • Reuse code, improve performance
  • But: inverses are required
SLIDE 27

And is there ever enough performance?

SLIDE 28

How to improve performance?

SLIDE 29

Recall, we want

  • Good performance when synchronization is required
  • Scalability
  • E.g., for an in-memory key-value store
SLIDE 30

Up next: how to improve the performance of a transactional data structure?

  • MassTree: a high-performance data structure
  • Silo: high-performance transactions over MassTree, using a different approach
  • STO: a general framework and methodology for building libraries of customized high-performance transactional objects

SLIDE 31

MassTree

  • High-performance key/value store
  • In shared primary memory
  • Cores run put, get, and delete requests
SLIDE 32

Review: Memory Model

  • Each core has a cache
  • Hitting in the cache matters a lot for reads!
  • What about a write?
  • TSO (Total Store Order)
SLIDE 33

X86-TSO

  • Thread t1 modifies x and later y
  • Thread t2 sees the modification to y
  • t2 reads x
  • This implies t2 sees the modification of x
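The slide’s ordering guarantee can be transliterated into Java (an analogy, not x86 code: marking y volatile gives the release/acquire ordering that TSO gives every pair of stores; names are hypothetical):

```java
// Sketch of the TSO example: t1 writes x then y; once t2 sees the write
// to y, it is guaranteed to see the earlier write to x as well.
public class StoreOrder {
    static int x = 0;                   // plain field, written first
    static volatile boolean y = false;  // "later" write, ordered after x
    static int seen = -1;

    public static int run() {
        Thread t1 = new Thread(() -> { x = 42; y = true; });
        Thread t2 = new Thread(() -> {
            while (!y) { }              // t2 observes the modification to y ...
            seen = x;                   // ... so it must observe x == 42 too
        });
        t2.start(); t1.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
        return seen;
    }
}
```

Without the volatile (or a fence), the Java memory model — unlike x86-TSO — would permit t2 to read a stale x.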

SLIDE 34

MassTree structure

  • Nodes and records
  • Nodes: cover a range of keys; interior and leaf nodes
  • Records: store the values
SLIDE 35
SLIDE 36

Concurrency Control

  • Reader/writer locks?
SLIDE 37

Thread-level Concurrency Control

  • Base instructions
  • Compare and swap
  • On one memory word
  • Fence
SLIDE 38

Concurrency Control for multi-word

  • First word of nodes and records
  • version number (v#) and lock bit
SLIDE 39

Concurrency control

  • Write
  • Set lock bit (spin if necessary)
  • uses compare and swap
  • Update node or record
  • Increment v# and release lock
SLIDE 40

Concurrency control

  • Write (locking)
  • Read (no locking)
  • Spin if locked
  • Read contents
  • If v# has changed or lock is set, try again
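The write and read protocols of the last two slides amount to a sequence-lock on each node or record header. A minimal Java sketch (names hypothetical), assuming the low bit of the version word serves as the lock bit:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of the node-header protocol: one word holds the version
// number (v#) plus a lock bit; writers lock, readers validate optimistically.
public class VersionedCell {
    private final AtomicInteger vlock = new AtomicInteger(0); // even = unlocked
    private volatile long value = 0;

    public void write(long v) {
        int w;
        do { w = vlock.get(); }                               // spin if lock bit set,
        while ((w & 1) != 0 || !vlock.compareAndSet(w, w | 1)); // then CAS it in
        value = v;                                            // update under the lock
        vlock.set((w | 1) + 1);                               // bump v#, clear lock bit
    }

    public long read() {
        while (true) {
            int before = vlock.get();
            if ((before & 1) != 0) continue;                  // locked: spin
            long v = value;                                   // optimistic read
            if (vlock.get() == before) return v;              // v# unchanged: done
        }                                                     // changed: try again
    }
}
```

Readers never write shared memory, which is exactly the “no writes for reads” property the next slide emphasizes.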
SLIDE 41

Concurrency control

  • Writes are pessimistic
  • Reads are optimistic
  • A mix!
  • No writes for reads
SLIDE 42

Inserting new keys

  • Into a leaf node if possible
  • Else split
SLIDE 43

Inserting new keys

  • Into a leaf node if possible
  • Else split
  • A split locks nodes up the path
  • No deadlocks
SLIDE 44

Interesting Issue with splitting

SLIDE 45

From MassTree to Silo

  • High-performance database
  • With transactions
SLIDE 46

Silo

  • Database is in primary memory
  • Runs one-shot requests
SLIDE 47

Silo

  • Database is in primary memory
  • Runs one-shot requests
  • A tree for each table or index
  • Worker threads run the requests
  • One thread per core
  • Workers share memory
SLIDE 48

Transactions

begin {
  % do stuff: run queries
  % using insert, lookup, update, delete, and range
}

SLIDE 49

Running Transactions

  • MassTree operations release locks before returning
  • Hold locks longer?
SLIDE 50

Running Transactions

  • OCC (Optimistic Concurrency Control)
  • A thread maintains a read-set and a write-set
  • The read-set contains version numbers
  • The write-set contains new state
  • At the end, it attempts to commit
SLIDE 51

Commit Protocol

  • Phase 1: lock all objects in write-set
  • Bounded spinning
SLIDE 52

Commit Protocol

  • Phase 1: lock all objects in write-set
  • Phase 2: verify v#’s of read-set
  • Abort if locked or changed
SLIDE 53

Commit Protocol

  • Phase 1: lock all objects in write-set
  • Phase 2: verify v#’s of read-set
  • Select Tid (>v# of r- and w-sets)
  • Without a write to shared state!
SLIDE 54

Commit Protocol

  • Phase 1: lock all objects in write-set
  • Phase 2: verify v#’s of read-set
  • Select Tid (>v# of r- and w-sets)
  • Phase 3: update objects in write-set
  • Using Tid as v#
SLIDE 55

Commit Protocol

  • Phase 1: lock all objects in write-set
  • Phase 2: verify v#’s of read-set
  • Select Tid (>v# of r- and w-sets)
  • Phase 3: update objects in write-set
  • Release locks
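The phases above can be sketched in Java (record layout and names are hypothetical, and TID selection is simplified to a global counter — which is exactly the shared-state write Silo’s epoch-based TIDs avoid):

```java
import java.util.*;

// Hedged, single-threaded sketch of Silo's OCC commit protocol.
public class OccCommit {
    public static class Record {
        public long version = 0;     // v# installed by the last committer
        public boolean locked = false;
        public long value;
    }
    static long tidCounter = 0;      // simplified: Silo picks TIDs without shared writes

    // readSet: record -> v# observed during the transaction
    // writeSet: record -> new value to install
    public static boolean commit(Map<Record, Long> readSet, Map<Record, Long> writeSet) {
        List<Record> toLock = new ArrayList<>(writeSet.keySet());
        for (Record r : toLock) r.locked = true;              // Phase 1: lock write-set
        for (Map.Entry<Record, Long> e : readSet.entrySet()) {
            Record r = e.getKey();
            boolean lockedByOther = r.locked && !writeSet.containsKey(r);
            if (lockedByOther || r.version != e.getValue()) { // Phase 2: validate v#s
                for (Record w : toLock) w.locked = false;     // abort: release locks
                return false;
            }
        }
        long tid = ++tidCounter;                              // Tid > all v#s seen here
        for (Map.Entry<Record, Long> e : writeSet.entrySet()) {
            e.getKey().value = e.getValue();                  // Phase 3: install state
            e.getKey().version = tid;                         // using Tid as the new v#
            e.getKey().locked = false;                        // release lock
        }
        return true;
    }
}
```

A second transaction that read a record before the first one committed fails Phase 2, because the installed TID no longer matches the version it recorded.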
SLIDE 56

Additional Issues

  • Range queries
  • Absent keys
  • Garbage collection
SLIDE 57

Performance

SLIDE 58
SLIDE 59

Performance

  • vs. H-Store
  • M. Stonebraker et al., “The end of an architectural era (it’s time for a complete rewrite)”, VLDB ’07

SLIDE 60
SLIDE 61

Silo to STO

  • STO (Software Transactional Objects)
SLIDE 62

STO

  • Silo’s trees are highly concurrent data structures
  • The specification determines potential concurrency
  • The implementation is hidden
  • Including concurrency control
SLIDE 63

A vision for concurrent code

  • Apps run transactions
SLIDE 64

A vision for concurrent application code, like boosting

  • Apps run transactions
  • Using transaction-aware datatypes
  • E.g., sets, maps, arrays, boxes, queues
SLIDE 65

Transactions

begin {
  % do stuff: run queries
  % using insert, lookup, update, delete, and range
}
SLIDE 66

Back to our vision for concurrent code

  • Apps run transactions
  • Using fast transaction-aware datatypes
  • Designed by experts
  • They require sophistication to implement
  • But so do the concurrent datatypes in Java
SLIDE 67

STO

  • Think of Silo broken into two parts:
  • The STO platform
  • Transaction-aware datatypes
SLIDE 68

STO Platform

  • Runs transactions
  • Transaction { … }
  • Provides transaction state
  • Read- and write-sets
  • Runs the commit protocol using callbacks
SLIDE 69

Transaction-aware datatypes

  • Provide ops for user code
  • E.g., lookup, update, insert, delete, range
  • Record reads and writes via the platform
  • Provide callbacks:
  • lock, unlock, check, install, cleanup
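The callback contract above might look like this in Java (signatures hypothetical; the real STO is a C++ library and differs in detail): a TObject interface driven by the platform’s commit protocol, plus the simplest transaction-aware type, a one-value box.

```java
// Hedged sketch of the callback interface the STO platform drives at commit.
public class Sto {
    public interface TObject {
        void lock();                      // commit phase 1
        boolean check(long readVersion);  // commit phase 2: validate observed v#
        void install(long tid);           // commit phase 3: publish new state
        void unlock();                    // release after install
        void cleanup(boolean committed);  // drop buffered state on abort, etc.
    }

    // The simplest transaction-aware datatype: a box holding one value.
    public static class TBox implements TObject {
        private long version = 0, stable = 0, pending = 0;
        private boolean dirty = false, locked = false;

        public long read()        { return dirty ? pending : stable; } // read-my-writes
        public void write(long v) { pending = v; dirty = true; }       // buffer until install

        public void lock()   { locked = true; }
        public void unlock() { locked = false; }
        public boolean check(long rv) { return !locked && version == rv; }
        public void install(long tid) { stable = pending; version = tid; dirty = false; }
        public void cleanup(boolean committed) { if (!committed) dirty = false; }
    }
}
```

Writes stay invisible to other transactions until install, matching the correctness slide later on.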
SLIDE 70

Transaction-aware datatypes

  • Provide ops for user code
  • Record reads and writes via the platform
  • Provide callbacks:
  • lock, unlock, check, install, cleanup
  • cleanup handles aborts and after-commit work
  • E.g., deleting a key
SLIDE 71

Transaction-aware types

  • Maps
  • Hash tables
  • Counters
  • void incr( ) vs. int incr( )
  • Uses check and install
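The slide’s `void incr()` vs. `int incr()` point, sketched in Java (names hypothetical): a commutative increment that returns nothing can be buffered as a delta and installed at commit with no read-set entry, while a variant that returns the value observes state and would have to be validated like any read.

```java
// Hedged sketch of a transaction-aware counter, one transaction per thread.
public class TCounter {
    private long committed = 0;
    private final ThreadLocal<long[]> delta = ThreadLocal.withInitial(() -> new long[1]);

    public void incr() { delta.get()[0]++; }        // write-only: just log a delta

    public long incrAndGet() {                      // must observe the value ->
        delta.get()[0]++;                           // would add a read-set entry
        return committed + delta.get()[0];          // (validation not shown)
    }

    public void install() {                         // at commit: apply the delta
        committed += delta.get()[0];
        delta.get()[0] = 0;
    }
    public long value() { return committed; }
}
```

Because deltas from different transactions commute, `incr()` needs only the install callback, never check — the distinction the slide is drawing.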
SLIDE 72

Designing fast STO data types:

  • Specification
  • Some common tricks:
  • Inserted elements: direct updates
  • Absent elements: extra version numbers
  • Read-my-writes: adjustments
  • Correctness
SLIDE 73

Specification

SLIDE 74

Inserted elements and repeated lookup

  • Hybrid strategy
  • T1: insert a “poisoned” element
  • T2: abort on observing a “poisoned” element
  • T1: no need to validate the insertion at commit
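A hedged Java sketch of the poisoned-element trick (names hypothetical): the inserter publishes a marked entry immediately; any other transaction observing the marker must abort, so the inserter itself has nothing to re-validate at commit.

```java
import java.util.*;

// Hedged sketch: entries are "poisoned" until their inserting transaction commits.
public class PoisonedMap {
    public static class Entry {
        public final long ownerTid;       // transaction that inserted it
        public Object value;
        public boolean poisoned = true;
        Entry(long tid) { ownerTid = tid; }
    }
    private final Map<String, Entry> table = new HashMap<>();

    // T1 inserts a poisoned placeholder right away.
    public void insert(long tid, String k, Object v) {
        Entry e = new Entry(tid);
        e.value = v;
        table.put(k, e);
    }
    // Any other transaction that sees the poisoned entry must abort.
    public boolean observesPoison(long tid, String k) {
        Entry e = table.get(k);
        return e != null && e.poisoned && e.ownerTid != tid;
    }
    public void commit(String k) { table.get(k).poisoned = false; } // unpoison at commit
}
```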
SLIDE 75

Absent elements

  • T1: get(K): K is absent
  • How to validate at commit?
  • Extra version numbers
  • For a hash table: on the bucket of the absent key
  • For a B-tree: on the parent node of the absent key
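A hedged Java sketch of the hash-table case (names hypothetical): a get() that finds no key records the version of the key’s bucket; any insert into that bucket bumps the version, so commit-time validation catches the phantom.

```java
import java.util.*;

// Hedged sketch: per-bucket version numbers validate absent-key reads.
public class PhantomGuard {
    static final int NBUCKETS = 16;
    private final long[] bucketVersion = new long[NBUCKETS];
    private final Map<String, Long> table = new HashMap<>();

    private int bucket(String k) { return Math.floorMod(k.hashCode(), NBUCKETS); }

    // get(): returns the value or null; reports the bucket v# that was observed.
    public Long get(String k, long[] observed) {
        observed[0] = bucketVersion[bucket(k)];
        return table.get(k);
    }
    public void insert(String k, long v) {
        table.put(k, v);
        bucketVersion[bucket(k)]++;       // invalidate readers of this bucket
    }
    // Commit-time check for a transaction that saw K absent.
    public boolean validateAbsent(String k, long observed) {
        return bucketVersion[bucket(k)] == observed;
    }
}
```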
SLIDE 76

Read-my-writes

  • T1: scan a range A..Z; insert a key C
  • How to validate the range?
SLIDE 77

Correctness

  • Version numbers on all shared state
  • Exclusive locks
  • check must fail if a segment is locked or its version number has changed
  • Modifications are invisible to other transactions before install
SLIDE 78

Performance

SLIDE 79

Implementation

  • Silo: 7000 lines of code
  • STO-Silo: 3000 lines of code
  • Uses hash tables and trees
SLIDE 80
SLIDE 81

Performance

  • vs. TL2 (grey)
  • And boosting (lilac)
SLIDE 82

Optimism vs. Pessimism?

Effects of pessimism and boosting on a hash-table microbenchmark. Numbers are speedup at 16 threads relative to single-threaded STO.

SLIDE 83

More examples of powerful optimizations

SLIDE 84

STO: the last word on exploiting ADTs in TM?

  • Needs more work
  • More datatypes
  • Methodology
  • Programming-language integration
  • Distribution
SLIDE 85

Summary: Implementing a Library of Transactional Data Types

  • The distinction between short thread-level and coarse-grain transaction-level coordination is key
  • Can reuse data-structure code, or co-design and customize:
  • Boosting: a black-box approach, the first ADT/STM (code reuse, restrictions)
  • STO: a high-performance pessimistic/optimistic approach (co-design and customize)
  • (Thanks to M. Herlihy and B. Liskov for help with slides!)
SLIDE 86

Questions