Parallel Execution for Conflicting Transactions, Neha Narula Thesis (PowerPoint PPT Presentation)



SLIDE 1

Parallel Execution for Conflicting Transactions

Neha Narula

Thesis Advisors: Robert Morris and Eddie Kohler

SLIDE 2

Database-backed applications require good performance

WhatsApp:

  • 1M messages/sec

Facebook:

  • 1/5 of all page views in the US

Twitter:

  • Millions of messages/sec from mobile devices

SLIDE 3

Databases are difficult to scale

The database is stateful. Application servers are stateless; add more to handle more traffic.

SLIDE 4

Scale up using multi-core databases

Context

  • Many cores
  • In-memory database
  • OLTP workload
  • Transactions are stored procedures
  • No stalls due to users, disk, or network
SLIDE 5

Goal

Execute transactions in parallel

[Graph: throughput vs. number of cores]

SLIDE 6

Challenge

Conflicting data access

[Graph: throughput vs. number of cores]

Conflict: two transactions access the same data and one is a write
SLIDE 7

Database transactions should be serializable

TXN1(k, j Key) (Value, Value) {
  a := GET(k)
  b := GET(j)
  return a, b
}

TXN2(k, j Key) {
  ADD(k,1)
  ADD(j,1)
}

[Timeline: TXN1 and TXN2 in either order]

To the programmer, with initial state k=0, j=0, the valid return values for TXN1 are (0,0) or (1,1).
SLIDE 8

Executing in parallel could produce incorrect interleavings

[Timeline: GET(k), ADD(k,1), ADD(j,1), GET(j)]

With k=0, j=0, TXN1 returns (1,0): the transaction incorrectly sees an intermediate value.

SLIDE 9

Concurrency control enforces serial execution

[Timeline: ADD(x,1), ADD(x,1), ADD(x,1) execute one after another]

Transactions on the same records execute one at a time

SLIDE 10

Concurrency control enforces serial execution

[Timeline: cores 0-2 each execute ADD(x,1), one at a time]

Serial execution results in a lack of scalability

SLIDE 11

Idea #1: Split representation for parallel execution

  • Transactions on the same record can proceed in parallel on per-core values
  • Reconcile per-core values to produce a correct value

[Timeline: x is split across cores into per-core values for record x; cores 0-2 apply ADD(x,1) in parallel, moving from x0:0, x1:0, x2:0 to x0:3, x1:3, x2:2]

x = 8
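The split-counter idea can be sketched in a few lines of Go. This is a minimal single-threaded sketch, not Doppel's code; the `perCore` type and `reconcile` helper are illustrative names:

```go
package main

import "fmt"

// perCore holds one slice of the split record x per core.
// During the split phase each core applies ADD only to its own slice,
// with no coordination between cores.
type perCore []int

// add applies ADD(x, n) on the given core's slice.
func (p perCore) add(core, n int) { p[core] += n }

// reconcile folds the per-core slices into the global value of x
// and resets the slices for the next split phase.
func reconcile(global int, p perCore) int {
	for i, v := range p {
		global += v
		p[i] = 0
	}
	return global
}

func main() {
	x := 0
	slices := make(perCore, 3) // x0, x1, x2, all start at 0
	// The ADDs from the slide: cores end at x0:3, x1:3, x2:2.
	for i := 0; i < 3; i++ {
		slices.add(0, 1)
		slices.add(1, 1)
	}
	slices.add(2, 1)
	slices.add(2, 1)
	x = reconcile(x, slices)
	fmt.Println(x) // 8, matching the slide
}
```

Because ADD is commutative, the order in which slices are folded back does not matter.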

SLIDE 12

Other types of operations do not work with split data

  • Executing with split data does not work for all types of operations
  • In a workload with many reads, it is better not to use per-core values

[Timeline: with x0:3, x1:3, x2:2, core 0 runs ADD(x,1), core 1 runs PUT(x,42), core 2 runs GET(x)]

x = ??

SLIDE 13

Idea #2: Reorder transactions

[Timeline: the ADD(x,1) transactions execute in parallel, then reconcile, then the GET(x) transactions execute in parallel]

  • Key insight: Reordering transactions reduces
    – the cost of reconciling
    – the cost of conflict
  • Execution remains serializable

SLIDE 14

Idea #3: Phase reconciliation

  • The database automatically detects contention to split a record between cores
  • The database cycles through phases: split and joined
  • Doppel: an in-memory key/value database

[Timeline: split phase → reconcile → joined phase (conventional concurrency control) → split → split phase]

SLIDE 15

Challenges

Combining split data with general database workloads:

  1. How to handle transactions with multiple keys and different operations?
  2. Which operations can use split data correctly?
  3. How to dynamically adjust to changing workloads?

SLIDE 16

Contributions

  • Synchronized phases to support any transaction and reduce reconciliation overhead
  • Identifying a class of splittable operations
  • Detecting contention to dynamically split data

SLIDE 17

Outline

  • Challenge 1: Phases
  • Challenge 2: Operations
  • Challenge 3: Detecting contention
  • Performance evaluation
  • Related work and discussion

SLIDE 18

Split phase

  • The split phase executes operations on contended records on per-core slices (x0, x1, x2)

[Timeline: cores 0-2 execute ADD(x0,1), ADD(x1,1), ADD(x2,1) in the split phase]

SLIDE 19

Reordering by stashing transactions

  • Split records have selected operations for a given split phase
  • A read of x cannot be processed correctly in the current state
  • Stash the transaction to execute after reconciliation

[Timeline: cores execute ADD on per-core slices in the split phase; the GET(x) transaction is stashed]

SLIDE 20

  • All cores hear that they should reconcile their per-core state
  • Stop processing per-core writes

[Timeline: the split phase ends; GET(x) remains stashed]

SLIDE 21

  • Reconcile state to the global store
  • Wait until all cores have finished reconciliation
  • Resume stashed read transactions in the joined phase

[Timeline: reconciliation: x = x + x0, x = x + x1, x = x + x2; then GET(x) runs in the joined phase]

SLIDE 22

[Timeline: cores 0-2 finish reconciliation (x = x + x0, x = x + x1, x = x + x2); the stashed GET(x) runs in the joined phase]

SLIDE 23

Transitioning between phases

  • Process stashed transactions in the joined phase using conventional concurrency control
  • The joined phase is short; quickly move on to the next split phase

[Timeline: GET(x) transactions run in the joined phase, then ADD operations resume in the next split phase]
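The per-core decision logic of slides 19-23 can be pictured as a small sketch. This is a simplified, single-threaded illustration under assumed names (`op`, `core`, and the ADD-only selection rule); Doppel's real worker threads coordinate through a shared coordinator:

```go
package main

import "fmt"

type op struct {
	name string // "ADD" or "GET"
	n    int
}

type core struct {
	slice   int  // this core's slice of the split record x
	stashed []op // operations deferred to the joined phase
}

// splitPhase runs an operation during the split phase. Only the selected
// operation (ADD here) runs on the per-core slice; anything else is
// stashed until after reconciliation.
func (c *core) splitPhase(o op) {
	if o.name == "ADD" {
		c.slice += o.n
		return
	}
	c.stashed = append(c.stashed, o)
}

func main() {
	x := 0
	cores := []*core{{}, {}, {}}
	cores[0].splitPhase(op{"ADD", 1})
	cores[1].splitPhase(op{"ADD", 1})
	cores[1].splitPhase(op{"GET", 0}) // stashed: reads cannot use split data
	cores[2].splitPhase(op{"ADD", 1})

	// Reconciliation: every core merges its slice into the global store.
	for _, c := range cores {
		x += c.slice
		c.slice = 0
	}
	// Joined phase: stashed transactions now see the reconciled value.
	for _, c := range cores {
		for range c.stashed {
			fmt.Println("GET(x) =", x) // GET(x) = 3
		}
		c.stashed = nil
	}
}
```

The stashed GET observes the fully reconciled value, so the execution is equivalent to running it after all of the ADDs.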

SLIDE 24

Challenge #1

How to handle transactions with multiple keys and different operations?

  • Split and non-split data
  • Different operations on a split record
  • Multiple split records

SLIDE 25

Transactions on split and non-split data

  • Transactions can operate on split and non-split records
  • The rest of the records (y) use concurrency control
  • This ensures serializability for the non-split parts of the transaction

[Timeline: cores execute ADD on per-core slices of x while PUT(y,2) uses concurrency control in the split phase]

SLIDE 26

Transactions with different operations on a split record

  • A transaction which executes different operations on a split record is also stashed, even if one of them is a selected operation

[Timeline: in the split phase, a transaction containing both ADD(x,1) and GET(x) is stashed]

SLIDE 27

All records use concurrency control in the joined phase

  • In the joined phase there is no split data and there are no split operations
  • ADD also uses concurrency control

[Timeline: the stashed ADD(x,1) GET(x) transaction runs in the joined phase]

SLIDE 28

Transactions with multiple split records

  • x and y are split, and operations on them use per-core slices (x0, x1, x2) and (y0, y1, y2)
  • Split records all use the same synchronized phases

[Timeline: cores execute ADD on slices of x and MULT on slices of y in the split phase]

SLIDE 29

Reconciliation must be synchronized

  • Cores reconcile all of their split records: ADD for x and MULT for y
  • Reconciliation is parallelized
  • The next joined phase is guaranteed to read values atomically

[Timeline: reconciliation: each core applies x = x + xi and y = y * yi; then GET(x) GET(y) runs in the joined phase]

SLIDE 30

Delay to reduce the overhead of reconciliation

  • Wait to accumulate stashed transactions, then run many in the joined phase
  • Reads that would have conflicted now do not

[Timeline: ADD operations on per-core slices (and ADD(z,1) on a non-split record) in the split phase; many stashed GET(x) transactions run together in the joined phase]

SLIDE 31

When does Doppel switch phases?

Doppel moves from the split phase to the joined phase when

  (ns > 0 && ts > 10ms) || ns > 100,000

where ns = the number of stashed transactions and ts = the time spent in the split phase. It returns to the split phase once the stashed transactions have completed.

SLIDE 32

Outline

  • Challenge 1: Phases
  • Challenge 2: Operations
  • Challenge 3: Detecting contention
  • Performance evaluation
  • Related work and discussion

SLIDE 33

Challenge #2

Define a class of operations that is correct and performs well with split data.

SLIDE 34

Operations in Doppel

Developers write transactions as stored procedures, which are composed of operations on database keys and values:

  void ADD(k,n)
  void MAX(k,n)
  void MULT(k,n)

Operations on numeric values which modify the existing value

SLIDE 35

Why can ADD(x,1) execute correctly on split data in parallel?

  • Does not return a value
  • Commutative

ADD(k,n) {
  v[k] = v[k] + n
}

SLIDE 36

Commutativity

Two operations commute if, executed on the database state s in either order, they produce the same state s' and the same return values.

[Diagram: applying the two operations to s in either order yields the same s']
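The definition can be checked mechanically for state-only operations; a sketch assuming a toy map-backed store (the `db`, `add`, and `put` names are illustrative, and since these operations return nothing, only the final state needs comparing):

```go
package main

import "fmt"

type db map[string]int

type op func(db)

func add(k string, n int) op { return func(s db) { s[k] += n } }
func put(k string, n int) op { return func(s db) { s[k] = n } }

// commuteOn reports whether p and q commute on state s: applying them
// in either order to a copy of s must leave the same state.
func commuteOn(s db, p, q op) bool {
	a, b := db{}, db{}
	for k, v := range s {
		a[k], b[k] = v, v
	}
	p(a)
	q(a) // order 1: p then q
	q(b)
	p(b) // order 2: q then p
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	return true
}

func main() {
	s := db{"x": 0}
	fmt.Println(commuteOn(s, add("x", 1), add("x", 2)))  // true: ADDs commute
	fmt.Println(commuteOn(s, add("x", 1), put("x", 42))) // false: PUT does not commute with ADD
}
```

ADD then PUT leaves x=42 while PUT then ADD leaves x=43, which is exactly why PUT cannot run on split data alongside ADD.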
SLIDE 37

Hypothetical design: commutativity is sufficient

  • Non-split operations in transactions execute immediately
  • Split operations are logged
  • Split operations have no return values and are on different data, so they cannot affect transaction execution

[Diagram: transactions T1-T6 run across cores 0-2; each core logs its split operations o1-o6]
SLIDE 38

Hypothetical design: apply logged operations later

  • Logged operations are applied to the database state in a different order than their containing transactions

[Diagram: the per-core logs of split operations o1-o6 are applied after transactions T1-T6 execute]
SLIDE 39

Correct because split operations can be applied in any order

After applying the split operations in any order, the database reaches the same state.

[Diagram: applying the logged operations o1-o6 from transactions T1-T6 in two different orders yields the same state s']
SLIDE 40

Is commutativity enough?

For correctness, yes. For performance, no. Which operations can be summarized?

SLIDE 41

Summarized operations

A set of operations can be summarized if, for all sequences of operations in the set, there is a function f that produces the same result and runs in time on the order of a single operation.

[Diagram: applying o1, o2, o3 to s equals applying the single function f]
SLIDE 42

MAX can be summarized

  • Each core keeps one piece of state
  • 55 is an abbreviation of a function to apply later
  • O(#cores) time to reconcile x

[Diagram: core 0 applies MAX(x,55) then MAX(x,2), keeping x0:55; core 1 applies MAX(x,10) then MAX(x,27), keeping x1:27; core 2 applies MAX(x,21), keeping x2:21; reconciliation applies x = MAX(x,55), x = MAX(x,27), x = MAX(x,21)]
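The MAX summary from the slide can be sketched directly; a minimal illustration (helper names are assumptions, and the zero-valued initial slices assume non-negative arguments):

```go
package main

import "fmt"

// maxOp absorbs one MAX(x, n) into a core's pending summary: however
// many MAXes a core sees, it keeps only the largest argument.
func maxOp(slices []int, core, n int) {
	if n > slices[core] {
		slices[core] = n
	}
}

// summarizeMax reconciles the per-core summaries with one MAX per
// core, so reconciliation is O(#cores) regardless of how many MAX
// operations ran during the split phase.
func summarizeMax(x int, slices []int) int {
	for _, n := range slices {
		if n > x {
			x = n
		}
	}
	return x
}

func main() {
	slices := []int{0, 0, 0} // pending MAX argument per core
	// The MAXes from the slide:
	maxOp(slices, 0, 55)
	maxOp(slices, 0, 2)
	maxOp(slices, 1, 10)
	maxOp(slices, 1, 27)
	maxOp(slices, 2, 21)
	fmt.Println(summarizeMax(0, slices)) // 55
}
```

Contrast with SHA1 on the next slides: there is no comparably small summary for "apply SHA1 n times".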

SLIDE 43

SHA1 cannot be summarized

SHA1(k) {
  v[k] = sha1(v[k])
}

SHA1(SHA1(x)) = SHA1(SHA1(x)), so SHA1 commutes with itself.

SLIDE 44

SHA1 is commutative but we do not know how to summarize it

  • We would need a function that produces the same value as SHA1 run n times on x, but has running time O(SHA1)
  • No such function is known
SLIDE 45

Operation summary

Properties of operations that Doppel can split:

  – Always commute
  – Can be summarized
  – Single key
  – Have no return value

Runtime restriction:

  – Only one type of operation per record per split phase

SLIDE 46

Example commutative and summarizable operations

  void ADD(k,n)
  void MAX(k,n)             Operations on numeric values which modify the existing value
  void MULT(k,n)
  void OPUT(k,v,o)          Ordered PUT: with timestamps, last writer wins
  void TOPK_INSERT(k,v,o)   Insert to an ordered list: short indexes, top friends or follower lists
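OPUT's last-writer-wins behavior makes it both commutative and summarizable: a core only needs to remember the highest-order write it has seen. A sketch of that idea (the `oput` type and field names are illustrative, not Doppel's code):

```go
package main

import "fmt"

// oput is one core's summary of the OPUT(k, v, o) operations it has
// absorbed for a key: only the write with the highest order o survives,
// so any sequence of OPUTs summarizes to a single pending pair.
type oput struct {
	val   string
	order int
	set   bool
}

// apply absorbs one OPUT into the summary (last writer wins).
func (p *oput) apply(v string, o int) {
	if !p.set || o > p.order {
		p.val, p.order, p.set = v, o, true
	}
}

func main() {
	// Two cores each absorb OPUTs to the same key during a split phase.
	var c0, c1 oput
	c0.apply("alice", 3)
	c0.apply("bob", 1) // older order, ignored
	c1.apply("carol", 7)

	// Reconciliation: one comparison per core.
	winner := c0
	if c1.set && (!winner.set || c1.order > winner.order) {
		winner = c1
	}
	fmt.Println(winner.val) // carol
}
```

Because the highest order wins regardless of arrival order, reconciling per-core summaries in any order yields the same final value.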

SLIDE 47

Outline

  • Challenge 1: Phases
  • Challenge 2: Operations
  • Challenge 3: Detecting contention
  • Performance evaluation
  • Related work and discussion

SLIDE 48

Challenge #3

Dynamically adjust to changes in the workload:

  • Which records are contended?
  • What operations are happening on different records?

SLIDE 49

How to determine what to split?

  • Developer annotates records
    – Difficult to determine
    – Popular data changes over time
  • Automatically split data based on observed contention
    – Count records and operations which cause conflict
    – Split records actually causing serialization
    – Sample for low cost

SLIDE 50

Which records does Doppel split?

  impact(x,op) = conflicts_op(x) - other(x)

  • x is not split when impact(x,op) < tj
  • x is split during split phases when impact(x,op) > tc
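The two thresholds give a hysteresis band, so a record does not flap between split and non-split; a sketch of that decision (the function name and the threshold values in `main` are made up for illustration):

```go
package main

import "fmt"

// nextSplit decides whether record x should be split in the next phase,
// per the slide's thresholds: split when impact exceeds tc, return to
// non-split when it falls below tj, and keep the current decision in
// between (tj < tc provides hysteresis).
func nextSplit(split bool, impact, tj, tc float64) bool {
	switch {
	case impact > tc:
		return true
	case impact < tj:
		return false
	default:
		return split
	}
}

func main() {
	fmt.Println(nextSplit(false, 0.9, 0.1, 0.5)) // true: heavy conflict, split x
	fmt.Println(nextSplit(true, 0.3, 0.1, 0.5))  // true: stays split inside the band
	fmt.Println(nextSplit(true, 0.05, 0.1, 0.5)) // false: conflict gone, rejoin x
}
```

Sampling conflicts (as the previous slide describes) keeps computing this impact cheap.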
SLIDE 51

Implementation

  • Doppel is implemented as a multithreaded Go server; one worker thread per core
  • A coordinator thread manages phase changes
  • Transactions are procedures written in Go
  • All data fits in memory; key/value interface with optionally typed values
  • Doppel uses optimistic concurrency control

SLIDE 52

Outline

  • Challenge 1: Phases
  • Challenge 2: Operations
  • Challenge 3: Detecting contention
  • Performance evaluation
  • Related work and discussion

SLIDE 53

Performance evaluation

  • Extreme contention
  • A range of contention
  • Changing workloads
  • Workloads with a mix of reads and writes
  • A complex application

SLIDE 54

Experimental setup

  • All experiments run on an 80-core Intel server running 64-bit Linux 3.12 with 256 GB of RAM
  • All data fits in memory; RPC and disk are not measured
  • All graphs measure throughput in transactions/sec

SLIDE 55

How much does Doppel improve throughput on contentious write-only workloads?

SLIDE 56

Doppel executes conflicting workloads in parallel

[Bar chart: throughput (millions txns/sec) for Doppel, OCC, and 2PL; 20 cores, 1M 16-byte keys, transaction: ADD(x,1), all on the same key]

SLIDE 57

Contentious workloads scale well

[Graph: throughput (txns/sec) vs. number of cores for Doppel, OCC, and 2PL; 1M 16-byte keys, transaction: ADD(x,1), all writing the same key; annotation: synchronization of phase changing]

SLIDE 58

How much contention is required for Doppel's techniques to help?

SLIDE 59

Doppel outperforms 2PL and OCC even with low contention

[Graph: throughput (txns/sec) vs. % of transactions with the hot key, for Doppel, OCC, and 2PL; 20 cores, 1M 16-byte keys, transaction: ADD(x,1) on different keys; annotation: 5% of writes to the contended key]

SLIDE 60

Can Doppel detect and respond to changing workloads over time?

SLIDE 61

Doppel adapts to changing popular data

[Graph: throughput (txns/sec) vs. time (seconds) for Doppel and OCC; 20 cores, 1M 16-byte keys, transaction: ADD(x,1), 10% on the same key]

SLIDE 62

How much benefit can Doppel get with many stashed transactions?

SLIDE 63

Read/Write benchmark

  • Users liking pages on a social network
  • 2 tables: users, pages
  • Two transactions:
    – ADD 1 to a page's like count, PUT user like of page
    – GET a page's like count, GET user's last like
  • 1M users, 1M pages, Zipfian distribution of page popularity

Doppel splits the popular page counts, but those counts are also read most often.

SLIDE 64

Benefits even when there are reads and writes to the same popular keys

[Bar chart: throughput (millions txns/sec) for Doppel and OCC; 20 cores, transactions: 50% read, 50% write]

SLIDE 65

Doppel outperforms OCC for a wide range of read/write mixes

[Graph: throughput (txns/sec) vs. % of transactions that read, for Doppel and OCC; 20 cores, RW benchmark; annotations: "more stashed read transactions" and "Doppel does not split any data and performs the same as OCC"]

SLIDE 66

Does Doppel improve throughput for a realistic application: RUBiS?

SLIDE 67

RUBiS

  • Auction benchmark modeled after eBay
    – Users bid on auctions, comment, list new items, search
  • 1M users and 33K auctions
  • 7 tables, 17 transactions
  • 85% read-only transactions (RUBiS bidding mix)
  • Two workloads:
    – Roughly uniform distribution of bids
    – Skewed distribution of bids; a few auctions are very popular

SLIDE 68

RUBiS StoreBid transaction

StoreBidTxn(bidder, amount, item) {
  ADD(NumBidsKey(item), 1)
  MAX(MaxBidKey(item), amount)
  OPUT(MaxBidderKey(item), bidder, amount)
  PUT(NewBidKey(), Bid{bidder, amount, item})
}

The contended data is only operated on by splittable operations. Inserting new bids is not likely to conflict.

SLIDE 69

Doppel improves throughput for the RUBiS benchmark

[Bar chart: throughput (millions txns/sec) for Doppel and OCC on the Uniform and Skewed workloads; 80 cores, 1M users, 33K auctions, RUBiS bidding mix, 50% of bids on the top auction; annotations: 3.2x throughput improvement on the skewed workload, caused by StoreBid transactions (8%)]

SLIDE 70

Outline

  • Challenge 1: Phases
  • Challenge 2: Operations
  • Challenge 3: Detecting contention
  • Performance evaluation
  • Related work and discussion

SLIDE 71

Related work

  • Shared-memory DBs
    – Silo, Hekaton, Shore-MT
  • Partitioned DBs
    – DORA, PLP, H-Store
  • Choosing partitions
    – Schism, E-Store, Horticulture
  • Transactional memory
    – Scheduling [Kim 2010, Attiya 2012]

Doppel runs conflicting transactions in parallel

SLIDE 72

Related work

  • Commutativity
    – Abstract data types [Weihl 1988]
    – CRDTs [Shapiro 2011]
    – RedBlue consistency [Li 2012]
    – Walter [Sovran 2011]
  • Scalable operating systems
    – Clustered objects in Tornado [Parsons 1995]
    – OpLog [Boyd-Wickizer 2013]
    – Scalable commutativity rule [Clements 2013]

Doppel combines these ideas in a transactional database

SLIDE 73

Future Work

  • Generalizing to distributed transactions
  • More data representations
  • A larger class of operations which commute
  • Durability and recovery

SLIDE 74

Conclusion

Multi-core phase reconciliation:

  • Achieves parallel performance when transactions conflict by combining split data and concurrency control
  • Performs well on uniform workloads while improving performance significantly on skewed workloads

SLIDE 75

Thanks

  • Robert, Eddie, and Barbara
  • Co-authors and colleagues
  • PDOS and former PMG
  • Academic and industry communities
  • Family and friends

Brian Allen, Neelam Narula, Arun Narula, Megan Narula, Adrienne Winans, Austin Clements, Yandong Mao, Adam Marcus, Alex Pesterev, Alex Yip, Max Krohn, Cody Cutler, Frank Wang, Xi Wang, Ramesh Chandra, Emily Stark, Priya Gupta, James Cowling, Dan Ports, Irene Zhang, Jean Yang, Grace Woo, Szymon Jakubczak, Omar Khan, Sharon Perl, Brad Chen, Ben Swanson, Ted Benson, Eugene Wu, Evan Jones, Vijay Pandurangan, Keith Winstein, Jonathan Perry, Stephen Tu, Vijay Boyapati, Ines Sombra, Tom Santero, Chris Meiklejohn, John Wards, Gergely Hodicska, Zeeshan Lakhani, Bryan Kate, Michael Kester, Aaron Elmore, Grant Schoenebeck, Matei Zaharia, Sam Madden, Mike Stonebraker, Frans Kaashoek, Nickolai Zeldovich

SLIDE 76

Phase length and read latency

[Graph: average read latency (µs) vs. phase length (ms) for the Uniform, Skewed, and Skewed Write Heavy workloads]