SLIDE 1

Design Principles for Scaling Multi-core OLTP Under High Contention

Kun Ren, Jose Faleiro, Daniel Abadi (Yale University)

SLIDE 2

Conflicts: The scourge of database systems

  • Logical conflicts
    • Due to data conflicts between transactions
  • Physical conflicts
    • Due to contention on internal data-structures

Example: T2: Write(x); T1: Read(x)

SLIDE 3

Conflicts: The scourge of database systems

  • Logical conflicts
    • Due to data conflicts between transactions
    • Addressed via new correctness criteria, exploiting semantics
  • Physical conflicts
    • Due to contention on internal data-structures
    • Addressed via new protocols, DB architectures

Example: T2: Write(x2); T1: Read(x0)

SLIDE 5

… but conflicts are inevitable

  • Logical conflicts are application-dependent
  • Logical conflicts directly result in physical conflicts

We address these physical conflicts in multi-core main-memory DBs

SLIDE 8

The life of a transaction

  • Assign a transaction to an “execution context”
  • Assigned context performs all actions required to execute the transaction
    • Concurrency control
    • Transaction logic
    • Logging
  • Deal with conflicts via shared concurrency control meta-data

[Diagram: transaction T assigned to a thread/process pool]
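This architecture can be sketched in a few lines of Python (an illustrative sketch, not code from the talk; `acquire_locks`, `write_log`, and the other helpers are hypothetical stand-ins): one worker from the pool picks up a transaction and performs every phase itself.

```python
from concurrent.futures import ThreadPoolExecutor

log = []

def acquire_locks(txn):
    pass  # concurrency control: would touch shared lock meta-data

def release_locks(txn):
    pass

def write_log(txn):
    log.append(txn["id"])  # logging

def execute_transaction(txn):
    # The assigned execution context performs every phase itself.
    acquire_locks(txn)       # concurrency control
    result = txn["logic"]()  # transaction logic
    write_log(txn)           # logging
    release_locks(txn)
    return result

with ThreadPoolExecutor(max_workers=4) as pool:  # thread/process pool
    txns = [{"id": i, "logic": (lambda i=i: i * 2)} for i in range(5)]
    results = list(pool.map(execute_transaction, txns))
print(sorted(results))  # [0, 2, 4, 6, 8]
```

The key point the slides build on is that every worker runs the concurrency control phase itself, so all workers touch the same shared meta-data.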

SLIDE 15

Example: Logical lock acquisition

  • Latch bucket
  • Add lock request
  • Unlatch bucket

[Diagram: lock-table buckets A, B, C; T1's request in A's lock list; T2–T5 arriving]

SLIDE 16

Example: Logical lock acquisition

  • Latch bucket
  • Add lock request
  • Unlatch bucket

Several threads (T2–T5) must acquire a single latch: synchronization overhead, and the overhead increases with contention.
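The three steps above can be sketched as follows (Python, illustrative; a real lock manager would hash the record to a bucket and use spinlatches, but the contention pattern is the same): every execution thread funnels through the same latch.

```python
import threading

class LockBucket:
    """One hash bucket of a lock table: a latch protecting a shared lock list."""

    def __init__(self):
        self.latch = threading.Lock()  # physical latch on the bucket
        self.lock_list = []            # logical lock requests

    def add_request(self, txn, mode):
        self.latch.acquire()                # latch bucket
        self.lock_list.append((txn, mode))  # add lock request
        self.latch.release()                # unlatch bucket

bucket = LockBucket()
# Under high contention, every thread below serializes on the same latch.
threads = [threading.Thread(target=bucket.add_request, args=(f"T{i}", "write"))
           for i in range(2, 6)]
for t in threads: t.start()
for t in threads: t.join()
print(len(bucket.lock_list))  # 4
```

The latch keeps the lock list consistent, but it also means the number of threads synchronizing on it grows with the popularity of the record.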

SLIDE 17

Example: Logical lock acquisition

  • Latch bucket
  • Add lock request
  • Unlatch bucket

The lock list moves across cores as different threads update it: coherence overhead.

SLIDE 22

Example: Logical lock acquisition

  • Latch bucket
  • Add lock request
  • Unlatch bucket

As T2–T5's requests are processed, each repetition of this protocol adds more synchronization overhead.

SLIDE 23

The result?

[Graph: throughput vs. number of threads]

SLIDE 24

Dealing with contention on few cores

SLIDE 25

Dealing with contention on lots of cores

SLIDE 26

Observations

  • Contention for the lock list depends on the workload, not the implementation
  • Latches can be made as fine-grained as possible
    • E.g., bucket-level latches
  • But if records are popular, fine-grained latching will not help

SLIDE 28

Every protocol has the same overheads

  • Concurrency control protocols use object meta-data
    • Lock lists in locking
    • Timestamps in timestamp ordering, MVCC, OCC
  • Object meta-data is accessible by any thread
    • E.g., threads update read and write timestamps in timestamp ordering
    • E.g., threads manipulate lock lists in 2PL
  • Globally updatable shared meta-data is the problem
    • Synchronization, coherence overheads
    • No bound on threads contending for the same meta-data

Scalability anti-pattern

SLIDE 29

Need a mechanism to bound contention on shared meta-data
SLIDE 30

Decouple concurrency control and execution

  • Delegate concurrency control to a specific set of threads
  • These threads are responsible for performing only concurrency control logic
  • Access to concurrency control meta-data is mediated via concurrency control threads

SLIDE 31

Communication via message-passing

  • No data sharing between concurrency control and execution threads
  • Concurrency control and execution threads interact via explicit message-passing
  • Like RPC in distributed systems
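A minimal sketch of this decoupling, assuming a single CC thread and a shared message queue (Python, illustrative names; not the talk's actual implementation): execution threads send lock requests as messages, and only the CC thread ever touches the lock meta-data, so no latch is needed on it.

```python
import queue
import threading

class DelegatedLockManager:
    """Sketch: one CC thread owns the lock meta-data; workers only send messages."""

    def __init__(self):
        self.inbox = queue.Queue()  # message channel, like an RPC endpoint
        self.lock_list = []         # touched ONLY by the CC thread: no latch needed
        self.cc_thread = threading.Thread(target=self._cc_loop)
        self.cc_thread.start()

    def _cc_loop(self):
        # Concurrency control logic runs only on this thread.
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown sentinel
                return
            self.lock_list.append(msg)

    def request_lock(self, txn, mode):
        # Execution threads never touch lock_list directly.
        self.inbox.put((txn, mode))

    def shutdown(self):
        self.inbox.put(None)
        self.cc_thread.join()

mgr = DelegatedLockManager()
workers = [threading.Thread(target=mgr.request_lock, args=(f"T{i}", "write"))
           for i in range(2, 6)]
for w in workers: w.start()
for w in workers: w.join()
mgr.shutdown()
print(len(mgr.lock_list))  # 4
```

This version still uses one multi-producer queue; the next slides tighten the design to one queue per producer/consumer pair, which is what actually bounds contention.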
SLIDE 32

Example: Logical lock acquisition

[Diagram: records A, B, C, each with a dedicated concurrency control thread (CCA, CCB, CCC); execution threads T1, T2]

SLIDE 33

Example: Logical lock acquisition

  • Enqueue lock request

[Diagram: T2's execution thread enqueues its lock request on the owning CC thread's queue]

SLIDE 34

Example: Logical lock acquisition

  • Add to lock list

[Diagram: the CC thread dequeues T2's request and appends it to the record's lock list]

SLIDE 35

Example: Logical lock acquisition

  • Enqueue lock request
  • Acquire lock

[Diagram: record A owned by CCA; T1's request in A's lock list; T2–T5 enqueue requests]

SLIDE 36

Example: Logical lock acquisition

  • Enqueue lock request
  • Acquire lock

One consumer & producer per queue: bounded contention per queue.
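The bounded-contention point can be sketched by giving each worker its own queue to the CC thread, so every queue has exactly one producer and one consumer (Python, illustrative; a real system would likely use lock-free single-producer/single-consumer ring buffers rather than polled deques):

```python
from collections import deque
import threading

NUM_WORKERS = 4

# One queue per (execution thread, CC thread) pair: each deque has exactly one
# producer and one consumer, so contention per queue stays bounded no matter
# how many workers the system runs.
queues = [deque() for _ in range(NUM_WORKERS)]
lock_list = []  # owned by the CC thread; stays in its cache
done = threading.Event()

def cc_loop():
    # The CC thread round-robins over the per-worker queues.
    while not done.is_set() or any(queues):
        for q in queues:
            if q:
                lock_list.append(q.popleft())  # single consumer per queue

def worker(i):
    queues[i].append((f"T{i + 2}", "write"))  # single producer per queue

cc = threading.Thread(target=cc_loop)
cc.start()
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
done.set()
cc.join()
print(sorted(t for t, _ in lock_list))  # ['T2', 'T3', 'T4', 'T5']
```

However many workers exist, at most two threads ever touch any single queue, and only the CC core touches the lock list itself.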

SLIDE 38

Example: Logical lock acquisition

  • Enqueue lock request
  • Acquire lock

One core manipulates the lock list: the list cannot “bounce” around cores, and likely remains cached under high contention.

[Diagram: CCA drains T2–T5's requests from their queues into A's lock list]

SLIDE 43

TPC-C NewOrder and Payment

  • 16 warehouses
  • 80-core machine
SLIDE 44

TPC-C NewOrder and Payment

[Graph: throughput (txns/sec, 0 to 3.0 M) vs. number of CPU cores (10 to 80), comparing Delegated and Conventional]
SLIDE 45

Observations

  • Could be adapted to any concurrency control protocol
    • Indeed, to any multi-core DB sub-system
  • Key idea: delegate functionality to threads
    • E.g., concurrency control vs. execution
  • Message-passing for communication
    • Message-passing may be inevitable on heterogeneous hardware
SLIDE 46

Examples of delegating functionality

  • Delegating functionality has been successfully used in a variety of domains
    • Multi-core indexing: Physiological partitioning (PLP), PALM
    • Distributed OCC validation: Hyder, Centiman
    • Multi-core MVCC: Bohm, Lazy transactions
SLIDE 47

Conclusions

  • DB implementations cannot circumvent workload conflicts
  • Workload conflicts result in data-structure contention
  • Transaction-to-thread assignment causes unbounded data-structure contention
  • Delegate functionality to threads to bound contention
SLIDE 48

If your DB is in this position…
