slide-1
SLIDE 1

Lock-Free Search Data Structures: Throughput Modeling with Poisson Processes

Aras Atalar, Paul Renaud-Goud, Philippas Tsigas
Chalmers University of Technology

slide-2
SLIDE 2

Concurrent Data Structures

◮ Concurrency:

∗ Concurrency is the overlapped execution of processes
∗ Interleaving of the steps of processes
∗ Synchronization to avoid interleavings that lead to unintended states

◮ Lock-based concurrent data structures:

∗ Rely on mutual exclusion to work in isolation
∗ Limitations: deadlocks, priority inversion and limited programming flexibility (difficult to compose)

◮ Lock-free concurrent data structures:

∗ Guarantee system-wide progress
∗ Employ optimistic conflict control
∗ Limitations: harder to design and implement

Aras Atalar Throughput of Lock-Free Search Data Structures 2 18

slide-3
SLIDE 3

Related Work

◮ Theoretical results:

∗ Focus on retry loop conflicts and hardware conflicts (which exist when operations overlap in time and memory location)
∗ Amortized analyses parameterized with a measure of contention
∗ Model asynchrony with an adversarial scheduler
∗ Target worst-case execution times

◮ Empirical results:

∗ Compare the performance of different implementations
∗ Help to grasp the hardware-software interaction

◮ In this work:

∗ Study the throughput performance of lock-free search data structures
∗ Propose analytical tools that provide estimates close to what we observe in practice

slide-4
SLIDE 4

Lock-free Search Data Structures

◮ A search data structure is a collection of (key, value) pairs stored in an organized way to allow efficient search, delete and insert operations (e.g. hash table, binary tree, skip list, linked list)
◮ Formed of basic blocks (nodes)
◮ Accessed with Read and Modify (CAS) events
◮ Retry loop conflicts are very improbable (Nodes ≫ Threads)

slide-5
SLIDE 5

Algorithm Skeleton

Output of the analysis: data structure throughput (T), i.e. the number of successful data structure operations per unit of time

Procedure AbstractAlgorithm
    while !done do
        key ← SelectKey(keyPMF);
        operation ← SelectOperation(operationPMF);
        result ← SearchDataStructure(key, operation);

◮ Key ∈ [1, Range] and Operation ∈ {Search, Insert, Delete}
◮ Memoryless and stationary key and operation selection process
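
The skeleton above can be sketched in Python as follows. This is only an illustration: `ToySet` is a sequential stand-in for a real lock-free structure, and the names `select_from_pmf`, `run_skeleton`, the key range of 16 and the example PMFs are assumptions that do not come from the slides.

```python
import random

def select_from_pmf(pmf, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    outcomes = list(pmf)
    return rng.choices(outcomes, weights=[pmf[o] for o in outcomes], k=1)[0]

class ToySet:
    """Sequential stand-in for a search data structure (hypothetical)."""
    def __init__(self):
        self._keys = set()

    def apply(self, key, operation):
        if operation == "search":
            return key in self._keys
        if operation == "insert":
            added = key not in self._keys
            self._keys.add(key)
            return added
        if operation == "delete":
            removed = key in self._keys
            self._keys.discard(key)
            return removed
        raise ValueError(f"unknown operation: {operation}")

def run_skeleton(num_ops, key_pmf, operation_pmf, seed=0):
    """Memoryless, stationary selection: every draw ignores the history."""
    rng = random.Random(seed)
    ds = ToySet()
    counts = {op: 0 for op in operation_pmf}
    for _ in range(num_ops):
        key = select_from_pmf(key_pmf, rng)
        operation = select_from_pmf(operation_pmf, rng)
        ds.apply(key, operation)
        counts[operation] += 1
    return counts

key_range = 16  # made-up value for Range
key_pmf = {k: 1.0 / key_range for k in range(1, key_range + 1)}  # uniform keys in [1, Range]
operation_pmf = {"search": 0.5, "insert": 0.25, "delete": 0.25}
counts = run_skeleton(1000, key_pmf, operation_pmf)
```

Because both PMFs are fixed and each draw is independent of the previous ones, the selection process is stationary and memoryless, which is the property the analysis relies on.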

slide-6
SLIDE 6

Algorithm Skeleton

◮ Inputs of the analysis:

∗ Platform parameters: data cache and TLB hit latencies, CAS latency, in clock cycles
∗ Algorithm parameters: PMFs for the key and operation selection, key range (R), total number of threads (P), expected latency of key and operation selection

slide-7
SLIDE 7

Impacting Factors

◮ An operation triggers a number of node accesses (which nodes?)
◮ Latency of the operation: sum of the latencies of its accesses

[Figure: search tree with internal nodes and external nodes, illustrating Search (key=3)]

slide-8
SLIDE 8

Impacting Factors

◮ Identify the factors that impact the latency of an access:

∗ Capacity misses in data and TLB caches (in both sequential and concurrent executions)
∗ Coherence misses (only in concurrent executions)
∗ Execution time of CAS and stall time due to others' CAS (only in concurrent executions)

◮ Define the access latency of node N_i:

$$\mathit{Access}_i = t^{cmp} + \mathit{CAS}^{exe}_i + \mathit{CAS}^{stall}_i + \mathit{CAS}^{reco}_i + \mathit{Hit}^{cache_\ell}_i + \mathit{Hit}^{tlb_\ell}_i \tag{1}$$

slide-9
SLIDE 9

Impacting Factors

Over a sequence of operations: Coherence Miss

◮ Step 1: P0 reads IntNode_{key=3} (brings a valid copy to P0)

[Figure: search tree with internal nodes and external nodes. Thread 0: Search (key=3); Thread 0: Read 3]

slide-10
SLIDE 10

Impacting Factors

Over a sequence of operations: Coherence Miss

◮ Step 2: P1 modifies IntNode_{key=3} (invalidates the copy of P0)

[Figure: search tree with internal nodes and external nodes. Thread 1: Delete (key=4); Thread 1: Modify 3]

slide-11
SLIDE 11

Impacting Factors

Over a sequence of operations: Coherence Miss

◮ Step 3: P0 reads IntNode_{key=3} again (coherence miss of P0)

[Figure: search tree with internal nodes and external nodes. Thread 0: Search (key=4); Thread 0: Read 3]
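
The three-step coherence miss scenario can be replayed with a toy model that only tracks which threads hold a valid cached copy of a node. This is a minimal sketch under strong simplifications (no cache capacity, no hierarchy, one copy per thread); the class `ToyCoherence` and the node label `"IntNode3"` are hypothetical names introduced for illustration.

```python
class ToyCoherence:
    """Tracks which threads hold a valid cached copy of each node (toy model)."""
    def __init__(self):
        self.valid = {}  # node -> set of thread ids holding a valid copy

    def read(self, thread, node):
        """Return True if this read misses, then install a valid copy."""
        holders = self.valid.setdefault(node, set())
        miss = thread not in holders
        holders.add(thread)
        return miss

    def modify(self, thread, node):
        """A modification leaves the writer as the only holder of a valid copy."""
        self.valid[node] = {thread}

cache = ToyCoherence()
step1 = cache.read(0, "IntNode3")   # P0's first touch: a miss, but not a coherence miss
cache.modify(1, "IntNode3")         # P1 modifies the node and invalidates P0's copy
step3 = cache.read(0, "IntNode3")   # P0 reads again: coherence miss
hit = cache.read(0, "IntNode3")     # no modification in between: hit
```

The last read shows the complementary case: without an intervening modification by another thread, the copy stays valid and the access hits.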

slide-12
SLIDE 12

Approach

Observation: the latency of a node access depends on the interleaving of accesses

To estimate the latency of an access on node N_i:

◮ Follow the sequence of events (Read and Modify separately) on N_i by a thread, when N_i ∈ DS
◮ Slice the execution into consecutive intervals, where an interval begins with a call to an operation by the thread
◮ Each interval potentially includes a Read event (resp. a Modify event) at N_i
◮ Think of a static structure: stationary and memoryless access pattern → Bernoulli process

slide-13
SLIDE 13

Approach

◮ The Poisson process approximation is well-conditioned if the success probability is small
◮ Dynamicity: the data structure changes state with insertions and deletions
◮ Bernoulli trials with different success probabilities → Poisson process (if the p_j are small)
◮ Key characteristic: the set of nodes accessed in an operation is small compared to the set of all nodes

[Figure: distance to a Poisson process over time, for p=0.1, p=0.2 and p=0.8]
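
To see why a small success probability matters, one can compare the inter-arrival distribution of a Bernoulli process (geometric) with that of a Poisson process (exponential). The sketch below is an assumption-laden illustration: the function name `gap_to_exponential` and the choice of metric (the largest CDF gap at slot boundaries, with the Poisson rate set to p per slot) are mine and are not necessarily the distance plotted in the figure.

```python
import math

def gap_to_exponential(p, max_slots=1000):
    """Largest gap, over slot boundaries k = 1..max_slots, between
    the geometric CDF 1 - (1 - p)**k (Bernoulli process, success
    probability p per slot) and the exponential CDF 1 - exp(-p * k)
    (Poisson process with rate p per slot)."""
    return max(
        abs(math.exp(-p * k) - (1.0 - p) ** k)
        for k in range(1, max_slots + 1)
    )

# Same success probabilities as in the figure legend.
gaps = {p: gap_to_exponential(p) for p in (0.1, 0.2, 0.8)}
```

Under this metric the gap shrinks as p decreases, which matches the claim that the approximation works best when each interval rarely contains an event on the tracked node.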

slide-14
SLIDE 14

Statistical Test: Kolmogorov–Smirnov

[Figure: cumulative distribution P[X < t] of the inter-arrival time t of Read events for four tracked keys (key0, key1, key2, key3); Range: 16384, threads=4, Ins−Del: 25−25. Panels: (a) Read Events for Skiplist, (b) Read Events for Hash Table, (c) Read Events for Binary Tree, (d) Read Events for Linked List]

slide-15
SLIDE 15

Statistical Test: Kolmogorov–Smirnov

[Figure: cumulative distribution P[X < t] of the inter-arrival time t of CAS events for four tracked keys (key0, key1, key2, key3); Range: 256, threads=4, Ins−Del: 25−25. Panels: (a) CAS Events for Skip list, (b) CAS Events for Hash Table, (c) CAS Events for Binary Tree, (d) CAS Events for Linked List]
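
A Kolmogorov–Smirnov check of this kind compares the empirical distribution of inter-arrival times with the exponential distribution expected from a Poisson process. The following is a minimal sketch, not the authors' test harness: the function name, the rate of 2.0 and the synthetic samples are assumptions, and a real experiment would use measured inter-arrival times and a rate estimated from them.

```python
import math
import random

def ks_statistic_exponential(samples, rate):
    """One-sample Kolmogorov-Smirnov statistic of `samples` against
    the exponential CDF F(t) = 1 - exp(-rate * t)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = 1.0 - math.exp(-rate * x)
        # Compare F with the empirical CDF just before and just after x.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

rng = random.Random(1)
rate = 2.0  # hypothetical event rate
poisson_like = [rng.expovariate(rate) for _ in range(5000)]
periodic = [1.0 / rate] * 5000  # same mean, but perfectly regular arrivals

d_poisson = ks_statistic_exponential(poisson_like, rate)
d_periodic = ks_statistic_exponential(periodic, rate)
```

A small statistic (as for `poisson_like`) is consistent with exponential inter-arrival times, while regular arrivals with the same mean yield a large one.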

slide-16
SLIDE 16

Poisson Rates

◮ Extract the rate of events on N_i (by a thread), based on a random operation at a random time, as a function of throughput (T):

$$\forall e \in \{cas, read\}: \quad \lambda^{e}_i = \frac{T}{P} \times \sum_{o \in \{ins, del, src\}} \sum_{k=1}^{R} P\left[Op = op^{o}_k\right] \times P\left[op^{o}_k \rightsquigarrow e(N_i) \mid N_i \in D\right]$$

◮ Throughput per thread: T/P
◮ Probability of an operation of type o and key k: P[Op = op^o_k]
◮ Instantiate P[op^o_k ⇝ e(N_i) | N_i ∈ D] based on the particularities of the data structure
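
The rate formula above translates almost directly into code. The sketch below assumes that keys and operation types are selected independently, so that P[Op = op^o_k] factors into two PMFs. The function `event_rate`, its parameters and the toy sorted-list instantiation are hypothetical and only show the shape of the computation.

```python
def event_rate(throughput, num_threads, op_pmf, key_pmf, trigger_prob):
    """Per-thread rate of an event on one node:
    (T / P) * sum over o, k of P[Op = op_k^o] * P[op_k^o triggers the event].

    op_pmf:       dict operation type -> probability
    key_pmf:      dict key -> probability
    trigger_prob: function (operation, key) -> probability that the
                  operation triggers the event on the node of interest
                  (this is the structure-specific part)
    """
    per_thread = throughput / num_threads
    total = 0.0
    for o, p_o in op_pmf.items():
        for k, p_k in key_pmf.items():
            # Independence assumption: P[Op = op_k^o] = p_o * p_k.
            total += p_o * p_k * trigger_prob(o, k)
    return per_thread * total

R = 4
op_pmf = {"ins": 0.25, "del": 0.25, "src": 0.5}
key_pmf = {k: 1.0 / R for k in range(1, R + 1)}

def read_node_3(o, k):
    # Toy instantiation: in a sorted list, an operation on any key >= 3
    # reads the node holding key 3.
    return 1.0 if k >= 3 else 0.0

rate = event_rate(1000.0, 4, op_pmf, key_pmf, read_node_3)
```

With T = 1000 operations per unit of time and P = 4 threads, half of the operations of each thread read the node, giving a per-thread read rate of 125.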

slide-17
SLIDE 17

Poisson Rates

$$P\left[op^{src}_{k'} \rightsquigarrow e(N_k) \mid N_k \in D\right] \text{ for Skip list:}$$

[Figure: skip list with keys −∞, k, k′ and ∞, with data nodes and routing nodes (ht>2), illustrating Search (key=k′)]

◮ N_j is in the structure if the latest operation on N_j is an insert
◮ Obtain the probability that a node is in D (thanks to the memoryless and stationary access pattern)

slide-18
SLIDE 18

Access Latency

◮ Applying expectation to the access latency of N_i:

$$E[\mathit{Access}_i] = t^{cmp} + E\left[\mathit{CAS}^{exe}_i\right] + E\left[\mathit{CAS}^{stall}_i\right] + E\left[\mathit{CAS}^{reco}_i\right] + E\left[\mathit{Hit}^{cache_\ell}_i\right] + E\left[\mathit{Hit}^{tlb_\ell}_i\right]$$

◮ Express each term according to the rates at every node, λ^cas_⋆ and λ^read_⋆
◮ Useful properties of Poisson processes: superposition and thinning

slide-19
SLIDE 19

Access Latency

Estimate the expected access latency E[Access_i] for N_i:

◮ A thread encounters a coherence miss while accessing N_i if the previous event of the thread on N_i is followed by a CAS of another thread:

(i) Events from the given thread: rate λ^cas_i + λ^read_i
(ii) Superpose (merge) CAS events from all other threads: rate λ^cas_i (P − 1)

$$P[\text{Coherence Miss on } N_i] = \frac{\lambda^{cas}_i (P - 1)}{\lambda^{cas}_i P + \lambda^{read}_i}$$
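
The coherence miss probability follows from a standard property of independent Poisson processes: the next event belongs to a given process with probability equal to its rate divided by the sum of the rates. The sketch below evaluates the expression; the function name and the example rates are illustrative assumptions.

```python
def coherence_miss_probability(rate_cas, rate_read, num_threads):
    """lambda_cas (P - 1) / (lambda_cas P + lambda_read) for one node."""
    # Superposition of the CAS events of the P - 1 other threads.
    others_cas = rate_cas * (num_threads - 1)
    # CAS and Read events of the thread under consideration.
    own_events = rate_cas + rate_read
    total = others_cas + own_events
    return others_cas / total if total > 0 else 0.0

p_single = coherence_miss_probability(1.0, 9.0, 1)  # one thread: no coherence misses
p_four = coherence_miss_probability(1.0, 9.0, 4)    # 3 / (4 + 9)
```

With a single thread the probability is zero, and it grows with the number of threads and with the share of CAS events on the node.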

slide-20
SLIDE 20

Access Latency vs. Throughput

Link the access latencies and rates with throughput:

◮ Little's Law states that the expected number of threads accessing a node is the product of the access rate and the access latency
◮ Link latencies to throughput using Little's Law by summing over all nodes and the application latency:

$$P = \sum_{i=0}^{N} \left( p_i \lambda^{acc}_i E[\mathit{Access}_i] \right)$$

P: total number of threads
p_i λ^acc_i: average arrival rate to N_i
E[Access_i]: expected latency to access N_i
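
Since the rates grow with the throughput T, the Little's Law equation can be read as an equation in T and solved numerically. The sketch below uses bisection under assumptions that are mine, not the slides': the access rate of node i is taken as `T * visit_weight[i]`, the latency function is non-decreasing in T, and all numbers are made up.

```python
def solve_throughput(num_threads, presence, visit_weight, latency,
                     hi=1e12, iters=200):
    """Find T with sum_i p_i * lambda_i(T) * latency(i, T) == num_threads.

    presence[i]:     p_i
    visit_weight[i]: hypothetical w_i, with lambda_i(T) = T * w_i (assumption)
    latency(i, T):   expected access latency of node i, non-decreasing in T
    The solution is assumed to lie in [0, hi].
    """
    def occupancy(t):
        return sum(p * t * w * latency(i, t)
                   for i, (p, w) in enumerate(zip(presence, visit_weight)))

    lo = 0.0
    for _ in range(iters):  # bisection: occupancy grows with T
        mid = (lo + hi) / 2.0
        if occupancy(mid) < num_threads:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

presence = [1.0, 0.5, 0.5]
visit_weight = [1.0, 0.5, 0.5]

def constant_latency(i, t):
    return 2e-8  # seconds per access, made-up value

T = solve_throughput(4, presence, visit_weight, constant_latency)
```

With constant latencies the equation is linear and T equals P divided by the weighted sum of latencies. Bisection becomes useful once the latencies themselves depend on T, for example through coherence misses.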

slide-21
SLIDE 21

Results

[Figure: throughput (ops/sec) versus number of threads (4 to 16) for Range: 65536, with one series per Ins − Del setting: 0 − 0, 0.5 − 0.5, 5 − 5, 10 − 10, 15 − 15, 25 − 25, 40 − 40, 50 − 50. Panels: (a) Skip List, (b) Hash Table, (c) Binary Tree, (d) Linked List]

slide-22
SLIDE 22

Conclusion

◮ Analytical tools for the throughput of lock-free search data structures
◮ Validated with: hash tables, skip lists, linked lists, binary trees
◮ Could be useful to:

∗ Compare different lock-free designs
∗ Facilitate design decisions
∗ Drive the tuning process (e.g. memory alignment strategies)