
Lock-Free Search Data Structures: Throughput Modeling with Poisson Processes



  1. Lock-Free Search Data Structures: Throughput Modeling with Poisson Processes. Aras Atalar, Paul Renaud-Goud, Philippas Tsigas. Chalmers University of Technology

  2. Concurrent Data Structures
     ◮ Concurrency:
       ∗ Concurrency is the overlapped execution of processes
       ∗ Interleaving of the steps of processes
       ∗ Synchronization to avoid interleavings that lead to unintended states
     ◮ Lock-based concurrent data structures:
       ∗ Rely on mutual exclusion to work in isolation
       ∗ Limitations: deadlocks, priority inversion, and limited programming flexibility (difficult to compose)
     ◮ Lock-free concurrent data structures:
       ∗ Guarantee system-wide progress
       ∗ Employ optimistic conflict control
       ∗ Limitations: harder to design and implement
     Throughput of Lock-Free Search Data Structures 2/18 Aras Atalar

  3. Related Work
     ◮ Theoretical results:
       ∗ Focus on retry loop conflicts and hardware conflicts (which arise when operations overlap in both time and memory location)
       ∗ Amortized analyses parameterized with a measure of contention
       ∗ Model asynchrony with an adversarial scheduler
       ∗ Target worst-case execution times
     ◮ Empirical results:
       ∗ Compare the performance of different implementations
       ∗ Help to grasp the hardware-software interaction
     ◮ In this work:
       ∗ Study the throughput performance of lock-free search data structures
       ∗ Propose analytical tools whose estimates are close to what we observe in practice

  4. Lock-free Search Data Structures
     ◮ A search data structure is a collection of ⟨key, value⟩ pairs stored in an organized way to allow efficient search, delete and insert operations (e.g. hash table, binary tree, skip list, linked list)
     ◮ Formed of basic blocks (Nodes)
     ◮ Accessed with Read and Modify (CAS) events
     ◮ Retry loop conflicts are very improbable (Nodes ≫ Threads)

  5. Algorithm Skeleton
     Output of the analysis: data structure throughput (T), i.e. the number of successful data structure operations per unit of time

     Procedure AbstractAlgorithm
       while !done do
         key ← SelectKey(keyPMF);
         operation ← SelectOperation(operationPMF);
         result ← SearchDataStructure(key, operation);

     ◮ Key ∈ [1, Range] and Operation ∈ {Search, Insert, Delete}
     ◮ Memoryless and stationary key and operation selection process

  6. Algorithm Skeleton
     ◮ Inputs of the analysis:
       ∗ Platform parameters: data cache and TLB hit latencies, CAS latency, in clock cycles
       ∗ Algorithm parameters: PMFs for the key and operation selection, key range (R), total number of threads (P), expected latency of key and operation selection
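
The abstract algorithm on the previous slides can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain `set` stands in for the lock-free search data structure, a fixed operation count stands in for `while !done`, and the helper names (`select_key`, `select_operation`, `search_data_structure`) simply mirror the pseudocode rather than any real library.

```python
import random

OPERATIONS = ("search", "insert", "delete")

def select_key(key_pmf, rng):
    """Draw a key in [1, Range]; key_pmf[k - 1] is the probability of key k."""
    keys = range(1, len(key_pmf) + 1)
    return rng.choices(keys, weights=key_pmf, k=1)[0]

def select_operation(operation_pmf, rng):
    """Draw one of search/insert/delete according to operation_pmf."""
    return rng.choices(OPERATIONS, weights=operation_pmf, k=1)[0]

def search_data_structure(store, key, operation):
    """Stand-in for the lock-free structure (assumption: a plain set)."""
    if operation == "search":
        return key in store
    if operation == "insert":
        if key in store:
            return False
        store.add(key)
        return True
    # operation == "delete"
    if key in store:
        store.remove(key)
        return True
    return False

def abstract_algorithm(key_pmf, operation_pmf, num_operations, seed=0):
    """Run the skeleton loop; each draw is independent of the previous ones
    (memoryless) and uses the same PMFs throughout (stationary)."""
    rng = random.Random(seed)
    store = set()
    completed = 0
    for _ in range(num_operations):  # stands in for "while !done"
        key = select_key(key_pmf, rng)
        operation = select_operation(operation_pmf, rng)
        search_data_structure(store, key, operation)
        completed += 1
    return completed, store
```

Dividing `completed` by the elapsed time of the loop would give a throughput figure for this single-threaded stand-in; the analysis in the slides targets the concurrent case with P threads.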

  7. Impacting Factors
     ◮ An operation triggers a number of node accesses (which nodes?)
     ◮ Latency of the operation: sum of the latencies of its accesses
     [Figure: Search (key=3) on a binary search tree with internal nodes 5; 3, 7; 2, 4, 6, 8 and external nodes 1 to 8]
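
To make "latency of the operation = sum of the latencies of its accesses" concrete, here is a small sketch that rebuilds the tree from the figure and lists the nodes a search touches. The tuple-based node layout, the routing rule (go left when the key is smaller than the internal node's key), and the flat per-access cost are assumptions made for illustration; the slides do not specify them.

```python
def build_tree(lo, hi):
    """External binary search tree over keys lo..hi.
    Leaves ("leaf", key) hold the keys; internal nodes only route."""
    if lo == hi:
        return ("leaf", lo)
    mid = (lo + hi + 1) // 2  # routing key: smallest key of the right subtree
    return ("internal", mid, build_tree(lo, mid - 1), build_tree(mid, hi))

def search_path(node, key):
    """Return the sequence of nodes accessed while searching for key."""
    path = []
    while node[0] == "internal":
        path.append(("internal", node[1]))
        node = node[2] if key < node[1] else node[3]
    path.append(node)  # the external node reached at the end
    return path

def operation_latency(path, access_latency):
    """Latency of one operation: sum of the latencies of its node accesses."""
    return sum(access_latency(node) for node in path)

tree = build_tree(1, 8)
path = search_path(tree, 3)
# Hypothetical flat cost of 10 cycles per access, just to show the summation.
latency = operation_latency(path, lambda node: 10)
```

With keys 1 to 8 this construction yields internal nodes 5; 3, 7; 2, 4, 6, 8, matching the figure, and a search for key 3 accesses three internal nodes and one external node.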

  8. Impacting Factors
     ◮ Identify the factors that impact the latency of an access:
       ∗ Capacity misses in data and TLB caches (both in sequential and concurrent executions)
       ∗ Coherence misses (only in concurrent executions)
       ∗ Execution time of CAS and stall time due to others' CAS (only in concurrent executions)
     ◮ Define the access latency of node $N_i$:
       $$\mathit{Access}_i = t^{\mathit{cmp}} + \mathit{CAS}^{\mathit{exe}}_i + \mathit{CAS}^{\mathit{stall}}_i + \mathit{CAS}^{\mathit{reco}}_i + \sum_{\ell} \mathit{Hit}^{\mathit{cache}_\ell}_i + \sum_{\ell} \mathit{Hit}^{\mathit{tlb}_\ell}_i \qquad (1)$$
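
The access latency definition (1) is a plain sum, which the sketch below evaluates for one node. The interpretation of each Hit term as "probability of being served at level ℓ times the latency of level ℓ" is an assumption made here for illustration, and all numbers are invented placeholders, not measurements from the slides.

```python
def access_latency(t_cmp, cas_exe, cas_stall, cas_reco, cache_levels, tlb_levels):
    """Evaluate the access latency of one node as the sum of its components.

    cache_levels and tlb_levels are lists of (probability, latency_in_cycles)
    pairs, one pair per level; each pair contributes probability * latency
    (assumed interpretation of the Hit terms).
    """
    cache_term = sum(prob * cycles for prob, cycles in cache_levels)
    tlb_term = sum(prob * cycles for prob, cycles in tlb_levels)
    return t_cmp + cas_exe + cas_stall + cas_reco + cache_term + tlb_term

# Placeholder values: a read-only access (no CAS cost) on a three-level cache
# hierarchy and a two-level TLB.
example = access_latency(
    t_cmp=2,
    cas_exe=0,
    cas_stall=0,
    cas_reco=0,
    cache_levels=[(0.5, 4), (0.25, 12), (0.25, 40)],
    tlb_levels=[(0.5, 2), (0.5, 8)],
)
```

In a sequential execution the three CAS terms and the coherence contribution vanish, so only the comparison time and the capacity-related Hit terms remain.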

  9. Impacting Factors
     Over a sequence of operations: Coherence Miss
     ◮ Step 1: $P_0$ reads IntNode (key=3), which brings a valid copy to $P_0$
     [Figure: binary search tree of internal and external nodes; Thread 0 runs Search (key=3) and performs a Read]

  10. Impacting Factors
     Over a sequence of operations: Coherence Miss
     ◮ Step 2: $P_1$ modifies IntNode (key=3), which invalidates the copy of $P_0$
     [Figure: binary search tree of internal and external nodes; Thread 1 runs Delete (key=4) and performs a Modify]

  11. Impacting Factors
     Over a sequence of operations: Coherence Miss
     ◮ Step 3: $P_0$ reads IntNode (key=3) again, which is a coherence miss for $P_0$
     [Figure: binary search tree of internal and external nodes after the deletion; Thread 0 runs Search (key=4) and performs a Read]
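
The read, modify, read sequence that produces a coherence miss can be replayed with a toy model of a single node. This is a deliberately simplified sketch: it tracks only which threads hold a valid copy, assumes caches of unlimited capacity (so no capacity misses), and the class and method names are hypothetical.

```python
class NodeCopies:
    """Tracks, for one node, which threads hold a valid cached copy."""

    def __init__(self):
        self.valid = set()        # threads with a valid copy
        self.invalidated = set()  # threads whose copy another thread invalidated

    def read(self, thread):
        """Classify the read, then leave the thread with a valid copy."""
        if thread in self.valid:
            outcome = "hit"
        elif thread in self.invalidated:
            outcome = "coherence miss"
        else:
            outcome = "cold miss"
        self.invalidated.discard(thread)
        self.valid.add(thread)
        return outcome

    def modify(self, thread):
        """A successful modification invalidates every other thread's copy."""
        self.invalidated |= self.valid - {thread}
        self.invalidated.discard(thread)
        self.valid = {thread}

node = NodeCopies()
step1 = node.read(0)   # thread 0 reads the node and obtains a valid copy
node.modify(1)         # thread 1 modifies the node, invalidating that copy
step3 = node.read(0)   # thread 0 reads the same node again
```

Here `step3` is classified as a coherence miss because thread 0 held a copy that thread 1's modification invalidated, which is exactly the situation that cannot occur in a sequential execution.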

  12. Approach
     Observation: the latency of a node access depends on the interleaving of accesses
     To estimate the latency of an access on node $N_i$:
     ◮ Follow the sequence of events (Read and Modify separately) on $N_i$ by a thread, when $N_i$ ∈ DS
     ◮ Slice the execution into consecutive intervals, where an interval begins with a call to an operation by the thread
     ◮ Each interval potentially includes a Read event (resp. Modify) at $N_i$
     ◮ Think of a static structure: stationary and memoryless access pattern → Bernoulli process

  13. Approach
     ◮ The Poisson process approximation is well-conditioned if the success probability is small
     ◮ Dynamicity: the data structure changes state with insertions and deletions
     ◮ Bernoulli trials with different success probabilities → Poisson process (if the $p_j$ are small)
     ◮ Key characteristic: the set of nodes accessed in an operation is small compared to the set of all nodes
     [Figure: distance to a Poisson process over time, for p = 0.8, p = 0.2 and p = 0.1]
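
The claim that small success probabilities make the Poisson approximation accurate can be checked numerically. The sketch below compares the number of successes in n identical Bernoulli trials with a Poisson distribution of the same mean, using total variation distance. Both the choice of distance and the fixed n = 20 are assumptions made for this example; the plot on the slide may use a different measure.

```python
import math

def binomial_pmf(n, p, k):
    """P[k successes in n independent Bernoulli(p) trials]."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P[Poisson(lam) = k]."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    # Pointwise differences on k = 0..n, where both distributions have mass.
    diff = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(n + 1))
    # The Poisson mass above n has no binomial counterpart.
    tail = 1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1))
    return 0.5 * (diff + max(tail, 0.0))

# Same success probabilities as in the slide's figure.
distances = {p: total_variation(20, p) for p in (0.8, 0.2, 0.1)}
```

The distance shrinks as p decreases, which is consistent with the slide's key characteristic: when an operation touches only a small fraction of the nodes, the per-interval probability of an event at a given node is small and the Poisson model fits well.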
