A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue



SLIDE 1

A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

Ruslan Nikolaev Systems Software Research Group Virginia Tech, USA

SLIDE 2

Motivation

  • Efficient concurrent FIFO queues are hard
    – Elimination techniques and relaxed FIFO queues are typically specialized

  • Desirable properties
    – Scalability: leveraging many cores efficiently
    – Portability: using standard atomic primitives (e.g., single-width CAS)
    – Memory Efficiency: high memory utilization, avoiding reallocation due to livelocks
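Since the slides stress single-width CAS as the portability requirement, a minimal illustration (not from the talk; the variable name `head` is made up) of what "standard atomic primitives" means in practice is C11's compare-exchange on one pointer-sized word, available on essentially every platform:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Single-width CAS: operates on one pointer-sized word only, so no
 * double-width (CMPXCHG16B-style) hardware support is required. */
static _Atomic uint64_t head;

/* Advance head by one, but only if nobody beat us to it. */
int try_advance_head(uint64_t expected)
{
    return atomic_compare_exchange_strong(&head, &expected, expected + 1);
}
```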

SLIDE 3

Existing Approaches

  • Classical Michael & Scott’s (M&S) FIFO queue: not very scalable [PODC’96]
  • Various “lockless” ring buffers (circular queues): typically not lock-free, not linearizable, or neither
  • Lock-free ring buffers: not that scalable [Tsigas et al.: SPAA’01, Feldman et al.: SIGAPP’15]
  • LCRQ: an M&S list of scalable (but livelock-prone) ring buffers; requires double-width CAS [Morrison et al.: PPoPP’13]
  • WFQUEUE: a wait-free design; the fast-path-slow-path methodology works around livelocks, at the cost of a more complex API and per-thread state [Yang et al.: PPoPP’16]

SLIDE 4

FAA vs. CAS

  • FAA (fetch-and-add) generally scales better than CAS (compare-and-set)
    – Can be leveraged for ring buffers (LCRQ, WFQUEUE)

[Graph: Xeon E7-8880 v3 2.3 GHz, 4x18 cores]
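The contrast can be sketched in C11 (illustrative names, not the benchmark code): reserving a slot with a CAS loop may retry arbitrarily often under contention, while FAA completes in a single wait-free step, which is why LCRQ and WFQUEUE build on it.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

static _Atomic uint64_t ticket;

/* CAS version: may loop whenever another thread wins the race. */
uint64_t take_ticket_cas(void)
{
    uint64_t t = atomic_load(&ticket);
    while (!atomic_compare_exchange_weak(&ticket, &t, t + 1))
        ;   /* on failure, t is reloaded with the current value */
    return t;
}

/* FAA version: one wait-free instruction; every caller gets a
 * distinct slot with no retries. */
uint64_t take_ticket_faa(void)
{
    return atomic_fetch_add(&ticket, 1);
}
```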

SLIDE 5

Proposed Data Structure

  • Two queues
    – aq and fq store indices
    – A data array contains elements
    – Single-width CAS is sufficient!
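The indirection can be illustrated with a deliberately sequential sketch (the names aq, fq, and the data array come from the slide; the ring logic here is simplified and is not the paper's lock-free code): fq hands out free slots, elements live in the data array, and only small indices travel through the two queues, which is what keeps every atomic operation within a single machine word.

```c
#include <stddef.h>
#include <assert.h>

#define N 8                      /* capacity of the data array */

static void  *data[N];           /* elements live here */
static size_t aq[N], fq[N];      /* queues carry indices only */
static size_t aq_head, aq_tail, fq_head, fq_tail;

void init(void)
{
    for (size_t i = 0; i < N; i++)
        fq[fq_tail++ % N] = i;   /* initially every slot is free */
}

int enqueue(void *p)
{
    if (fq_head == fq_tail) return 0;   /* no free slot left */
    size_t idx = fq[fq_head++ % N];     /* take a free slot from fq */
    data[idx] = p;                      /* store the element */
    aq[aq_tail++ % N] = idx;            /* publish its index in aq */
    return 1;
}

void *dequeue(void)
{
    if (aq_head == aq_tail) return NULL;  /* queue is empty */
    size_t idx = aq[aq_head++ % N];       /* next occupied slot */
    void *p = data[idx];
    fq[fq_tail++ % N] = idx;              /* recycle the slot */
    return p;
}
```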

SLIDE 6

Infinite Array Queue (livelock-prone)

int Tail = 0, Head = 0;

void enqueue(void *p) {
  while (true) {
    T = FAA(&Tail, 1);
    if (SWAP(&Array[T], p) == ⊥)
      break;
  }
}

void *dequeue() {
  while (true) {
    H = FAA(&Head, 1);
    p = SWAP(&Array[H], ⊤);
    if (p ≠ ⊥)
      return p;
    if (Load(&Tail) ≤ H + 1)
      return nullptr;
  }
}

  • The original design described for LCRQ
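A single-threaded C11 transliteration of the pseudocode above may help (the array is made finite and the sentinels are illustrative encodings: ⊥ is NULL and ⊤ is the address of a private object; this shows the slot protocol, not a usable concurrent queue):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

#define SLOTS 1024
static int top_sentinel;
#define TOP ((void *)&top_sentinel)   /* stands in for ⊤ */

static _Atomic(void *) Array[SLOTS];  /* zero-initialized: every slot is ⊥ */
static _Atomic int Tail, Head;

void enqueue(void *p)
{
    for (;;) {
        int t = atomic_fetch_add(&Tail, 1);           /* FAA */
        if (atomic_exchange(&Array[t], p) == NULL)    /* slot was ⊥ */
            return;
        /* A dequeuer already poisoned this slot with ⊤: retry.
         * With no bound on retries, this is where livelock lurks. */
    }
}

void *dequeue(void)
{
    for (;;) {
        int h = atomic_fetch_add(&Head, 1);           /* FAA */
        void *p = atomic_exchange(&Array[h], TOP);    /* poison with ⊤ */
        if (p != NULL)
            return p;                                 /* got an element */
        if (atomic_load(&Tail) <= h + 1)
            return NULL;                              /* queue is empty */
    }
}
```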
SLIDE 10

Infinite Array Queue (livelock-free)

int Tail = 0, Head = 0;
signed int Threshold = -1;

void enqueue(size_t idx) {
  while (true) {
    T = FAA(&Tail, 1);
    if (SWAP(&Ent[T], idx) == ⊥) {
      Store(&Threshold, 2n-1);
      break;
    }
  }
}

size_t dequeue() {
  if (Load(&Threshold) < 0)
    return <empty>;
  while (true) {
    H = FAA(&Head, 1);
    idx = SWAP(&Ent[H], ⊤);
    if (idx ≠ ⊥)
      return idx;
    if (FAA(&Threshold, -1) ≤ 0)
      return <empty>;
    if (Load(&Tail) ≤ H + 1)
      return <empty>;
  }
}

  • We use our data structure and introduce a “threshold”
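A runnable single-threaded sketch of the threshold mechanism (the 2n-1 constant follows the slide; the BOT/TOPV/EMPTY encodings and NTHREADS value are illustrative, since real entries are small indices): the Threshold counter bounds how many empty slots dequeuers may burn through, which is what removes the livelock.

```c
#include <stdatomic.h>
#include <assert.h>

#define NTHREADS 4
#define SLOTS 1024
enum { BOT = -1, TOPV = -2, EMPTY = -3 };  /* ⊥, ⊤, <empty> */

static _Atomic long Ent[SLOTS];
static _Atomic int Tail, Head;
static _Atomic int Threshold = -1;

void init(void)
{
    for (int i = 0; i < SLOTS; i++)
        Ent[i] = BOT;                       /* every slot starts at ⊥ */
}

void enqueue(long idx)                      /* idx must be >= 0 */
{
    for (;;) {
        int t = atomic_fetch_add(&Tail, 1);
        if (atomic_exchange(&Ent[t], idx) == BOT) {
            /* Successful insertion rearms the dequeuers' budget. */
            atomic_store(&Threshold, 2 * NTHREADS - 1);
            return;
        }
    }
}

long dequeue(void)
{
    if (atomic_load(&Threshold) < 0)
        return EMPTY;                       /* certainly nothing to take */
    for (;;) {
        int h = atomic_fetch_add(&Head, 1);
        long idx = atomic_exchange(&Ent[h], TOPV);
        if (idx != BOT)
            return idx;                     /* got an element index */
        if (atomic_fetch_add(&Threshold, -1) <= 0)
            return EMPTY;                   /* retry budget exhausted */
        if (atomic_load(&Tail) <= h + 1)
            return EMPTY;
    }
}
```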

SLIDE 12

Threshold Bound

  • Consider two cases (number of threads ≤ n)
    – The last dequeuer is ahead of the last enqueuer (the threshold value does not matter)
    – The last dequeuer is not ahead of the last enqueuer

SLIDE 13

Scalable Circular Queue (SCQ)

  • We double the capacity of the queue and set the threshold value to (3n-1)
  • Some other differences (e.g., cycle management) with LCRQ
  • (Unbounded) LSCQ: more memory efficient than LCRQ
  • A specialized version of SCQ for double-width CAS
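The cycle management mentioned above can be sketched with the general technique bounded rings use (this is an illustration of the idea, not SCQ's exact entry layout or remapping): each FAA ticket t denotes slot t mod SIZE during cycle t / SIZE, and entries carry their cycle tag so a CAS can distinguish a stale entry from a fresh one, all within one single-width word.

```c
#include <stdint.h>
#include <assert.h>

#define SIZE 64   /* ring capacity; a power of two */

/* Which physical slot and which trip around the ring a ticket means. */
static inline uint64_t slot_of(uint64_t t)  { return t % SIZE; }
static inline uint64_t cycle_of(uint64_t t) { return t / SIZE; }

/* Pack (cycle, value) into one 64-bit word so that a plain
 * single-width CAS can update tag and value together. */
static inline uint64_t pack(uint64_t cycle, uint32_t value)
{
    return (cycle << 32) | value;
}
```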
SLIDE 14

Evaluation: Memory Usage

[Graph: Xeon E7-8880 v3 2.3 GHz, 4x18 cores]

SLIDE 15

Evaluation: 50% Enq, 50% Deq

[Graphs: Xeon E7-8880 v3 2.3 GHz, 4x18 cores; POWER8 3.0 GHz, 8x8 cores]

SLIDE 16

Evaluation: Pairwise Enq-Deq

[Graphs: Xeon E7-8880 v3 2.3 GHz, 4x18 cores; POWER8 3.0 GHz, 8x8 cores]

SLIDE 17

More details

  • Code is open-source and available at:

– https://github.com/rusnikola/lfqueue

Thank you!