
Chalmers University of Technology


A Simple, Fast and Scalable Non-Blocking Concurrent FIFO Queue for Shared Memory Multiprocessor Systems

Philippas Tsigas Yi Zhang Department of Computing Science Chalmers University of Technology


Talk Outline

  • Synchronization in shared memory multiprocessors
  • Non-blocking Queue
    – ABA Problem
    – Performance issues

  • Conclusions

Mutual Exclusion

  • The traditional way to synchronize.
  • Performance degrades under high contention.
    – Network contention
    – Lock convoy
  • Complex (pessimistic) scheduling analysis
    – Priority inversion
    – Deadlock


Non-blocking Synchronization

  • An alternative approach to synchronization
  • Lock-free and wait-free
  • Better performance in multiprocessor systems
    – No lock convoy
    – No priority inversion
    – No deadlock


Non-blocking Queue

Previous Results

  • Designed for asynchronous shared memory multiprocessors
  • Previous work
    – Lamport (1983)
    – … …
    – Michael and Scott (1998)
    – … …


Non-blocking Queue

Our Results

  • The new non-blocking queue outperforms the best known alternative implementation. It combines:
    + the new solution to the ABA problem
    + lazy pointer updating to improve performance
    + the algorithmic design of the queue as a cyclic array


ABA or Pointer Recycling Problem

  • Occurs when read-modify-write is used in lock-free computing with the CAS atomic primitive
  • A drawback of the CAS atomic primitive
  • A lot of overhead is introduced to solve it


The Specification of CAS

    Boolean CAS(int *mem, register old, new)
    {
        /* The whole body executes as one atomic step. */
        temp = *mem;
        if (temp == old) {
            *mem = new;
            return TRUE;
        } else
            return FALSE;
    }
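
On current compilers the same primitive is available through C11 atomics. The following is a minimal sketch, assuming an `int`-sized cell; the names `cas_int` and `fetch_and_increment` are ours for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper with the slide's signature.
 * Atomically: if *mem == old, store new_val and return true; else return false. */
bool cas_int(atomic_int *mem, int old, int new_val)
{
    /* On failure, atomic_compare_exchange_strong writes the observed value
     * into `old`; we discard it to keep the slide's simpler interface. */
    return atomic_compare_exchange_strong(mem, &old, new_val);
}

/* A typical read-modify-write retry loop built on CAS. */
int fetch_and_increment(atomic_int *counter)
{
    int seen;
    do {
        seen = atomic_load(counter);              /* read */
    } while (!cas_int(counter, seen, seen + 1));  /* retry if someone interfered */
    return seen;
}
```

The retry loop is the read-modify-write pattern in which the ABA problem can occur: the CAS only checks that the value is the same as before, not that it was never changed in between.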


ABA Problem (Example)

  • Array-based Queue (Enqueue)

    1. Loop
    2.     head = Queue.head
    3.     ... ...
    4.     if CAS(&Queue.array[head], NULL, data)
    5.     ... ...
    6. End loop

ABA Problem (Example)

  • Array-based Queue (Dequeue)

    1. Loop
    2.     tail = Queue.tail
    3.     ... ...
    4.     if CAS(&Queue.array[tail], data, NULL)
    5.     ... ...
    6. End loop

ABA Problem (Example)

Execution History (Empty Queue)

  • P1: Enqueue line 2 → (preempted) → Enqueue line 4
  • P2: Enqueue ... ... Dequeue
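
The history above can be replayed deterministically on a single cell. This is only a sketch under our own assumptions: the cell holds an `int`, `EMPTY` (0) stands in for NULL, and the two processes are simulated in one thread, in the order given by the history.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define EMPTY 0   /* stands in for NULL; an assumption of this sketch */

bool cas(atomic_int *mem, int old, int new_val)
{
    return atomic_compare_exchange_strong(mem, &old, new_val);
}

/* Replays the interleaving and reports whether P1's stale CAS succeeded. */
bool replay_aba(atomic_int *cell)
{
    atomic_store(cell, EMPTY);

    /* P1, Enqueue line 2: selects this cell, sees it empty, is preempted. */
    int p1_expected = atomic_load(cell);       /* A: EMPTY */

    /* P2 runs while P1 is preempted. */
    bool p2_enq = cas(cell, EMPTY, 11);        /* A -> B: enqueue 11 */
    bool p2_deq = cas(cell, 11, EMPTY);        /* B -> A: dequeue 11 */
    assert(p2_enq && p2_deq);

    /* P1 resumes at Enqueue line 4. The cell looks untouched, so the CAS
     * succeeds although the cell was used and released in between. */
    return cas(cell, p1_expected, 22);
}
```

In this replay `replay_aba` returns true: P1 writes into a cell chosen from stale queue state, which is exactly the hazard the slide describes.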


ABA Problem

Traditional Solution

  • Using a version number
    – Splits the word into two parts
    – Uses one part of the word as a version number
    – Increments the version number whenever the word is updated
  • Not a complete solution: ABA can still happen when the version number runs out of space (wraps around).
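
A minimal sketch of the version-number scheme, assuming a 32-bit word split into an 8-bit version and a 24-bit value. The split, the deliberately tiny version field, and all names (`pack`, `tagged_cas`, ...) are our own choices for illustration, picked so that the wrap-around is easy to reach.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VERSION_BITS 8u                        /* deliberately tiny, to show wrap-around */
#define VALUE_BITS   (32u - VERSION_BITS)
#define VALUE_MASK   ((1u << VALUE_BITS) - 1u)
#define VERSION_MASK ((1u << VERSION_BITS) - 1u)

/* One part of the word is the version, the other part is the value. */
uint32_t pack(uint32_t version, uint32_t value)
{
    return ((version & VERSION_MASK) << VALUE_BITS) | (value & VALUE_MASK);
}

uint32_t value_of(uint32_t word)   { return word & VALUE_MASK; }
uint32_t version_of(uint32_t word) { return word >> VALUE_BITS; }

/* CAS on the whole word; every successful update increments the version. */
bool tagged_cas(_Atomic uint32_t *mem, uint32_t seen_word, uint32_t new_value)
{
    uint32_t next = pack(version_of(seen_word) + 1u, new_value);
    return atomic_compare_exchange_strong(mem, &seen_word, next);
}
```

With this layout, a process holding a stale word is rejected after an intervening A → B → A update, because the version differs. After exactly 256 updates that end on the same value, the version field has wrapped and the stale CAS is accepted again, which is the incompleteness noted above.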

ABA Problem

Traditional Solution (Drawbacks)

  • The actual pointer length is smaller than the system pointer length
    – Programmers must manage the pointer (memory) themselves
    – Limits the memory that can be accessed
  • Tag operations introduce extra overhead

ABA Problem

Our Solution

  • Introduce a ghost copy of each value and turn ABA into ABA'B'A
  • For example, NULL means empty in the queue implementation
  • Use NULL(0) and NULL(1), both meaning an empty cell
  • Recycle the NULL values
  • More NULL values can be introduced.
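
One possible reading of the NULL(0)/NULL(1) idea, sketched on a single cell. The slides do not say how a dequeuer chooses which null to write; here we simply assume the caller alternates between the two on each pass over the cell. The encoding (0 and 1 as the two nulls, data values of 2 and above) and the function names are also assumptions of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NULL0 0   /* NULL(0): empty cell */
#define NULL1 1   /* NULL(1): also an empty cell, the "ghost copy" of NULL */

bool is_empty(int v) { return v == NULL0 || v == NULL1; }

bool cas(atomic_int *mem, int old, int new_val)
{
    return atomic_compare_exchange_strong(mem, &old, new_val);
}

/* Enqueue into a cell that the caller observed as empty with value seen_null. */
bool enqueue_at(atomic_int *cell, int seen_null, int data)
{
    assert(is_empty(seen_null) && !is_empty(data));
    return cas(cell, seen_null, data);
}

/* Dequeue data and leave behind next_null. Assumption: the caller
 * alternates NULL0 and NULL1 on successive passes over this cell. */
bool dequeue_at(atomic_int *cell, int data, int next_null)
{
    assert(is_empty(next_null) && !is_empty(data));
    return cas(cell, data, next_null);
}
```

A process that saw NULL(0) and was preempted now fails its CAS after one enqueue/dequeue pair, because the cell holds NULL(1). Only after a second pair returns the cell to NULL(0) can the stale CAS succeed, so the short ABA history becomes the longer ABA'B'A one. Adding more null values lengthens it further.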

Queue using Cyclical Array


ABA Problem

Execution History (Empty Queue)

  • P1: Enqueue line 2 → (preempted) → Enqueue line 4
  • P2: Enqueue at pos A ... ... Dequeue at pos A ... ... Enqueue at pos A ... ... Dequeue at pos A


Performance Issues of Synchronization

  • Network contention
    – Access to shared memory
    – Spinning on shared memory
    – Cache coherence protocols
  • Lock convoys

Mutual Exclusion Based Solutions for Performance Issues

  • Avoiding network contention
    – Ticket lock
    – MCS queue lock
  • Avoiding the lock-convoy effect
    – Scheduler-conscious synchronization


Non-blocking and Performance

  • Avoids the performance problems of lock convoys from the beginning
  • Not much consideration given to network contention


Observations on Queue Operations

  • CAS operations are network operations; they generate a lot of traffic
  • CAS operations are used when changing the head and tail of the queue
  • The head and tail pointers do not always have to point to the actual head and tail of the queue in a non-blocking implementation (or do they?)


Our Approach: Sketch

  • Let the head and tail pointers lag behind the actual head and tail
  • Use computation to calculate the actual head and tail
  • Trade-off between the computation time for finding the actual head/tail and the synchronization time for keeping the head/tail pointers from lagging far behind.
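
The trade-off can be illustrated with an enqueue-only sketch on a cyclic array. This is not the algorithm from the talk: it omits dequeue and ABA protection, is meant to be exercised from a single thread, and the names (`lazy_queue`, `tail_hint`, `LAG`, `hint_updates`) are hypothetical. It only shows the idea of scanning forward from a lagging pointer and updating the shared pointer on every `LAG`-th slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CAPACITY 16
#define LAG      4    /* publish the shared tail only every LAG-th slot (hypothetical knob) */
#define EMPTY    0

struct lazy_queue {
    atomic_int cells[CAPACITY];
    atomic_int tail_hint;   /* may lag behind the actual tail */
    int hint_updates;       /* counts CAS operations on tail_hint, for illustration only */
};

void lazy_init(struct lazy_queue *q)
{
    for (int i = 0; i < CAPACITY; i++)
        atomic_init(&q->cells[i], EMPTY);
    atomic_init(&q->tail_hint, 0);
    q->hint_updates = 0;
}

/* Enqueue-only sketch; returns false when no empty cell is found. */
bool lazy_enqueue(struct lazy_queue *q, int data)
{
    assert(data != EMPTY);
    int hint = atomic_load(&q->tail_hint);

    /* Computation: walk forward from the lagging hint to the actual tail. */
    for (int step = 0; step < CAPACITY; step++) {
        int pos = (hint + step) % CAPACITY;
        if (atomic_load(&q->cells[pos]) != EMPTY)
            continue;                          /* plain read, no CAS traffic */
        int expected = EMPTY;
        if (!atomic_compare_exchange_strong(&q->cells[pos], &expected, data))
            continue;                          /* lost the cell, keep scanning */

        /* Synchronization: update the shared pointer only occasionally. */
        if ((pos + 1) % LAG == 0) {
            int next = (pos + 1) % CAPACITY;
            if (atomic_compare_exchange_strong(&q->tail_hint, &hint, next))
                q->hint_updates++;
        }
        return true;
    }
    return false;
}
```

With `LAG` set to 4, eight enqueues perform two CAS operations on `tail_hint` instead of eight, at the cost of scanning up to three extra cells per operation. A larger `LAG` shifts more of the cost from synchronization to computation.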


Results on SUN Enterprise 10000 with Full Contention


Results on SUN Enterprise 10000 with Full Contention


Results on SGI Origin 2000 with Full Contention


Conclusion

  • A new non-blocking concurrent FIFO queue algorithm is presented.
  • A simple mechanism for easing the ABA problem is proposed.
  • A mechanism to lower the contention of non-blocking operations is introduced.
  • The new algorithm performs very well on UMA and ccNUMA machines.