Allocating memory in a lock-free manner
Anders Gidenstam, Marina Papatriantafilou and Philippas Tsigas
Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology

Outline
- Introduction
  - Lock-free synchronization
  - Memory allocators
- NBmalloc
  - Architecture
  - Data structures
- Experiments
- Conclusions

2005, Anders Gidenstam, Distributed Computing and Systems, Chalmers

Synchronization on a shared object
- Lock-free and wait-free synchronization
  - Concurrent operations without enforcing mutual exclusion
  - Avoids:
    • blocking and priority inversion
  - Lock-free
    • At least one operation always makes progress
  - Wait-free
    • All operations finish in a bounded number of their own steps
- Synchronization primitives
  - Built into CPU and memory system
    • Atomic read-modify-write (i.e. a critical section of one instruction)
  - Examples
    • Test-and-set, Compare-and-Swap, Load-Linked / Store-Conditional
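The primitives listed above can be tried out with C11 atomics. The following is a minimal sketch, not code from NBmalloc: the function names `lockfree_add` and `try_claim` are hypothetical, and the counter is only a stand-in for a shared object. It shows the usual Compare-and-Swap retry loop and a Test-and-set style claim.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock-free add built on Compare-and-Swap (CAS).
 * A failed strong CAS means the value changed under us, i.e. some other
 * thread's update took effect, so the system as a whole made progress. */
long lockfree_add(atomic_long *target, long delta)
{
    assert(target != NULL);
    long observed = atomic_load(target);
    /* On failure the CAS stores the current value into `observed`,
     * so each retry works from a fresh snapshot. */
    while (!atomic_compare_exchange_strong(target, &observed, observed + delta)) {
        /* retry */
    }
    return observed + delta;
}

/* Test-and-set: returns true only for the caller that set the flag first. */
bool try_claim(atomic_flag *flag)
{
    return !atomic_flag_test_and_set(flag);
}
```

Note that this loop is lock-free but not wait-free: an individual thread may retry an unbounded number of times while others succeed.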

Synchronization on a shared object
- Desired semantics of a shared data object
  - Linearizability [Herlihy & Wing, 1990]
    • For each operation invocation there must be one single time instant during its duration where the operation appears to take effect.

[Figure: timelines of operations O1, O2 and O3]

Memory management and lock-free synchronization
- Concurrent memory management
  - Concurrent applications
    • Memory is a shared resource
    • Concurrent memory requests
    • Potential problems: contention, blocking, etc.
  - Why lock-free?
    • Scalability/fault-tolerance potential
    • Prevents a delayed thread from blocking other threads
      (causes of delay: scheduler decisions, page faults, etc.)
    • Many non-blocking algorithms use dynamic memory allocation
    • => non-blocking memory allocator needed

Memory Allocators
- Provide dynamic memory to the application
  - Allocate / Deallocate interface
  - Maintains a pool of memory (a.k.a. heap)
  - Online problem: requests are handled in order
- Performance
  - Fragmentation
  - Runtime overhead

[Figure; axis label: Memory address]

Concurrent Memory Allocators
- Goals
  - Scalability
  - Avoiding
    • False-sharing: threads use data in the same cache-line
    • Heap blowup: memory freed on one CPU is not made available to the others
    • Fragmentation
    • Runtime overhead

[Figure labels: CPUs, Cache line]
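To make the false-sharing bullet concrete, here is a small sketch in C11. It is an illustration only: the 64-byte `CACHE_LINE_SIZE` is an assumption (the real value is platform dependent), and `padded_counter`, `per_cpu_counter`, `packed_counter` and `same_cache_line` are hypothetical names, not part of any allocator discussed here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: typical, but platform dependent */

/* Each counter is aligned to, and therefore padded out to, a full cache line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) long value;
};
static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "one counter per cache line");

/* Hypothetical per-CPU data: neighbours never share a cache line. */
struct padded_counter per_cpu_counter[4];

/* For contrast: adjacent longs packed into a single cache line. */
alignas(CACHE_LINE_SIZE) long packed_counter[4];

/* True if both addresses fall within the same cache line. */
bool same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}
```

If different CPUs update `packed_counter[0]` and `packed_counter[1]`, the shared line bounces between their caches even though the data is logically independent. The padded layout trades memory for the absence of that traffic.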

The Hoard architecture [Berger et al, 2000]
- Per-processor heaps
  - Threads running on different CPUs allocate from different places
  - Avoids false-sharing and limits contention
- Superblocks
  - Contains blocks of one size class
  - Pros: Easy to transfer and reuse memory, prevents heap blowup
  - Cons: External fragmentation
- Fixed set of size classes/allocatable sizes
  - Handled separately
  - Pros: Simple
  - Cons: Increases internal fragmentation

[Figure labels: Processor heap, size-classes, SB header, SB]
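The trade-off of a fixed set of size classes can be sketched in a few lines of C. The class layout below (powers of two from 8 to 4096 bytes) is an assumption chosen for illustration, not the layout used by Hoard or NBmalloc, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BLOCK_SIZE   8u  /* assumed smallest block size in bytes */
#define NUM_SIZE_CLASSES 10  /* assumed classes: 8, 16, ..., 4096 bytes */

/* Block size served by a given size class. */
size_t size_class_bytes(int cls)
{
    assert(cls >= 0 && cls < NUM_SIZE_CLASSES);
    return (size_t)MIN_BLOCK_SIZE << cls;
}

/* Smallest class whose blocks fit the request, or -1 if none does
 * (a real allocator would need another path for such requests). */
int size_class_for(size_t request)
{
    for (int cls = 0; cls < NUM_SIZE_CLASSES; ++cls) {
        if (request <= size_class_bytes(cls))
            return cls;
    }
    return -1;
}

/* Internal fragmentation: bytes wasted by rounding up to the class size. */
size_t internal_waste(size_t request)
{
    int cls = size_class_for(request);
    return cls < 0 ? 0 : size_class_bytes(cls) - request;
}
```

With this layout a 9-byte request lands in the 16-byte class and wastes 7 bytes, which is the internal fragmentation named as the cost of the scheme.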

The lock-free challenges
1. The superblock internal freelist
   - Lock-free stack (a.k.a. IBM freelist [IBM, 1983])
2. Moving and finding superblocks within a per-processor heap
3. Returning superblocks to the global heap for reuse
   - New lock-free data structure: the flat-set
     • Find an item in a set
     • Move an item between sets atomically
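A lock-free stack used as a freelist can be sketched as follows. This is a minimal illustration under stated assumptions, not the NBmalloc code: blocks are identified by array index instead of address, the head packs a 32-bit version tag next to the top index so that a single 64-bit CAS covers both (the tag guards against the ABA problem), and `NUM_BLOCKS` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_BLOCKS 8u        /* assumed number of blocks in one superblock */
#define NIL UINT32_MAX       /* marks the end of the list */

struct freelist {
    _Atomic uint64_t head;             /* (tag << 32) | index of top block */
    _Atomic uint32_t next[NUM_BLOCKS]; /* next[i]: block below block i */
};

static uint64_t pack(uint32_t tag, uint32_t index) { return ((uint64_t)tag << 32) | index; }
static uint32_t index_of(uint64_t head) { return (uint32_t)head; }
static uint32_t tag_of(uint64_t head) { return (uint32_t)(head >> 32); }

/* Chain all blocks: 0 -> 1 -> ... -> NUM_BLOCKS-1 -> NIL. */
void freelist_init(struct freelist *fl)
{
    for (uint32_t i = 0; i < NUM_BLOCKS; ++i)
        atomic_init(&fl->next[i], i + 1 < NUM_BLOCKS ? i + 1 : NIL);
    atomic_init(&fl->head, pack(0, 0));
}

/* Allocate: pop the top block, or return NIL if the list is empty. */
uint32_t freelist_pop(struct freelist *fl)
{
    uint64_t old = atomic_load(&fl->head);
    for (;;) {
        uint32_t top = index_of(old);
        if (top == NIL)
            return NIL;
        uint32_t below = atomic_load_explicit(&fl->next[top], memory_order_relaxed);
        /* Bumping the tag makes a stale `old` fail the CAS even if the
         * same index has been popped and pushed back in the meantime. */
        if (atomic_compare_exchange_weak(&fl->head, &old, pack(tag_of(old) + 1, below)))
            return top;
    }
}

/* Deallocate: push a block back on top of the list. */
void freelist_push(struct freelist *fl, uint32_t block)
{
    assert(block < NUM_BLOCKS);
    uint64_t old = atomic_load(&fl->head);
    for (;;) {
        atomic_store_explicit(&fl->next[block], index_of(old), memory_order_relaxed);
        if (atomic_compare_exchange_weak(&fl->head, &old, pack(tag_of(old) + 1, block)))
            return;
    }
}
```

Using indices keeps the head within one machine word on 64-bit platforms. A pointer-based variant would need a double-width CAS or another ABA-prevention scheme.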

Lock-free flat-sets
- Lock-free container data structure
- Properties
  - Items can be moved from one set to another atomically
  - An item can only be in one “set” at a time
- Operations
  - Insert
  - Get_any
  - Insert atomically removes the item from its old location

Unless “Remove + Insert” appears atomic an item may get stuck in “limbo”.

[Figure labels: Remove, Insert, L-F Set, Flat-set, Current, Superblock, SB header]

Moving a shared pointer
- Goal:
  - Move a pointer value between two shared pointer locations
- Requirements
  - The pointer target must stay accessible
  - The same number of shared pointers to the target after the move as before
  - Lock-free behaviour
- Issues
  - One atomic CAS is not enough! We’ll need several steps.
  - Interfering threads need to help unfinished operations

Moving a shared pointer

[Figure: step-by-step diagram of the move; labels: From, To, Old_pos, New_pos]

Note that some extra details are needed to prevent ABA problems.

Experimental results
- Benchmark applications
  - Larson
    • Scalability
    • False-sharing
  - Active-false/Passive-false
    • Active false-sharing
    • Passive false-sharing

Experimental results

[Figure: Speed-up and Memory usage plots. Larson benchmark. Sun 4xUltraSPARC III]

Experimental results

[Figure: Speed-up and Memory usage plots. Larson benchmark. SGI Origin 3800 32(/128)xMIPS]

Conclusions
- Lock-free memory allocator
  - Scalable
  - Behaves well on both UMA and NUMA architectures
- Lock-free flat-sets
  - New lock-free data structure
  - Allows lock-free inter-object operations
- Implementation
  - Freely available (GPL)

Future Work
- Further development of the memory allocator
  - Reclaiming superblocks for reuse in a different size class
  - Improve search strategies for flat-sets
- Evaluate the memory allocator with real applications
- How to make lock-free composite objects from “smaller” lock-free objects

Questions?
- Contact Information:
  - Address: Anders Gidenstam, Computer Science & Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
  - Email: andersg @ cs.chalmers.se
  - Web: http://www.cs.chalmers.se/~dcs
         http://www.cs.chalmers.se/~andersg
- Implementation
  - http://www.cs.chalmers.se/~dcs/nbmalloc.html

Concurrent applications

[Figure: application classes plotted by #Threads against #CPUs: traditional desktop applications; traditional multi-threaded desktop applications; multi-threaded applications on new multicore CPU(s); high performance multi-threaded applications on multiprocessors]
