Moving thread activation policies to userspace using kfutex
Helge Bahmann <hcb@chaoticmind.net>, Google Zürich
Pop quiz: Which class of operations do processes spend >99% of their time in?
Introduction
What are threads?
- Answer I: A "parallelism abstraction"
A piece of a program running sequentially with respect to itself, and running with unspecified parallelism relative to the remainder of the program. A means of expressing "conceptual parallelism".
- Answer II: An "operating system concept"
A virtualized instance of a CPU, mapped dynamically to physical CPUs. A means of achieving "factual parallelism".
Introduction
Linux event waiting "primitives" (not exhaustive)
- select/poll/epoll_wait/epoll_pwait/...
- sigsuspend/sigtimedwait/sigwaitinfo
- waitpid
- sleep/usleep/nanosleep
- ioctl(..., DRM_IOCTL_WAIT_VBLANK, ...)
- pthread_mutex_lock/pthread_cond_wait
- ...
Observation: Combined event notification/delivery
epoll-based notification
(diagram: setup, steady state, processing; kernel space)
Common event handling patterns
- "Edge client"
– Many kinds of event sources (peripherals, user interaction, network, ...)
– ~1 instance each
– almost no "intended" parallelism
- Service
– Single dominant kind of event source (usually network)
– many instances each
– maximize throughput through parallelism
- Reality usually somewhere between these extremes
Leader/followers (classical)
- Design constraints:
– Single (logical) event source
– Handling any event may take an arbitrary (varying) amount of time
- Goal: Maximise throughput through parallelism
for (;;) {
    std::unique_lock<std::mutex> lock(m);
    Event ev = get_event_from_queue();
    lock.unlock();
    handle_event(ev);
}
- Solution:
– "Leader" dequeues event – Promotes new "leader"
from pool of followers
– Handles event – Joins pool of followers
The simplest possible implementation relies on the thread activation policy of the "mutex" to select the new leader.
Leader/followers (classical)
(diagram: irq → device driver → epoll queue in kernel space; thread 1 and thread 2 in user space)
Leader/followers (classical)
- Literature: more fancy^Wsophisticated leader selection
- This does not change two fundamental facts:
– The promoted follower will be temporarily woken, just to put itself back to sleep again
– The last active thread cannot become leader again without another pointless wake up of the current leader to displace it
- Due to thread/CPU affinity: one IPI per operation
- Particularly pathological for #threads = #CPUs
futex
Linux system call for suspending/waking up threads based on an address
- futex(addr, FUTEX_WAIT, value)
Atomically verifies that *addr == value and puts the calling thread to sleep in a "waiting at addr" state. Returns 0 if the thread was put to sleep (and woken later).
- futex(addr, FUTEX_WAKE, count)
Wakes up at most count threads in "waiting at addr" state
- futex(addr, FUTEX_REQUEUE, new_addr)
Moves threads in "waiting at addr" state into "waiting at new_addr" state without waking them (simplified; the real call also takes wake/requeue counts)
futex
Implementing a mutex

class mutex {
public:
    void lock();
    void unlock();
private:
    enum state_type {
        unlocked = 0,
        locked = 1,
        locked_contention = 2
    };
    std::atomic<state_type> state_;
    ...
};

void mutex::lock() {
    state_type current = state_.load();
    for (;;) {
        switch (current) {
        case unlocked:
            if (state_.compare_exchange_weak(current, locked)) { return; }
            break;
        case locked:
            if (!state_.compare_exchange_weak(current, locked_contention)) { break; }
            // fallthrough
        case locked_contention:
            futex(&state_, FUTEX_WAIT, locked_contention);
            current = state_.load();  // woken (or *state_ changed): retry acquisition
            break;
        }
    }
}

void mutex::unlock() {
    if (state_.exchange(unlocked) == locked_contention) {
        futex(&state_, FUTEX_WAKE, 1);
    }
}
futex
FUTEX_REQUEUE comes into play to avoid a "thundering herd" problem with condition variables
template<typename X>
class synchronized_queue {
public:
    template<typename Iter>
    void enqueue_many(Iter begin, Iter end) {
        std::unique_lock<std::mutex> lock(m_);
        queue_.insert(queue_.end(), begin, end);
        c_.notify_all();
        lock.unlock();
    }
    X dequeue() {
        std::unique_lock<std::mutex> lock(m_);
        while (queue_.empty()) { c_.wait(lock); }
        X result = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        return result;
    }
private:
    std::mutex m_;
    std::condition_variable c_;
    std::list<X> queue_;
};
"Naive" wake up will cause all threads to race acquiring the mutex, blocking all but one again at just this point. "Requeue" allows to change the woken threads from "waiting at condition variable" state to "waiting at mutex" state and thus avoids the thundering herd.
kfutex
- Extension to allow futex signalling from kernel space
– User space defines...
- Address of an atomic variable (doubles as futex location)
- Mutation protocol: Single parameterized atomic operation
- Wake up criterion: Single parameterized test of pre/post value
– Kernel acts on these directives when signalling a kfutex
- Extension to bind kfutex signalling to kernel events
– e.g. I/O readiness
- Peripherally related: Extension for event ringbuffer
kfutex-based notification
(diagram: setup, steady state, processing; user space / kernel space)
leader/followers (futex)
- Bind event source to kfutex
– "Leader" FUTEX_WAITs on this event futex – "Followers" FUTEX_WAIT on a private signalling futex each
- When leader receives an event
– it FUTEX_REQUEUEs one of the followers to the event futex
– begins handling the event
- When thread finishes handling an event
– either: waits on its private signalling futex
– or: FUTEX_REQUEUEs the current leader to its private signalling futex ("demotes" it) and becomes leader itself
- Leader selection policy in user space
Summary
- kfutex unifies inter-thread and kernel notification
- kfutex separates event notification/delivery
– delivery possible through e.g. lock-free ring buffers
- allows moving activation policy decisions to user space; avoids "useless" task wake ups
- efficiency gain by avoiding kernel entry in fast paths
- kernel implementation complexity to avoid "abuse" of kfutex side effects
– futex key hash collisions, page pinning
- synchronization implementation complexity
– lock-free kernel/user-space synchronization protocol