Moving thread activation policies to userspace using kfutex
Helge Bahmann <hcb@chaoticmind.net>, Google Zürich
Pop quiz: Which class of operations do processes spend >99% of their time in?
Introduction
What are threads?
- Answer I: A "parallelism abstraction"
A piece of a program running sequentially with respect to itself, and running with unspecified parallelism relative to the remainder of the program. A means of expressing "conceptual parallelism".
- Answer II: An "operating system concept"
A virtualized instance of a CPU, mapped dynamically to physical CPUs. A means of achieving "factual parallelism".
Introduction
Linux event waiting "primitives" (not exhaustive)
- select/poll/epoll_wait/epoll_pwait/...
- sigsuspend/sigtimedwait/sigwaitinfo
- waitpid
- sleep/usleep/nanosleep
- ioctl(..., DRM_IOCTL_WAIT_VBLANK, ...)
- pthread_mutex_lock/pthread_cond_wait
- ...
Observation: Combined event notification/delivery
epoll-based notification
(diagram: setup, steady state, processing; kernel space)
Common event handling patterns
- "Edge client"
– Many kinds of event sources (peripherals, user interaction, network, ...)
– ~1 instance each
– almost no "intended" parallelism
- Service
– Single dominant kind of event source (usually network)
– many instances each
– maximize throughput through parallelism
- Reality usually somewhere between these extremes
Leader/followers (classical)
- Design constraints:
– Single (logical) event source
– Handling any event may take an arbitrary (varying) amount of time
- Goal: Maximise throughput through parallelism
for (;;) {
    std::unique_lock<std::mutex> lock(m);
    Event ev = get_event_from_queue();
    lock.unlock();
    handle_event(ev);
}
- Solution:
– "Leader" dequeues event – Promotes new "leader"
from pool of followers
– Handles event – Joins pool of followers
The simplest possible implementation relies on the thread activation policy of the "mutex" to select the new leader.
Leader/followers (classical)
(diagram: irq → device driver → epoll queue in kernel space; thread 1 and thread 2 in user space)
Leader/followers (classical)
- Literature: more fancy^Wsophisticated leader selection
- This does not change two fundamental facts:
– The promoted follower will be temporarily woken, just to put itself back to sleep again
– The last active thread cannot become leader again without another pointless wake up of the current leader to displace it
- Due to thread/CPU affinity: one IPI per operation
- Particularly pathological for #threads = #CPUs
futex
Linux system call for suspending/waking up threads based on an address
- futex(addr, FUTEX_WAIT, value)
Atomically verifies that *addr == value and puts the calling thread to sleep in a "waiting at addr" state. Returns 0 if the thread was put to sleep (and woken later).
- futex(addr, FUTEX_WAKE, count)
Wakes up at most count threads in "waiting at addr" state
- futex(addr, FUTEX_REQUEUE, new_addr)
Moves threads in "waiting at addr" state into "waiting at new_addr" state without waking them (simplified; the real call also takes wake/requeue counts)
futex
Implementing a mutex

class mutex {
public:
    void lock();
    void unlock();
private:
    enum state_type {
        unlocked = 0,
        locked = 1,
        locked_contention = 2
    };
    std::atomic<state_type> state_;
    ...
};

void mutex::lock() {
    state_type current = state_.load();
    for (;;) {
        switch (current) {
        case unlocked:
            if (state_.compare_exchange_weak(current, locked)) { return; }
            break;
        case locked:
            if (!state_.compare_exchange_weak(current, locked_contention)) { break; }
            // fallthrough
        case locked_contention:
            futex(&state_, FUTEX_WAIT, locked_contention);
            current = state_.load();  // woken (or *state_ changed): retry acquisition
            break;
        }
    }
}

void mutex::unlock() {
    if (state_.exchange(unlocked) == locked_contention) {
        futex(&state_, FUTEX_WAKE, 1);
    }
}
futex
FUTEX_REQUEUE comes into play to avoid a "thundering herd" problem with condition variables
template<typename X>
class synchronized_queue {
public:
    template<typename Iter>
    void enqueue_many(Iter begin, Iter end) {
        std::unique_lock<std::mutex> lock(m_);
        queue_.insert(queue_.end(), begin, end);
        c_.notify_all();
        lock.unlock();
    }
    X dequeue() {
        std::unique_lock<std::mutex> lock(m_);
        while (queue_.empty()) { c_.wait(lock); }
        X result = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        return result;
    }
private:
    std::mutex m_;
    std::condition_variable c_;
    std::list<X> queue_;
};
"Naive" wake up will cause all threads to race acquiring the mutex, blocking all but one again at just this point. "Requeue" allows to change the woken threads from "waiting at condition variable" state to "waiting at mutex" state and thus avoids the thundering herd.
kfutex
- Extension to allow futex signalling from kernel space
– User space defines...
- Address of an atomic variable (doubles as futex location)
- Mutation protocol: Single parameterized atomic operation
- Wake up criterion: Single parameterized test of pre/post value
– Kernel acts on these directives when signalling a kfutex
- Extension to bind kfutex signalling to kernel events
– e.g. I/O readiness
- Peripherally related: Extension for event ringbuffer
kfutex-based notification
(diagram: setup, steady state, processing; user space / kernel space)
leader/followers (futex)
- Bind event source to kfutex
– "Leader" FUTEX_WAITs on this event futex – "Followers" FUTEX_WAIT on a private signalling futex each
- When leader receives an event
– it FUTEX_REQUEUEs one of the followers to the event futex
– begins handling the event
- When thread finishes handling an event
– either: waits on its private signalling futex
– or: FUTEX_REQUEUEs the current leader to its private signalling futex ("demotes" it) and becomes leader itself
- Leader selection policy in user space
Summary
- kfutex unifies inter-thread and kernel notification
- kfutex separates event notification/delivery
– delivery possible through e.g. lock-free ring buffers
- allows moving activation policy decisions to user space; avoids "useless" task wake ups
- efficiency gain by avoiding kernel entry in fast paths
- kernel implementation complexity to avoid "abuse" of kfutex side effects
– futex key hash collisions, page pinning
- synchronization implementation complexity
– lock-free kernel/user-space synchronization protocol