Moving thread activation policies to userspace using kfutex, Helge Bahmann - PowerPoint PPT Presentation



SLIDE 1

Moving thread activation policies to userspace using kfutex

Helge Bahmann <hcb@chaoticmind.net> Google Zürich

SLIDE 2

Pop quiz: Which class of operations do processes spend >99% of their time in?

SLIDE 3

Introduction

What are threads?

  • Answer I: A "parallelism abstraction"

A piece of a program running sequentially with respect to itself, and running with unspecified parallelism to the remainder of the program. A means of expressing "conceptual parallelism".

  • Answer II: An "operating system concept"

A virtualized instance of a CPU, mapped dynamically onto physical CPUs. A means of achieving "factual parallelism".

SLIDE 4

Introduction

Linux event waiting "primitives" (not exhaustive)

  • select/poll/epoll_wait/epoll_pwait/...
  • sigsuspend/sigtimedwait/sigwaitinfo
  • waitpid
  • sleep/usleep/nanosleep
  • ioctl(..., DRM_IOCTL_WAIT_VBLANK, ...)
  • pthread_mutex_lock/pthread_cond_wait
  • ...

Observation: Combined event notification/delivery

SLIDE 5

epoll-based notification

[Diagram: setup and steady-state processing, with transitions into kernel space]

SLIDE 6

epoll-based notification

[Diagram: setup and steady-state processing, with transitions into kernel space]

SLIDE 7

Common event handling patterns

  • "Edge client"

– Many kinds of event sources (peripherals, user interaction, network, ...)
– ~1 instance of each
– Almost no "intended" parallelism

  • Service

– Single dominant kind of event source (usually network)
– Many instances of it
– Maximize throughput through parallelism

  • Reality usually somewhere between these extremes
SLIDE 8

Leader/followers (classical)

  • Design constraints:

– Single (logical) event source
– Handling any event may take an arbitrary (varying) amount of time

Goal: Maximise throughput through parallelism

  • Solution:

– "Leader" dequeues event
– Promotes new "leader" from pool of followers
– Handles event
– Joins pool of followers

    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        Event ev = get_event_from_queue();
        lock.unlock();
        handle_event(ev);
    }

The simplest possible implementation relies on the thread activation policy of the "mutex" to select the new leader.

SLIDE 9

Leader/followers (classical)

[Diagram: irq → device driver → epoll queue (kernel space); thread 1 and thread 2 (user space)]

SLIDE 10

Leader/followers (classical)

  • Literature: more fancy^Wsophisticated leader selection

This does not change two fundamental facts:

– The promoted follower will be temporarily woken, just to put itself back to sleep again
– The last active thread cannot become leader again without another pointless wake up of the current leader to displace it

  • Due to thread/CPU affinity: one IPI per operation
  • Particularly pathological for #threads = #CPUs
SLIDE 11

futex

Linux system call for suspending/waking up threads based on an address

  • futex(addr, FUTEX_WAIT, value)

Atomically verifies that *addr == value and puts calling thread to sleep in "waiting at addr" state. Returns 0 if thread was put to sleep (and woken later).

  • futex(addr, FUTEX_WAKE, count)

Wakes up at most count threads in "waiting at addr" state

  • futex(addr, FUTEX_REQUEUE, new_addr)

Changes all threads currently in "waiting at addr" state into "waiting at new_addr" state

SLIDE 12

futex

Implementing a mutex (a futex() wrapper around the raw system call is assumed):

    class mutex {
    public:
        void lock();
        void unlock();

    private:
        enum state_type {
            unlocked = 0,
            locked = 1,
            locked_contention = 2
        };
        std::atomic<state_type> state_;
        ...
    };

    void mutex::lock() {
        state_type current = state_.load();
        for (;;) {
            switch (current) {
            case unlocked:
                if (state_.compare_exchange_weak(current, locked)) { return; }
                break;
            case locked:
                if (!state_.compare_exchange_weak(current, locked_contention)) { break; }
                // fallthrough
            case locked_contention:
                futex(&state_, FUTEX_WAIT, locked_contention);
                // After waking, re-acquire pessimistically in case
                // further waiters exist.
                if (state_.exchange(locked_contention) == unlocked) { return; }
                current = locked_contention;
                break;
            }
        }
    }

    void mutex::unlock() {
        if (state_.exchange(unlocked) == locked_contention) {
            futex(&state_, FUTEX_WAKE, 1);
        }
    }

SLIDE 13

futex

FUTEX_REQUEUE comes into play to avoid a "thundering herd" problem with condition variables

    template<typename X>
    class synchronized_queue {
    public:
        template<typename Iter>
        void enqueue_many(Iter begin, Iter end) {
            std::unique_lock<std::mutex> lock(m_);
            queue_.insert(queue_.end(), begin, end);
            c_.notify_all();
            lock.unlock();
        }

        X dequeue() {
            std::unique_lock<std::mutex> lock(m_);
            while (queue_.empty()) { c_.wait(lock); }
            X result = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            return result;
        }

    private:
        std::mutex m_;
        std::condition_variable c_;
        std::list<X> queue_;
    };

A "naive" wake up causes all threads to race to acquire the mutex, blocking all but one of them again at exactly this point. "Requeue" allows changing the woken threads from "waiting at condition variable" state to "waiting at mutex" state, and thus avoids the thundering herd.

SLIDE 14

kfutex

  • Extension to allow futex signalling from kernel space

– User space defines...

  • Address of an atomic variable (doubles as futex location)
  • Mutation protocol: Single parameterized atomic operation
  • Wake up criterion: Single parameterized test of pre/post value

– Kernel acts on these directives when signalling a kfutex

  • Extension to bind kfutex signalling to kernel events

– e.g. I/O readiness

  • Peripherally related: Extension for event ringbuffer
SLIDE 15

kfutex-based notification

[Diagram: setup and steady-state processing, with transitions between user space and kernel space]

SLIDE 16

kfutex-based notification

[Diagram: setup and steady-state processing, with transitions between user space and kernel space]

SLIDE 17

leader/followers (futex)

  • Bind event source to kfutex

– "Leader" FUTEX_WAITs on this event futex
– "Followers" each FUTEX_WAIT on a private signalling futex

  • When the leader receives an event

– it FUTEX_REQUEUEs one of the followers to the event futex
– it begins handling the event

  • When a thread finishes handling an event

– either: it waits on its private signalling futex
– or: it FUTEX_REQUEUEs the current leader to its private signalling futex ("demotes" it) and becomes leader itself

  • Leader selection policy in user space
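The steps above, condensed into per-thread pseudocode (the helper names are invented; the kfutex binding is as described on slide 14):

```
per-thread loop (leader/followers over kfutex):

  if leader:
      futex(&event_futex, FUTEX_WAIT, expected)     // sleep until the kernel signals the kfutex
      f = pick_follower()                           // selection policy: pure user space
      futex(&f.private_futex, FUTEX_REQUEUE, &event_futex)   // promote f to leader
      handle_event()
      // finished handling:
      if want_to_lead_again:
          futex(&event_futex, FUTEX_REQUEUE, &current_leader.private_futex)  // demote
          // become leader again without any thread having been woken
      else:
          futex(&self.private_futex, FUTEX_WAIT, expected)   // join the followers
```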
SLIDE 18

Summary

  • kfutex unifies inter-thread and kernel notification
  • kfutex separates event notification from delivery

– delivery remains possible through e.g. lock-free ring buffers

  • allows moving activation policy decisions to user space; avoids "useless" task wake ups
  • efficiency gain by avoiding kernel entry in fast paths
  • kernel implementation complexity to avoid "abuse" of kfutex side effects

– futex key hash collisions, page pinning

  • synchronization implementation complexity

– lock-free kernel/user-space synchronization protocol